Independent Samples t-test

The independent samples t-test compares the means of two samples and tests whether the samples are likely to have come from populations with different mean values.

When two samples are taken from the same population, it is very unlikely that the means of the two samples will be identical. When two samples are taken from two populations with very different mean values, it is likely that the means of the two samples will differ. Our problem is how to differentiate between these two situations using only the data from the two samples.
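The two situations described above can be illustrated with a small simulation. This is only a sketch: the normal populations, the means (100 and 80), the standard deviation (7) and the sample size (9) are assumed values chosen for illustration, not figures from any study.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed so the run is repeatable


def sample_mean(mu, sigma=7.0, n=9):
    """Mean of n draws from a normal population (mu and sigma are illustrative)."""
    return mean(random.gauss(mu, sigma) for _ in range(n))


# Case 1: both samples come from the same population (mean 100).
same = [sample_mean(100) - sample_mean(100) for _ in range(1000)]

# Case 2: the samples come from populations with very different means (100 vs 80).
different = [sample_mean(100) - sample_mean(80) for _ in range(1000)]

print("Same population:       typical |difference| =",
      round(mean(abs(d) for d in same), 1))
print("Different populations: typical |difference| =",
      round(mean(abs(d) for d in different), 1))
print("Pairs with identical means in case 1:",
      sum(d == 0 for d in same), "of", len(same))
```

Even when both samples come from the same population, the two sample means essentially never coincide. The size of the difference therefore has to be judged against the amount of sampling variation.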

Worked example

A study of the effect of caffeine on muscle metabolism used eighteen male volunteers who each underwent arm exercise tests. Nine of the men were randomly selected to take a capsule containing pure caffeine one hour before the test. The other nine men received a placebo capsule. During each exercise test the subject's respiratory exchange ratio (RER) was measured. (RER is the ratio of CO2 produced to O2 consumed and is an indicator of whether energy is being obtained from carbohydrates or fats.)

The question of interest to the experimenter was whether, on average, caffeine changes RER.

The two populations being compared are “men who have not taken caffeine” and “men who have taken caffeine”. If caffeine has no effect on RER the two sets of data can be regarded as having come from the same population.

 

The results were as follows:

            RER (%)
        Placebo   Caffeine
          105        96
          119        99
          100        94
           97        89
           96        96
          101        93
           94        88
           95       105
           98        88
Mean   100.56     94.22
SD       7.70      5.61
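As a quick check, the summary rows of the table can be reproduced from the raw values. The sketch below uses only Python's standard library, and the variable names are illustrative. The SD is the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev

# RER (%) values from the table above
placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]

for name, sample in (("Placebo", placebo), ("Caffeine", caffeine)):
    # stdev() divides by n - 1, i.e. it is the sample standard deviation
    print(f"{name:8s} n = {len(sample)}  mean = {mean(sample):.2f}  SD = {stdev(sample):.2f}")
```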

The means show that, on average, caffeine appears to have altered RER from about 100.6% to 94.2%, a difference of about 6.3 percentage points (100.56 - 94.22 = 6.34). However, there is a great deal of variation between the data values in both samples and considerable overlap between them. So is the difference between the two means simply due to sampling variation, or do the data provide evidence that caffeine does, on average, change RER? The p-value obtained from an independent samples t-test answers this question.

The t-test tests the null hypothesis that the population mean RER under the caffeine treatment equals the population mean RER under the placebo treatment, against the alternative hypothesis that the two population means are not equal.

Computer output obtained for the RER data gives the sample means and the 95% confidence interval for the difference between the means.

Computer output

The Independent Samples t-test in Minitab
Enter the data from both samples into one column and the group identity in a second column, then select
Stat > Basic Statistics > 2-Sample t...
to perform an independent samples t-test in Minitab.

Two Sample T-Test and Confidence Interval

Two sample T for Caffeine vs Placebo

            N     Mean   StDev   SE Mean
Caffeine    9    94.22    5.61       1.9
Placebo     9   100.56    7.70       2.6

95% CI for mu Caffeine - mu Placebo: (-13.1, 0.4)
T-Test mu Caffeine = mu Placebo (not =): T = -1.99 P = 0.063 DF = 16
Both use Pooled StDev = 6.74

N.B. mu = μ = population mean
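The pooled standard deviation, the t statistic and the confidence interval in the output can be reproduced by hand. The following is a minimal sketch using only the standard library, not a description of how Minitab computes them internally. The critical value 2.120 (two-sided 5% point of the t distribution with 16 degrees of freedom) is taken from standard t tables, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]
placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]

n1, n2 = len(caffeine), len(placebo)
df = n1 + n2 - 2                                # degrees of freedom
diff = mean(caffeine) - mean(placebo)           # caffeine minus placebo

# Pooled SD: average of the two sample variances, weighted by their df
pooled_sd = sqrt(((n1 - 1) * variance(caffeine)
                  + (n2 - 1) * variance(placebo)) / df)

se_diff = pooled_sd * sqrt(1 / n1 + 1 / n2)     # standard error of the difference
t_stat = diff / se_diff

T_CRIT = 2.120  # assumed from t tables: two-sided 5% critical value, 16 df
ci = (diff - T_CRIT * se_diff, diff + T_CRIT * se_diff)

print(f"Pooled StDev = {pooled_sd:.2f}, DF = {df}")
print(f"Difference = {diff:.2f}, SE = {se_diff:.2f}, T = {t_stat:.2f}")
print(f"95% CI for the difference: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Because the 95% confidence interval includes zero, a two-sided test at the 5% level does not reject the null hypothesis of equal population means.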

The Independent Samples t-test in SPSS
Enter the data from both samples into one column and the group identity in a second column, then select
Analyze > Compare Means > Independent Samples T Test ...

T-Test
[Table: SPSS Group Statistics output]

[Table: SPSS Independent Samples Test output]

Note: The signs differ between the two outputs because one calculation uses caffeine − placebo values and the other uses placebo − caffeine. This makes no difference to the conclusions of the test, i.e. p = 0.063 in both cases.

Results
The p-value is 0.063, so the difference between the two means is not statistically significant at the 5% level. The estimated change is about 6.3 percentage points (SE = 3.17), but there is insufficient evidence (p = 0.063) to conclude that caffeine changes the mean RER.

Alternative suggestion
It could be argued, however, that the researcher might only be interested in whether 'caffeine reduces RER'. That is, the researcher is looking for a specific direction for the difference between the two population means. This is an example of a one-tailed t-test, as opposed to the two-tailed t-test outlined above.

It is possible to choose a one-tailed test in Minitab.
SPSS only performs a two-tailed test (the non-directional alternative hypothesis). To obtain the p-value for the directional alternative hypothesis (one-tailed test), halve the two-tailed p-value. This is valid only when the observed difference lies in the hypothesised direction, as it does here (the caffeine mean is lower than the placebo mean). Hence, in this example, p = 0.032.
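The relationship between the two-tailed and one-tailed p-values can be checked numerically. The sketch below uses only the standard library and integrates the density of the t distribution with Simpson's rule. That is a choice made for this illustration and is not how Minitab or SPSS compute p-values. The helper names `t_pdf` and `upper_tail` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]
placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above zero


n1, n2 = len(caffeine), len(placebo)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(caffeine)
              + (n2 - 1) * variance(placebo)) / df
t_stat = (mean(caffeine) - mean(placebo)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-tailed: H1 is "caffeine changes RER" (either direction)
p_two = 2 * upper_tail(abs(t_stat), df)
# One-tailed: H1 is "caffeine reduces RER", i.e. P(T < t_stat) = P(T > -t_stat)
p_one = upper_tail(-t_stat, df)

print(f"t = {t_stat:.2f}, two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```

Halving works here because the observed t statistic is negative, which is the direction specified by the one-tailed alternative. Had the caffeine mean been higher than the placebo mean, the one-tailed p-value for 'caffeine reduces RER' would have been greater than 0.5.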

A suitable null hypothesis in both cases is
H0: On average, caffeine has no effect on RER,

with an alternative (or experimental) hypothesis,
H1: On average, caffeine changes RER (two-tailed test), or
H1: On average, caffeine reduces RER (one-tailed test).

Results for the alternative suggestion could be reported along the following lines:

The mean RER in the caffeine group (94.2 ± 1.9) was significantly lower (t = 1.99, 16 df, one-tailed t-test, p = 0.032) than the mean of the placebo group (100.6 ± 2.6).

The number after a mean value and the ± sign is the standard error of the mean.

Note: It is important to decide whether a one- or two-tailed test is to be carried out before the analysis takes place.
Otherwise it might be tempting to see what the p-value is before making your decision!

Assumptions underlying the independent sample t-test
Both the paired and independent sample t-tests make assumptions about the data, although both tests are fairly robust against departures from these assumptions.

For the independent samples t-test it is assumed that both samples come from normally distributed populations with equal standard deviations (or variances). Some statistical packages (e.g. Minitab and SPSS) allow you to relax the assumption of equal population variances and perform a t-test that does not rely on it. Statistical tests are available to assess whether the two variances are significantly different, but a simple rule of thumb is to check whether one sample standard deviation is more than twice the size of the other. If it is, use the 'unequal variances' option. In the caffeine example the ratio is 7.70 / 5.61 = 1.4, so the pooled (equal variances) test used above is reasonable.
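The rule of thumb can be applied directly to the RER data. The sketch below also computes an unequal-variances t statistic, assuming that the 'unequal variances' option corresponds to Welch's t-test with the Welch-Satterthwaite degrees of freedom, which is the usual choice in statistical packages. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]
placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]

sd_c, sd_p = stdev(caffeine), stdev(placebo)
n_c, n_p = len(caffeine), len(placebo)

# Rule of thumb from the text: is one SD more than twice the other?
sd_ratio = max(sd_c, sd_p) / min(sd_c, sd_p)
use_unequal_variances = sd_ratio > 2

# Welch's t statistic (assumed method): each sample keeps its own variance
v_c, v_p = sd_c ** 2 / n_c, sd_p ** 2 / n_p
t_welch = (mean(caffeine) - mean(placebo)) / sqrt(v_c + v_p)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (v_c + v_p) ** 2 / (v_c ** 2 / (n_c - 1) + v_p ** 2 / (n_p - 1))

print(f"SD ratio = {sd_ratio:.2f}, use unequal variances: {use_unequal_variances}")
print(f"Welch t = {t_welch:.2f}, approximate df = {df_welch:.1f}")
```

With equal sample sizes the Welch and pooled t statistics coincide. Only the degrees of freedom differ, and they are slightly smaller under Welch's approximation.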

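If normality cannot be assumed, the Mann-Whitney test is often used instead. It does not require normally distributed data, but it is somewhat less powerful than the t-test when the normality assumption does hold.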