Inferential Statistics

From sample to population

A set of measurements can almost always be regarded as measurements on a sample of items from a population of these items, as it is usually impractical or impossible to measure every item in the population. Thus we have to make inferences about the population from the sample.


This can only be true if the sample is representative of the population, and even then the sample is very unlikely to reflect the population exactly in all respects. That is, there is uncertainty as to how well the sample results reflect the population. Statistical methods have been developed to reduce and quantify this uncertainty.

For example, a trial of a new drug designed to alleviate the symptoms of a particular disease would give the drug to a group of people with the disease, and then infer that the results apply to everyone who has the disease now or will have it in the future.

The two main aspects of statistical inference are estimation and hypothesis testing (using statistical tests).

 

Estimation

To obtain an accurate estimate of a population parameter, the sample must be representative of the population. To avoid bias the sample items should be selected from the population at random. This means that all members of the population have an equal chance of being in the sample.
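As a sketch of what selecting "at random" means in practice, Python's standard `random.sample` draws items without replacement so that every possible sample of the requested size is equally likely. The population of 1,000 patient IDs, the seed and the helper name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the example is reproducible
    return rng.sample(population, n)

# Hypothetical population: patient IDs 1 to 1000
patients = list(range(1, 1001))
sample = simple_random_sample(patients, 16, seed=2024)
print(sorted(sample))
```

In a real study the "population list" is rarely this tidy, but the principle is the same: the selection mechanism, not the investigator, decides who is included.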

The precision of the estimate depends on the size of the sample: the larger the sample, the more precise the estimate. Precision is measured by the standard error of the estimate or by a confidence interval (usually the 95% confidence interval).

Worked example
Consider the following times (to the nearest hour) that 16 patients experience relief from a migraine after taking a certain drug:

7, 8, 1, 2, 6, 3, 5, 2, 4, 9, 4, 6, 5, 6, 9, 8

mean = 5.312 hours
standard deviation = 2.522 hours
Thus the estimate of the mean time for a patient to experience relief is 5.312 hours.
The 95% confidence interval for the mean time to experience relief is calculated to be 3.968 to 6.657 hours.

We can be 95% confident that the population mean lies between 3.968 hours and 6.657 hours, in the sense that if the sampling and calculation were repeated many times, about 95% of the intervals obtained would contain the population mean. This gives a clear idea of how precisely the population mean has been estimated from these data.
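The summary figures for the worked example can be reproduced with Python's standard `statistics` module. This is a minimal sketch, not necessarily how the original figures were produced. Note that `statistics.stdev` uses the n - 1 divisor, which matches the quoted 2.522, and that the unrounded mean is 5.3125.

```python
import math
import statistics

# Hours to migraine relief for the 16 patients in the worked example
times = [7, 8, 1, 2, 6, 3, 5, 2, 4, 9, 4, 6, 5, 6, 9, 8]

n = len(times)
mean = statistics.mean(times)   # sample mean
sd = statistics.stdev(times)    # sample standard deviation (divisor n - 1)
se = sd / math.sqrt(n)          # standard error of the mean

print(f"n = {n}, mean = {mean:.4f}, sd = {sd:.3f}, se = {se:.3f}")
# n = 16, mean = 5.3125, sd = 2.522, se = 0.631
```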

The precision of estimates should always be reported alongside the estimate, and sometimes authors quote the standard error rather than a confidence interval.

Although 95% confidence intervals are most often reported, you will sometimes see 99% confidence intervals. A 99% interval is constructed so that 99% of such intervals contain the population parameter, and it is consequently wider than the corresponding 95% confidence interval. To calculate a 99% confidence interval, the factor 2 in the formula below is replaced by 2.6.
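A small simulation makes the repeated-sampling meaning of "95%" and "99%" concrete. It is a sketch under stated assumptions: the population is taken to be normal with mean 5 and standard deviation 2.5 (hypothetical values), every sample has n = 16, and the t factors for 15 degrees of freedom (2.1314 and 2.9467) are typed in from tables because Python's standard library has no t distribution.

```python
import math
import random
import statistics

N = 16  # sample size; the t factors below assume N - 1 = 15 degrees of freedom
T_CRIT_DF15 = {0.95: 2.1314, 0.99: 2.9467}

def coverage(level, true_mean=5.0, true_sd=2.5, reps=2000, seed=1):
    """Fraction of simulated t-intervals at the given level that contain true_mean."""
    rng = random.Random(seed)
    t = T_CRIT_DF15[level]
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(N)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(N)
        if mean - t * se <= true_mean <= mean + t * se:
            hits += 1
    return hits / reps

cov95 = coverage(0.95)
cov99 = coverage(0.99)
print(f"95% intervals containing the mean: {cov95:.3f}")
print(f"99% intervals containing the mean: {cov99:.3f}")
```

Because both runs use the same seed, they see the same samples, so every 95% interval that captures the mean sits inside a wider 99% interval that also captures it.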

Calculation of a 95% confidence interval for the mean
The 95% confidence interval for a mean is calculated (approximately) from:

Sample mean – 2 x (Standard error of the mean) to Sample mean + 2 x (Standard error of the mean),

where the standard error of the mean = sample standard deviation divided by the square root of the sample size.
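Applied to the migraine data, the factor-2 formula can be sketched as follows; `approx_ci` is a hypothetical helper name, not part of any library.

```python
import math
import statistics

def approx_ci(data, factor=2.0):
    """Approximate CI: sample mean +/- factor * (sample SD / sqrt(n))."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return mean - factor * se, mean + factor * se

times = [7, 8, 1, 2, 6, 3, 5, 2, 4, 9, 4, 6, 5, 6, 9, 8]
low, high = approx_ci(times)
print(f"approximate 95% CI: {low:.3f} to {high:.3f}")
# approximate 95% CI: 4.051 to 6.574
```

This is slightly narrower than the exact interval (3.968 to 6.657) because, for a sample of 16, the exact factor is about 2.131 rather than 2.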

The factor 2 depends on the sample size, but for sample sizes greater than 10 it only ranges from 2.228 down towards 1.960, so 2 is an adequate approximation in most cases. If you use a computer package to calculate the confidence interval, the exact factor will be used.
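The value 1.960, and the 2.6 used for 99% intervals, are quantiles of the standard normal distribution, which the exact (t) factors approach as the sample size grows. Python's `statistics.NormalDist` can compute them; this sketch covers only that large-sample limit, not the exact small-sample factors.

```python
from statistics import NormalDist

std_normal = NormalDist()        # mean 0, standard deviation 1
z95 = std_normal.inv_cdf(0.975)  # leaves 2.5% in each tail
z99 = std_normal.inv_cdf(0.995)  # leaves 0.5% in each tail
print(f"95%: {z95:.3f}  99%: {z99:.3f}  ratio: {z99 / z95:.2f}")
# 95%: 1.960  99%: 2.576  ratio: 1.31
```

The ratio shows that, for large samples, a 99% interval is roughly 31% wider than the corresponding 95% interval.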

Note: Since the sample mean and standard deviation are estimates of fixed (albeit unknown) quantities, the only way to change the width of the confidence interval, for a given confidence level, is to alter the sample size, n. Increasing n reduces the standard error of the mean and thus the width of the interval. Notice, however, that to halve the width of the interval we have to quadruple the sample size, because of the square root in the formula.
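The square-root effect can be checked numerically. This sketch treats the worked example's standard deviation (2.522) as fixed and uses the approximate factor 2; `ci_width` is a hypothetical helper.

```python
import math

def ci_width(sd, n, factor=2.0):
    """Width of the approximate interval: 2 * factor * sd / sqrt(n)."""
    return 2 * factor * sd / math.sqrt(n)

sd = 2.522  # from the worked example, held fixed for illustration
widths = {n: ci_width(sd, n) for n in (16, 64, 256)}
for n, w in widths.items():
    print(f"n = {n:3d}: width = {w:.3f} hours")
```

Each fourfold increase in n halves the width, so going from 16 to 256 patients only reduces the width to a quarter.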

 

Computer Output

Confidence intervals for the mean in Minitab
To obtain a confidence interval for a set of data in Minitab, click on Stat > Basic Statistics > 1-Sample t…
The data from the above example, entered in column C1, give the following output:

T Confidence Intervals

Variable   N    Mean   StDev   SE Mean   95.0 % CI
C1        16   5.312   2.522     0.631   ( 3.968, 6.657)

Changing the confidence level to 99.0 gives the following output:

T Confidence Intervals

Variable   N    Mean   StDev   SE Mean   99.0 % CI
C1        16   5.312   2.522     0.631   ( 3.454, 7.171)
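The Minitab intervals can be reproduced from the t distribution with n - 1 = 15 degrees of freedom. In this sketch the t factors (2.1314 for 95%, 2.9467 for 99%) are typed in from tables, since Python's standard library has no t distribution; with SciPy installed, `scipy.stats.t.ppf` could supply them instead. The helper `t_interval_n16` is hypothetical and only valid for samples of 16.

```python
import math
import statistics

# Two-sided t factors for 15 degrees of freedom, taken from t tables
T_CRIT_DF15 = {0.95: 2.1314, 0.99: 2.9467}

def t_interval_n16(data, level):
    """t confidence interval for the mean of a sample of exactly 16 values."""
    n = len(data)
    if n != 16:
        raise ValueError("the tabulated t factors assume n = 16")
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t = T_CRIT_DF15[level]
    return mean - t * se, mean + t * se

times = [7, 8, 1, 2, 6, 3, 5, 2, 4, 9, 4, 6, 5, 6, 9, 8]
ci95 = t_interval_n16(times, 0.95)
ci99 = t_interval_n16(times, 0.99)
print(f"95% CI: ({ci95[0]:.3f}, {ci95[1]:.3f})")  # 95% CI: (3.968, 6.657)
print(f"99% CI: ({ci99[0]:.3f}, {ci99[1]:.3f})")  # 99% CI: (3.454, 7.171)
```

Both intervals are symmetric about the sample mean of 5.3125; only the t factor changes between them.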

Confidence intervals for the mean in SPSS
To obtain a confidence interval, click on Analyze > Compare Means > One-Sample T Test… The output also includes a one-sample t-test.

Confidence intervals for the mean in Excel
Not easy and best avoided! But if you have to, the following example calculates an approximate lower 95% confidence limit for data in cells A1 to A16 using Excel’s CONFIDENCE function:

=AVERAGE(A1:A16)-CONFIDENCE(1-0.95,STDEV(A1:A16),COUNT(A1:A16))

and the upper limit is

=AVERAGE(A1:A16)+CONFIDENCE(1-0.95,STDEV(A1:A16),COUNT(A1:A16))

The limits are approximate because CONFIDENCE uses the normal distribution (the factor 1.960) rather than the t distribution, so for small samples the interval is somewhat too narrow.
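For comparison, here is a sketch of what those Excel formulas compute, assuming CONFIDENCE(alpha, standard_dev, size) returns the normal-based half-width z × sd / √size as Excel documents it. `excel_confidence` is a hypothetical Python stand-in, not part of any library.

```python
import math
import statistics
from statistics import NormalDist

def excel_confidence(alpha, sd, n):
    """Normal-based half-width, as assumed for Excel's CONFIDENCE(alpha, sd, n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.960 when alpha = 0.05
    return z * sd / math.sqrt(n)

times = [7, 8, 1, 2, 6, 3, 5, 2, 4, 9, 4, 6, 5, 6, 9, 8]
mean = statistics.mean(times)
half = excel_confidence(1 - 0.95, statistics.stdev(times), len(times))
print(f"{mean - half:.3f} to {mean + half:.3f}")  # 4.077 to 6.548
```

This normal-based interval is narrower than the t interval from Minitab (3.968 to 6.657), which is why the Excel result is only approximate for a sample of 16.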

Statistical Tests
