## The p-value

Every statistical test produces a p-value: the probability of obtaining the observed difference, or one more extreme, if the null hypothesis is true. To put it another way, if the null hypothesis is true, the p-value is the probability of obtaining a difference at least as large as that observed purely through sampling variation.
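This definition can be made concrete with a permutation test, which estimates the p-value directly by simulation. The sketch below uses invented data for two hypothetical groups: under the null hypothesis the group labels are arbitrary, so shuffling them many times shows how often a difference at least as large as the observed one arises by chance alone.

```python
import random

random.seed(42)

# Invented example data for two hypothetical groups (illustration only)
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [6.0, 6.4, 5.9, 6.8, 6.1]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
observed = abs(mean_a - mean_b)  # the observed difference in means

# Under the null hypothesis the labels are interchangeable, so we pool
# the data, reshuffle repeatedly, and count how often the shuffled
# difference is at least as large as the observed one.
pooled = group_a + group_b
n_a = len(group_a)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
    if diff >= observed:
        count += 1

# The p-value is the proportion of shuffles at least as extreme as observed
p_value = count / n_perm
print(f"observed difference = {observed:.2f}, estimated p-value = {p_value:.3f}")
```

The estimated p-value converges on the exact permutation p-value as the number of shuffles grows; in practice a few thousand shuffles is usually adequate.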

Consequently, a small p-value provides evidence against the null hypothesis and in favour of the alternative, while a large p-value indicates that the data are consistent with the null hypothesis. But how small is 'small' and how large is 'large'?

Conventionally (and arbitrarily), a p-value below 0.05 (5%) is generally regarded as sufficiently small to reject the null hypothesis. If the p-value is 0.05 or larger, we fail to reject the null hypothesis.

The 5% value is called the significance level of the test. Other significance levels that are commonly used are 1% and 0.1%. Some people use the following terminology:

| p-value | Outcome of test | Statement |
|---|---|---|
| greater than 0.05 | Fail to reject H₀ | No evidence to reject H₀ |
| between 0.01 and 0.05 | Reject H₀ (Accept H₁) | Some evidence to reject H₀ (therefore accept H₁) |
| between 0.001 and 0.01 | Reject H₀ (Accept H₁) | Strong evidence to reject H₀ (therefore accept H₁) |
| less than 0.001 | Reject H₀ (Accept H₁) | Very strong evidence to reject H₀ (therefore accept H₁) |

**Further Reading**

Since the p-value depends on the sample size and so does not reflect the importance of a finding, appropriate confidence intervals are often reported alongside it. See Campbell & Machin (1999) for more details.