The p-value

All statistical tests produce a p-value, which is the probability of obtaining the observed difference, or one more extreme, if the null hypothesis is true. To put it another way - if the null hypothesis is true, the p-value is the probability of obtaining a difference at least as large as that observed due to sampling variation alone.
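
One way to make this definition concrete is with a simulation. The sketch below (not from the source; the data and the permutation approach are illustrative assumptions) shuffles group labels many times and counts how often sampling variation alone produces a difference at least as large as the one observed - that proportion is a p-value.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: two groups measured on the same outcome.
group_a = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9])
group_b = np.array([5.5, 5.9, 5.4, 6.0, 5.7, 5.8])
observed_diff = abs(group_a.mean() - group_b.mean())

# Under H0 the group labels are interchangeable, so reshuffle them many
# times and see how often chance alone gives a difference this large.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = abs(pooled[:n_a].mean() - pooled[n_a:].mean())
    if diff >= observed_diff:
        count += 1

p_value = count / n_perm  # proportion of differences 'at least as extreme'
print(f"Observed difference: {observed_diff:.2f}, permutation p-value: {p_value:.4f}")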

Consequently, if the p-value is small, the data support the alternative hypothesis. If the p-value is large, the data are consistent with the null hypothesis. But how small is 'small' and how large is 'large'?!

By convention (and somewhat arbitrarily), a p-value of 0.05 (5%) is regarded as sufficiently small to reject the null hypothesis. If the p-value is larger than 0.05 we fail to reject the null hypothesis.

The 5% value is called the significance level of the test. Other significance levels that are commonly used are 1% and 0.1%. Some people use the following terminology:


p-value                  Outcome of test          Statement
greater than 0.05        Fail to reject H0        No evidence to reject H0
between 0.01 and 0.05    Reject H0 (Accept H1)    Some evidence to reject H0 (therefore accept H1)
between 0.001 and 0.01   Reject H0 (Accept H1)    Strong evidence to reject H0 (therefore accept H1)
less than 0.001          Reject H0 (Accept H1)    Very strong evidence to reject H0 (therefore accept H1)
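
The short sketch below (not from the source; the function name is ours) simply applies the cut-offs in the table above to turn a p-value into the suggested wording.

def interpret_p_value(p: float) -> str:
    # Thresholds follow the conventional table above.
    if p > 0.05:
        return "Fail to reject H0: no evidence to reject H0"
    elif p > 0.01:
        return "Reject H0 (accept H1): some evidence to reject H0"
    elif p > 0.001:
        return "Reject H0 (accept H1): strong evidence to reject H0"
    else:
        return "Reject H0 (accept H1): very strong evidence to reject H0"

print(interpret_p_value(0.03))    # some evidence to reject H0
print(interpret_p_value(0.0004))  # very strong evidence to reject H0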

Further Reading
Since the p-value does not relate to the importance of a finding (it depends on sample size), appropriate confidence intervals are often reported as well. See Campbell & Machin (1999) for more details.
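
As a minimal sketch of reporting a confidence interval alongside the p-value (not from the source; the data are invented and an equal-variance two-sample t-test is assumed):

import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9])
group_b = np.array([5.5, 5.9, 5.4, 6.0, 5.7, 5.8])

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# 95% CI for the difference in means (pooled-variance t interval).
diff = group_a.mean() - group_b.mean()
n1, n2 = len(group_a), len(group_b)
sp2 = ((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

print(f"p-value: {p_value:.4f}")
print(f"95% CI for difference in means: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")

Unlike the p-value on its own, the interval shows the size of the difference and how precisely it has been estimated.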