## Chi-squared test for nominal (categorical) data

The χ² (chi-squared) test can be used to determine whether a difference between 2 categorical variables in a sample is likely to reflect a real difference between these 2 variables in the population.

Note: in the case of 2 variables being compared, the test can also be interpreted as determining if there is an association (or relationship) between the two variables.

The sample data are used to calculate a single number (the test statistic). Its size determines the p-value: the probability of seeing a difference at least as large as the one observed purely by chance, i.e. due to sampling error, if there were no real difference in the population.

Worked example

The maternity wards of two hospitals had different preparation for childbirth schemes. A study of mothers who had participated in the schemes asked them to assess their satisfaction with the scheme with the following results:

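Observed counts by hospital:

| Satisfaction | Hospital A | Hospital B | Total |
|---|---|---|---|
| Very satisfied | 38 | 72 | 110 |
| Satisfied | 83 | 57 | 140 |
| Neutral | 42 | 38 | 80 |
| Dissatisfied a little | 26 | 44 | 70 |
| Dissatisfied a lot | 11 | 29 | 40 |
| Total | 200 | 240 | 440 |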

To answer the question 'is there any evidence of a difference in the satisfaction of the mothers between the schemes at the two hospitals?', a chi-squared test is used.

Suitable null and alternative hypotheses might be:

• H0: There is no difference in satisfaction of the mothers between the two schemes, and
• H1: There is a difference in satisfaction of the mothers between the two schemes.

To perform a chi-squared test, first calculate the number of mothers that would be expected in each cell of the table if the null hypothesis were true.

Calculations

The following calculations are for demonstration and, hopefully, to aid understanding; a computer package will do the appropriate calculations.
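The expected number (under the null hypothesis) in each cell is equal to

$$E = \frac{\text{row total} \times \text{column total}}{\text{overall total}}$$

Thus for the very satisfied/hospital A cell the expected number is

$$E = \frac{110 \times 200}{440} = 50.0$$

To calculate the chi-squared ($\chi^2$) statistic, the value of

$$\frac{(O - E)^2}{E}$$

needs to be calculated for each cell in the table, where $O$ is the observed count and $E$ is the expected count. For the very satisfied/hospital A cell this is

$$\frac{(38 - 50.0)^2}{50.0} = 2.88$$

The chi-squared statistic is the total of these values over all cells.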

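Expected counts by hospital:

| Satisfaction | Hospital A | Hospital B | Total |
|---|---|---|---|
| Very satisfied | 50.0 | 60.0 | 110 |
| Satisfied | 63.6 | 76.4 | 140 |
| Neutral | 36.4 | 43.6 | 80 |
| Dissatisfied a little | 31.8 | 38.2 | 70 |
| Dissatisfied a lot | 18.2 | 21.8 | 40 |
| Total | 200 | 240 | 440 |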

From these expected and the observed values the chi-squared test-statistic is computed, and the resulting p-value is examined.
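The calculation can be reproduced by hand in a few lines of Python. The following is only a minimal sketch, not the method used by Minitab or SPSS: the function names `expected_counts` and `chi_squared_statistic` are illustrative, and the Satisfied/hospital A count is taken to be 83, the value consistent with the stated totals of 200, 240 and 440.

```python
# Observed counts from the worked example: one entry per satisfaction level,
# holding the counts for (hospital A, hospital B).
# Assumption: Satisfied/hospital A is 83, the value that agrees with the
# stated column totals (200, 240) and the overall total (440).
observed = {
    "Very satisfied": (38, 72),
    "Satisfied": (83, 57),
    "Neutral": (42, 38),
    "Dissatisfied a little": (26, 44),
    "Dissatisfied a lot": (11, 29),
}


def expected_counts(table):
    """Expected count per cell: row total * column total / overall total."""
    row_totals = {label: sum(row) for label, row in table.items()}
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    grand_total = sum(col_totals)
    return {
        label: tuple(row_totals[label] * col / grand_total for col in col_totals)
        for label in table
    }


def chi_squared_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for label in table
        for o, e in zip(table[label], expected[label])
    )


expected = expected_counts(observed)
statistic = chi_squared_statistic(observed)
print(expected["Very satisfied"])   # expected counts for the first row
print(round(statistic, 2))          # chi-squared test statistic
```

Under these assumptions the first row of expected counts is 50.0 and 60.0, and the statistic rounds to 24.84.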

Computer Output

Chi-squared test in Minitab

Data should be entered in 2 columns, then select
Stat > Tables > Cross Tabulation… > Chi-Square Test

Alternatively, if the values in the contingency table have already been calculated, select
Stat > Tables > Chi-Square Test

[Minitab output: Chi-Square Test: red, yellow, green, blue (1 refers to Introverts, 2 refers to Extroverts)]

Note: Interpret 0.000 as p < 0.001

Chi-squared test in SPSS

Data should be entered in 2 columns, then select
Analyze > Descriptive Statistics > Crosstabs
SPSS needs the raw data for this procedure, not a table of counts that has already been calculated.

Some choices need to be made from the Statistics and Cells buttons in the dialogue box to get the chi-squared test results and the expected frequencies, as shown in the output below. Initially, only the 'Pearson Chi-Square' line needs to be examined.

Note: The p-value is printed as .000. This should be interpreted as p < 0.001, and not be taken as exactly 0.

Results

The chi-squared test statistic is 24.84 with an associated p < 0.001.
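For readers who want to see where the p-value comes from, the sketch below converts the statistic into an upper-tail probability. It rests on two points not spelled out above: the degrees of freedom for a contingency table are (rows - 1) × (columns - 1), and for an even number of degrees of freedom the chi-squared tail probability has a simple closed form. The function name `chi_squared_p_value` is illustrative, and the function deliberately refuses odd degrees of freedom; a statistics package handles the general case.

```python
import math


def chi_squared_p_value(statistic, df):
    """Upper-tail probability P(X > statistic) for a chi-squared variable.

    Uses the closed form that holds only when df is even:
    exp(-x/2) * sum over k from 0 to df/2 - 1 of (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = statistic / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


n_rows, n_cols = 5, 2                 # satisfaction levels x hospitals
df = (n_rows - 1) * (n_cols - 1)      # degrees of freedom for the table
p_value = chi_squared_p_value(24.84, df)
print(df, p_value < 0.001)
```

With 4 degrees of freedom the probability is far below 0.001, which is why the packages print it as .000.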

Note: a p-value shown as .000 in the computer print-out should not be interpreted as exactly zero.

The null hypothesis is rejected, since p < 0.001, and it is concluded that there is a difference in satisfaction of the mothers between the two schemes. Examining the pattern of numbers, proportionally more mothers were satisfied with the scheme at hospital A than with the scheme at hospital B.

A chart illustrates the pattern of responses well.

[Figure: Bar chart to compare satisfaction responses from mothers in hospitals A and B]

Note: If more than one of the expected frequencies is less than 5 (in small tables), or if more than 20% are less than 5 (in large tables), cells should be pooled to reduce the number of expected frequencies that are less than 5.
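The check on small expected frequencies is easy to automate. The sketch below counts the expected cells below 5 and reports them as a share of all cells, using the expected counts from the table above. The helper name `small_expected_cells` and its `threshold` parameter are illustrative, and the choice between the "more than one" and "more than 20%" rules is left to the reader, since it depends on what counts as a small or large table.

```python
def small_expected_cells(expected, threshold=5.0):
    """Return (number of cells below threshold, share of all cells)."""
    cells = [value for row in expected for value in row]
    small = [value for value in cells if value < threshold]
    return len(small), len(small) / len(cells)


# Expected counts for (hospital A, hospital B), one row per satisfaction level.
expected = [
    (50.0, 60.0),   # Very satisfied
    (63.6, 76.4),   # Satisfied
    (36.4, 43.6),   # Neutral
    (31.8, 38.2),   # Dissatisfied a little
    (18.2, 21.8),   # Dissatisfied a lot
]

count, share = small_expected_cells(expected)
print(count, share)
```

In the worked example no expected frequency is below 5, so no pooling is needed.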

Note: Yates' continuity correction and Fisher's exact test are also used for 2x2 contingency tables.