Chi-squared test for nominal (categorical) data

The χ² test is used to determine whether an association (or relationship) between 2 categorical variables in a sample is likely to reflect a real association between these 2 variables in the population.

 

Note: In the case of 2 variables being compared, the test can also be interpreted as determining if there is a difference between the two variables.

The sample data is used to calculate a single number (the test statistic), the size of which reflects the probability (p-value) that the observed association between the 2 variables has occurred by chance, i.e. due to sampling error.

Worked example
A group of students were classified in terms of personality (introvert or extrovert) and in terms of colour preference (red, yellow, green or blue) with the purpose of seeing whether there is an association (relationship) between personality and colour preference. Data was collected from 400 students and presented in the 2 (rows) x 4 (cols) contingency table below:

(Observed counts)
                                  Colours
                        Red   Yellow   Green   Blue   Totals
Introvert personality    20        6      30     44      100
Extrovert personality   180       34      50     36      300
Totals                  200       40      80     80      400

Suitable null and alternative hypotheses might be:

  • H0: Colour preference is not associated with personality, and
  • H1: Colour preference is associated with personality

To perform a chi-squared test, first calculate the number of students that would be expected in each cell of the table if the null hypothesis were true.

Calculations

The following calculations are for demonstration and, hopefully, to aid understanding; a computer package will normally do the appropriate calculations.

The expected numbers (under the null hypothesis) in each cell are equal to

row total multiplied by column total, divided by grand total

Thus for the introvert/red cell the expected number is

100 multiplied by 200 divided by 400 equals 50
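The same rule can be applied to every cell at once. The following is a minimal Python sketch using the observed counts from the worked example; the names `observed` and `expected_counts` are chosen here for illustration and do not come from any statistics package.

```python
# Observed counts from the worked example
# (rows: introvert, extrovert; columns: red, yellow, green, blue).
observed = [
    [20, 6, 30, 44],
    [180, 34, 50, 36],
]


def expected_counts(table):
    """Return the expected counts under the null hypothesis of no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # expected = row total * column total / grand total, cell by cell
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Introvert", "Extrovert"], expected):
    print(label, row)
```

For the introvert/red cell this gives 50, matching the hand calculation above.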

To calculate the chi-squared (χ²) statistic the value of

(observed frequency minus expected frequency) squared, divided by expected frequency

needs to be calculated for each cell in the table. For the introvert/red cell this is

(20 - 50) squared, divided by 50 = 18.00

The chi-squared statistic is the total of these values over all the cells in the table.

(Expected counts)
                                  Colours
                        Red   Yellow   Green   Blue   Totals
Introvert personality    50       10      20     20      100
Extrovert personality   150       30      60     60      300
Totals                  200       40      80     80      400

From these expected values and the observed values, the chi-squared test statistic is computed and the resulting p-value is examined.
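As a rough illustration of this step, the Python sketch below sums the cell contributions and checks the p-value using only the standard library. It assumes the usual degrees of freedom for a contingency table, (rows - 1) x (columns - 1), which is 3 here, and it uses a closed-form tail probability that holds for 3 degrees of freedom only. The function name `p_value_3df` is made up for this sketch; in practice a statistics package would report the p-value directly.

```python
import math

# Observed and expected counts from the worked example
# (rows: introvert, extrovert; columns: red, yellow, green, blue).
observed = [[20, 6, 30, 44], [180, 34, 50, 36]]
expected = [[50, 10, 20, 20], [150, 30, 60, 60]]

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1) = 1 * 3 = 3.
df = (len(observed) - 1) * (len(observed[0]) - 1)


def p_value_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom.

    Assumption of this sketch: the closed form below is valid for df = 3 only.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(f"chi-squared = {chi_squared:.2f}, df = {df}")
print(f"p < 0.001: {p_value_3df(chi_squared) < 0.001}")
```

The statistic printed here should agree with the value reported by the computer packages in the sections that follow.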

Computer Output

Chi-squared test in Minitab

Data should be entered in 2 columns, then select
Stat > Tables > Cross Tabulation… > Chi-Square Test

Alternatively, if the values in the contingency table have already been calculated, select
Stat > Tables > Chi-Square Test

Chi-Square Test: red, yellow, green, blue

table showing chi square test results

(1 refers to Introverts, 2 refers to Extroverts)

Note: Interpret 0.000 as p < 0.001

Chi-squared test in SPSS

Data should be entered in 2 columns, then select
Analyze > Descriptive Statistics > Crosstabs
SPSS can only be used for raw data

In the dialogue box, use the Statistics button to request the chi-squared test and the Cells button to request the expected frequencies, as shown in the output below. Initially, only the 'Pearson Chi-Square' line needs to be examined.

table showing personality type and favourite colour crosstabulation

table of chi-square test results

Note: The p-value is printed as .000
This should be interpreted as p < 0.001, and should not be taken as exactly 0

Results

The chi-squared test statistic is 71.20 with an associated p < 0.001.

Note: The .000 shown in the computer print-out should not be interpreted as exactly zero.

The null hypothesis is rejected, since p < 0.001, and it is concluded that colour preference is associated with personality. Examining the pattern of numbers, it is noted that more introverts prefer blue than expected and fewer prefer red. The extroverts tend to favour red more than blue.
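One simple way to see this pattern is to subtract the expected count from the observed count in each cell. The Python sketch below does this for the worked example; the dictionary layout and the name `differences` are illustrative choices, not output from Minitab or SPSS.

```python
# Observed and expected counts from the worked example.
observed = {
    "Introvert": {"Red": 20, "Yellow": 6, "Green": 30, "Blue": 44},
    "Extrovert": {"Red": 180, "Yellow": 34, "Green": 50, "Blue": 36},
}
expected = {
    "Introvert": {"Red": 50, "Yellow": 10, "Green": 20, "Blue": 20},
    "Extrovert": {"Red": 150, "Yellow": 30, "Green": 60, "Blue": 60},
}

# Positive values: more students than expected; negative values: fewer.
differences = {
    personality: {
        colour: observed[personality][colour] - expected[personality][colour]
        for colour in counts
    }
    for personality, counts in observed.items()
}

for personality, row in differences.items():
    print(personality, row)
```

The largest differences are in the red and blue columns, with opposite signs for introverts and extroverts.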

A chart illustrates the pattern of responses well.

Bar chart to illustrate the relationship between personality type and colour preference


Note: If more than one of the expected frequencies is less than 5 (in small tables), or if more than 20% are less than 5 (in large tables), cells should be pooled to reduce the number of expected frequencies that are less than 5.
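A quick check of this condition might look like the Python sketch below, which counts the expected frequencies below 5 and reports their share of all cells. It uses the expected counts from the worked example; the variable names are illustrative, and deciding whether a table counts as small or large is left to the reader.

```python
# Expected counts from the worked example
# (rows: introvert, extrovert; columns: red, yellow, green, blue).
expected = [[50, 10, 20, 20], [150, 30, 60, 60]]

# Flatten the table and pick out the expected counts below 5.
cells = [e for row in expected for e in row]
small = [e for e in cells if e < 5]
share_small = len(small) / len(cells)

print(f"{len(small)} of {len(cells)} expected counts are below 5 ({share_small:.0%})")
```

In the worked example the smallest expected count is 10, so no pooling is needed.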

Note: Yates' correction and Fisher's exact test are also used for 2x2 contingency tables.
