Basics

We use a chi-squared test when we have data presented in a tabular form like below. Let’s say we have a table representing whether or not an athlete was injured in a season and what sports they play.

	Football	Basketball	Soccer
Injured	50	40	20
Not Injured	30	25	15

The first step to the chi-squared test is getting the column and row totals, let’s do it first by hand.

	Football	Basketball	Soccer	Total
Injured	50	40	20	110
Not Injured	30	25	15	70
Total	80	65	35	180

Now, let’s look at some code. Let’s represent our table as a numpy array.

import numpy as np
table = np.array([[50,40,20],[30,25,15]])
table

Numpy has built in functions to get the sum, or sum across rows/columns. If we want to sum over the columns we use axis=0, if we want rows we use axis=1, and if we want just the general sum then we don’t give any arguments.

print(table.sum(axis=0))
print(table.sum(axis=1))
print(table.sum())

Now, recall that if two variables are independent, then P(A and B) = P(A)*P(B). What we are going to do now is predict each square based off this hypothesis. So, for example, when we talk about the top left square, we want to first find the probability that an athlete plays football, then the probability an athlete is injured and finally we can figure out the expected number of athletes in this category (if they are independent).

So if we have a sample size n, for each square there is a row and a column it belongs to. We want to find E=P(A)*P(B)*n which is equal to (Row Total/n)*(Column Total/n)*n.

The way we can get the expected probabilities is…

for index, x in np.ndenumerate(table):
    print(index, x)

Let’s compute the expected values. The way we can get the row, column and value for each square is with np.ndenumerate().

for index, x in np.ndenumerate(table):
    print(index, x)

Challenge

Find the expected values.

Basics

Statistics

Basics

Challenge

Leave A Reply Cancel reply

Modal title