Z-Score
One of the most common measures in statistics is the z-score. What it measures is how many standard deviations away from the mean a data point is.
Equation
$$z = \frac{x-\overline{x}}{\sigma}$$
First, let’s check out how far each point in our y data set is away from the mean.
xVals = list(range(100))
yVals = [x+randint(-20,20) for x in xVals]
xVals = np.array(xVals)
yVals = np.array(yVals)
print(yVals-yVals.mean())
Now, if we divide everything by the standard deviation…
print((yVals-yVals.mean())/yVals.std())
Now I want to introduce the correlation equation, there will be two versions, a standard one and a one using z-scores.
Equation
$$r_{xy} = \frac{\Sigma (x-\overline{x})(y-\overline{y})}{\sqrt{\Sigma (x-\overline{x})^2 \Sigma (y-\overline{y})^2}}$$
Challenge
Find the equation which uses z-scores