Mean and Standard Deviation Part 2

Solution

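# Mean: the sum of the scores divided by how many there are
xbar = sum(scores)/len(scores)
print(xbar)
# Population standard deviation: divide the squared deviations by n
sigma = (sum((x-xbar)**2 for x in scores)/len(scores))**.5
print(sigma)
# Sample standard deviation: divide the squared deviations by n-1
s = (sum((x-xbar)**2 for x in scores)/(len(scores)-1))**.5
print(s)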

You will notice that numpy's std returns the population standard deviation by default (it divides by n, not n-1). This is the one we will use for these lessons.
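To make the population-versus-sample distinction concrete, here is a minimal sketch using numpy's `ddof` argument. The `scores` array below is made-up illustration data, not the data set from the lesson.

```python
import numpy as np

# Hypothetical data for illustration only (not the lesson's scores).
scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# ddof=0 (the default) divides by n: population standard deviation.
sigma = np.std(scores)
# ddof=1 divides by n - 1: sample standard deviation.
s = np.std(scores, ddof=1)

print(sigma)  # 2.0
print(s)      # slightly larger, because the divisor is smaller
```

The sample version is always at least as large as the population version for the same data, and the gap shrinks as the number of points grows.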

We might also want to know the median, the middle value of the sorted data, which is easy to find.

print(np.median(scores))

The interquartile range (the difference between the 75th percentile and the 25th percentile) is built from two values that are also easy to find.

print(np.percentile(scores,25))
print(np.percentile(scores,75))
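The interquartile range itself is the gap between those two percentiles. A small sketch, again on made-up data rather than the lesson's scores, might look like this:

```python
import numpy as np

# Hypothetical data for illustration only (not the lesson's scores).
scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Ask for both quartiles in one call, then subtract.
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1

print(q1, q3, iqr)
```

Passing a list of percentiles returns them in the same order, so the two values can be unpacked directly.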

How do transformations affect the mean and standard deviation? What if we added 100 to every single data point (say, a 100-point curve)?

scores2 = scores+100
print("Mean...")
print(scores.mean(),scores2.mean())
print("Standard Deviation...")
print(scores.std(),scores2.std())
print("")
print(np.percentile(scores,0),np.percentile(scores2,0))
print(np.percentile(scores,25),np.percentile(scores2,25))
print(np.percentile(scores,50),np.percentile(scores2,50))
print(np.percentile(scores,75),np.percentile(scores2,75))
print(np.percentile(scores,100),np.percentile(scores2,100))

When we add a constant to the data, every measure of location (the mean and all of the percentiles) shifts by that constant, but the standard deviation does not change. The values are larger, but their distances from the mean are unchanged.

What if we multiply everything by a number?

scores2 = scores*2
print("Mean...")
print(scores.mean(),scores2.mean())
print("Standard Deviation...")
print(scores.std(),scores2.std())
print("")
print(np.percentile(scores,0),np.percentile(scores2,0))
print(np.percentile(scores,25),np.percentile(scores2,25))
print(np.percentile(scores,50),np.percentile(scores2,50))
print(np.percentile(scores,75),np.percentile(scores2,75))
print(np.percentile(scores,100),np.percentile(scores2,100))

Everything is now doubled, including the standard deviation, because multiplying stretches each point's distance from the mean by the same factor.

We are now going to introduce a new term: expected value. The expected value is what we expect on average, so for a distribution the expected value equals the mean of the distribution.

Equation

$$E(x) = \overline{x}$$

Now, the expected value of a constant is the constant itself, so:

Equation

$$E(c) = c$$

What if we add a constant to the data, multiply the data by a constant, or do both?

Equations

$$E(cx) = c \cdot E(x) = c\overline{x}$$
$$E(x+c) = E(x)+c = \overline{x}+c$$
$$E(cx+d) = c \cdot E(x)+d = c\overline{x}+d$$
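These rules are easy to check numerically. The sketch below uses made-up data and arbitrary constants `c` and `d` (all assumptions for illustration) and compares the mean of the transformed data with the value the rule predicts.

```python
import numpy as np

# Hypothetical data and constants for illustration only.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
c, d = 3.0, 10.0

lhs = (c * x + d).mean()   # E(cx + d), computed directly
rhs = c * x.mean() + d     # c*E(x) + d, using the rule

print(lhs, rhs)
print(np.isclose(lhs, rhs))
```

The comparison uses `np.isclose` rather than `==` because floating-point rounding can make two mathematically equal results differ in the last digit.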

Next I will show how these transformations affect the variance, and then, by extension, the standard deviation (the square root of the variance).

Equations

$$Var(cx) = c^2 \cdot Var(x)$$
$$Var(x+c) = Var(x)$$
$$Var(cx+d) = c^2 \cdot Var(x)$$
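The last rule follows from the definition of variance as the average squared distance from the mean. A short derivation, assuming the population form $Var(x) = E[(x-\overline{x})^2]$ used in these lessons and the fact that the mean of $cx+d$ is $c\overline{x}+d$:

$$Var(cx+d) = E\left[\left(cx+d-(c\overline{x}+d)\right)^2\right]$$
$$Var(cx+d) = E\left[c^2(x-\overline{x})^2\right]$$
$$Var(cx+d) = c^2\,E\left[(x-\overline{x})^2\right] = c^2\,Var(x)$$

The added constant cancels in the first step, and the multiplier comes out squared because the deviations are squared.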

Equations

$$SD(cx) = |c| \cdot SD(x)$$
$$SD(x+c) = SD(x)$$
$$SD(cx+d) = |c| \cdot SD(x)$$

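Adding a constant has no effect on the variance, but multiplying by a constant does: the variance scales by the square of the constant, and the standard deviation by its absolute value.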
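One detail worth checking numerically is a negative multiplier. The standard deviation can never be negative, so it scales by the absolute value of the constant. The sketch below uses made-up data and arbitrary constants (assumptions for illustration).

```python
import numpy as np

# Hypothetical data and constants for illustration only.
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
c, d = -3, 10

y = c * x + d

print(x.var(), y.var())   # variance scales by c**2
print(x.std(), y.std())   # standard deviation scales by abs(c)
```

Here the variance grows by a factor of 9 and the standard deviation by a factor of 3, even though every data point was multiplied by -3.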
