Mean and Standard Deviation Part 2
Solution
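# Mean: sum of the scores divided by how many there are
xbar = sum(scores)/len(scores)
print(xbar)
# Population standard deviation: divide the squared deviations by n
sigma = (sum([(x-xbar)**2 for x in scores])/len(scores))**.5
print(sigma)
# Sample standard deviation: divide by n-1 instead
s = (sum([(x-xbar)**2 for x in scores])/(len(scores)-1))**.5
print(s)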
Notice that numpy's std returns the population standard deviation (sigma above) by default; this is the one we will use for these lessons.
We might also want to know the median, the middle number of the sorted data, which is easy to find.
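To see where the two versions come from in numpy itself, here is a minimal sketch. The `scores` values are made up for illustration (the lesson's own array is defined earlier), and `ddof` is the `np.std` argument that sets what is subtracted from n in the divisor.

```python
import numpy as np

# Hypothetical test scores; the lesson's real `scores` array is defined earlier.
scores = np.array([70.0, 85.0, 90.0, 65.0, 80.0])

pop_sd = np.std(scores)             # ddof=0 by default: divide by n
sample_sd = np.std(scores, ddof=1)  # ddof=1: divide by n-1

# Manual versions, following the formulas in the solution above
xbar = scores.mean()
manual_pop = (((scores - xbar) ** 2).sum() / len(scores)) ** 0.5
manual_sample = (((scores - xbar) ** 2).sum() / (len(scores) - 1)) ** 0.5

print(pop_sd, manual_pop)
print(sample_sd, manual_sample)
```

The sample version is always a little larger, since it divides the same sum by a smaller number.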
print(np.median(scores))
The interquartile range is the distance between the 25th percentile and the 75th percentile, and both percentiles are easy to find.
print(np.percentile(scores,25))
print(np.percentile(scores,75))
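The interquartile range itself is the 75th percentile minus the 25th. A small sketch, again on made-up scores rather than the lesson's data:

```python
import numpy as np

# Hypothetical test scores, for illustration only
scores = np.array([70.0, 85.0, 90.0, 65.0, 80.0])

# Passing a list of percentiles returns both values at once
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
print(q1, q3, iqr)
```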
How do transformations affect the mean and standard deviation? What if we added 100 to every single data point (say, a 100-point curve)?
scores2 = scores+100
print("Mean...")
print(scores.mean(),scores2.mean())
print("Standard Deviation...")
print(scores.std(),scores2.std())
print("")
print(np.percentile(scores,0),np.percentile(scores2,0))
print(np.percentile(scores,25),np.percentile(scores2,25))
print(np.percentile(scores,50),np.percentile(scores2,50))
print(np.percentile(scores,75),np.percentile(scores2,75))
print(np.percentile(scores,100),np.percentile(scores2,100))
When we add a constant to the data, every location measure shifts by that constant, including the mean and each percentile. The standard deviation, however, does not change: the numbers are larger, but their distances from the mean are the same.
What if we multiply everything by a number?
scores2 = scores*2
print("Mean...")
print(scores.mean(),scores2.mean())
print("Standard Deviation...")
print(scores.std(),scores2.std())
print("")
print(np.percentile(scores,0),np.percentile(scores2,0))
print(np.percentile(scores,25),np.percentile(scores2,25))
print(np.percentile(scores,50),np.percentile(scores2,50))
print(np.percentile(scores,75),np.percentile(scores2,75))
print(np.percentile(scores,100),np.percentile(scores2,100))
Everything is now doubled, including the standard deviation, because multiplying stretches each point's distance from the mean by the same factor.
We now introduce a new term: expected value. The expected value is what we expect on average, so for a distribution the expected value equals the mean of the distribution.
Equation
$$E(x) = \overline{x}$$
Now, the expected value of a constant is the constant itself, so:
Equation
$$E(c) = c$$
What if we add a constant to the variable, multiply it by a constant, or do both?
Equations
$$E(x+c) = E(x)+c = \overline{x}+c$$
$$E(cx+d) = c \cdot E(x)+d = c\overline{x}+d$$
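These two rules can be checked numerically. The sketch below uses a made-up data array and arbitrary constants `c` and `d`, with the sample mean standing in for the expected value.

```python
import numpy as np

x = np.array([70.0, 85.0, 90.0, 65.0, 80.0])  # hypothetical data
c, d = 2.0, 100.0                             # arbitrary constants

shifted_mean = (x + c).mean()     # mean after adding c to every point
linear_mean = (c * x + d).mean()  # mean after multiplying by c and adding d

# Both comparisons should print True
print(np.isclose(shifted_mean, x.mean() + c))
print(np.isclose(linear_mean, c * x.mean() + d))
```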
Next, here is how the variance is affected, and then the standard deviation, which follows because it is the square root of the variance.
Equations
$$Var(x+c) = Var(x)$$
$$Var(cx+d) = c^2 \cdot Var(x)$$
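A short derivation shows where the square comes from. It assumes the usual definition of variance as the expected squared distance from the mean, which this lesson does not state explicitly, together with the expected value rule for cx+d given earlier.
$$Var(x) = E[(x - E(x))^2]$$
$$Var(cx+d) = E[(cx+d - E(cx+d))^2]$$
$$= E[(cx + d - c \cdot E(x) - d)^2]$$
$$= E[c^2 (x - E(x))^2]$$
$$= c^2 \cdot E[(x - E(x))^2] = c^2 \cdot Var(x)$$
The constant d cancels in the second step, which is why adding a constant leaves the variance unchanged.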
Equations
$$SD(x+c) = SD(x)$$
$$SD(cx+d) = |c| \cdot SD(x)$$
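Adding a constant does not change the variance, but multiplying by a constant does: the variance is scaled by the square of the constant, and the standard deviation by its absolute value.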
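As a quick numerical check of these rules, the sketch below uses made-up data and a negative multiplier chosen on purpose, since a standard deviation can never be negative and so scales by the absolute value of the constant. The names `c` and `d` are illustrative.

```python
import numpy as np

x = np.array([70.0, 85.0, 90.0, 65.0, 80.0])  # hypothetical data
c, d = -3.0, 10.0                             # negative c chosen on purpose

y = c * x + d

var_ok = np.isclose(y.var(), c**2 * x.var())    # variance scales by c squared
sd_ok = np.isclose(y.std(), abs(c) * x.std())   # SD scales by |c|
shift_ok = np.isclose((x + d).var(), x.var())   # adding d changes nothing

print(var_ok, sd_ok, shift_ok)
```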