PDF and CDF Part 2
Solution
import scipy.stats
import matplotlib.pyplot as plt

# dist carries over from Part 1: assumed here to be a normal distribution
# with mean 400 and standard deviation 100 (matching the comparisons below).
dist = scipy.stats.norm(400, 100)

# A second normal distribution with the mean shifted to 600, same standard deviation.
dist2 = scipy.stats.norm(600, 100)
xVals = list(range(1000))
yVals1 = [dist.pdf(x) for x in xVals]
yVals2 = [dist2.pdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
# Repeat the comparison using the cumulative distribution functions (CDFs).
dist2 = scipy.stats.norm(600, 100)
xVals = list(range(1000))
yVals1 = [dist.cdf(x) for x in xVals]
yVals2 = [dist2.cdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Cumulative Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
What if, instead of a shifted mean, we had a standard deviation that was half of the original?
# A normal distribution with the same mean (400) but half the standard deviation (50).
dist2 = scipy.stats.norm(400, 50)
xVals = list(range(1000))
yVals1 = [dist.pdf(x) for x in xVals]
yVals2 = [dist2.pdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
# CDFs for the same pair of distributions.
dist2 = scipy.stats.norm(400, 50)
xVals = list(range(1000))
yVals1 = [dist.cdf(x) for x in xVals]
yVals2 = [dist2.cdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Cumulative Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
Moving on, a helpful rule is the 68-95-99.7 rule: for a normal curve, about 68% of values fall within one standard deviation of the mean, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations.
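As a quick check, here is a small sketch that recovers these percentages from the cdf of the dist object defined above (mean 400, standard deviation 100); the printed values are approximations.
# Probability of falling within k standard deviations of the mean, from the CDF.
for k in [1, 2, 3]:
    prob = dist.cdf(400 + k * 100) - dist.cdf(400 - k * 100)
    print(f"Within {k} standard deviation(s): {prob:.4f}")
# Prints approximately 0.6827, 0.9545, and 0.9973.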
The z-score measures how many standard deviations away from the mean a point is.
z = (x - μ) / σ, where x is the point, μ is the mean, and σ is the standard deviation.
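For example, with a mean of 400 and a standard deviation of 100, a value of 470 has a z-score of (470 - 400) / 100 = 0.7, meaning it sits 0.7 standard deviations above the mean.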
The rvs function of the distribution lets us sample random points. Let's do that now, and also compute the mean and standard deviation of the sample to check it.
import numpy as np

# Draw 10,000 random values from the distribution, then check the sample
# mean and standard deviation against the true values (400 and 100).
sample = dist.rvs(size=10000)
print(np.mean(sample))
print(np.std(sample))
Let's plot a histogram of the data. You'll notice that, because it is a sample, it is not a perfect representation of the distribution.
plt.hist(sample,bins=100)
plt.show()
We can get an array of True and False values by applying a comparison to the entire array; this will be very important soon.
print(sample)
print(sample>470)
Notice it is True for values above 470 and False for values below. We can also index by this array; when we do, it returns only the values that are True.
print(sample[sample>470])
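As a small sketch of why this is useful, taking np.mean of a boolean array gives the fraction of True values, which we can compare against the theoretical probability from the cdf.
# Fraction of sampled values above 470 (the boolean array is treated as 0s and 1s).
print(np.mean(sample > 470))
# Theoretical probability of exceeding 470 for this distribution.
print(1 - dist.cdf(470))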
Let’s convert our sample into z-scores.
zScores = (sample-np.mean(sample))/np.std(sample)
print(zScores)
Challenge