Unknown Standard Deviations
Solution
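import scipy.stats
# z score for the difference in sample means (100 vs. 105) under H0: no difference,
# with standard deviations 5 and 10 and sample sizes 50 and 75
t = ((100-105)-(0))/(5**2/50+10**2/75)**.5
print(t)
# Two-sided p-value (t is negative, so cdf(t) is the lower-tail area)
print(scipy.stats.norm.cdf(t)*2)
# One-sided (lower-tail) p-value
print(scipy.stats.norm.cdf(t))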
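If we instead don’t know the standard deviations, we replace them with the sample standard deviations $s_1$ and $s_2$, and the equation becomes:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$$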
When we compute this t value, the degrees of freedom equal $\min(N_1 - 1, N_2 - 1)$. We only need the t distribution for small sample sizes; for large samples the t distribution converges to the normal distribution, so the t score can be treated as a z score.
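To make the convergence concrete, here is a small sketch that compares two-sided 95% critical values from the t distribution with the corresponding z value as the degrees of freedom grow. The 95% level and the particular degrees of freedom are illustrative assumptions, not values from this lesson.

```python
import scipy.stats

# Two-sided 95% critical value from the standard normal (z) distribution.
z_crit = scipy.stats.norm.ppf(0.975)

# Degrees of freedom to compare; these values are illustrative assumptions.
dfs = [4, 24, 99, 999]

# Matching critical values from the t distribution.
t_crits = [scipy.stats.t.ppf(0.975, df) for df in dfs]

# Print each df, its t critical value, and the gap to the z critical value.
for df, t_crit in zip(dfs, t_crits):
    print(df, round(t_crit, 3), round(t_crit - z_crit, 3))
print("z", round(z_crit, 3))
```

The gap between the t and z critical values shrinks as the degrees of freedom increase, which is why the distinction mostly matters for small samples.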
Let’s do an example…
dist1 = scipy.stats.norm(300,100)
dist2 = scipy.stats.norm(400,100)
pts1 = dist1.rvs(25,random_state=1)
pts2 = dist2.rvs(26,random_state=2)
Now that we have two random samples, let’s get our t score.
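import numpy as np
# ddof=1 gives the sample standard deviation (np.std defaults to the population formula)
sd1 = np.std(pts1, ddof=1)
sd2 = np.std(pts2, ddof=1)
mean1 = np.mean(pts1)
mean2 = np.mean(pts2)
# H0: the two means are equal, so the hypothesized difference is 0
t = ((mean1-mean2)-(0))/(sd1**2/25+sd2**2/26)**.5
print(t)
# Two-sided p-value with min(N1-1, N2-1) = 24 degrees of freedom
print(scipy.stats.t.sf(abs(t),25-1)*2)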
The p-value is small, so we reject the hypothesis that the two means are equal. We would expect this, though, since we set up two distributions whose means differ by 100! What if we had a hunch that our samples should have a difference of 100? This would mean $H_0: \mu_1 + 100 = \mu_2$.
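import numpy as np
# ddof=1 gives the sample standard deviation
sd1 = np.std(pts1, ddof=1)
sd2 = np.std(pts2, ddof=1)
mean1 = np.mean(pts1)
mean2 = np.mean(pts2)
# H0: mu1 + 100 = mu2, so the hypothesized difference mu1 - mu2 is -100
t = ((mean1-mean2)-(-100))/(sd1**2/25+sd2**2/26)**.5
print(t)
# Two-sided p-value; abs(t) keeps it valid whether t is positive or negative
print(scipy.stats.t.sf(abs(t),25-1)*2)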
Now the p-value is not significant, so we cannot reject the hypothesis that the means differ by 100!
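The two tests above follow the same steps, so they can be wrapped in one function. The sketch below is only one possible way to do that: `two_sample_t` and its `diff0` parameter are hypothetical names introduced here, not part of SciPy. It uses the sample standard deviation (`ddof=1`), the conservative degrees of freedom min(N1-1, N2-1), and a two-sided p-value that stays valid for either sign of t. As a sanity check, the t statistic is compared with SciPy's Welch test (`scipy.stats.ttest_ind` with `equal_var=False`), which computes the same statistic but estimates the degrees of freedom differently, so its p-value may differ slightly.

```python
import numpy as np
import scipy.stats


def two_sample_t(a, b, diff0=0.0):
    """Two-sided t test of H0: mean(a) - mean(b) = diff0, SDs unknown.

    Hypothetical helper, not part of SciPy. Uses the conservative
    degrees of freedom min(N1 - 1, N2 - 1).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # ddof=1 gives the sample variance for each group.
    se = (np.var(a, ddof=1) / n1 + np.var(b, ddof=1) / n2) ** 0.5
    t = ((a.mean() - b.mean()) - diff0) / se
    df = min(n1 - 1, n2 - 1)
    # sf(|t|) is the upper-tail area, so doubling it works for either sign of t.
    p = 2 * scipy.stats.t.sf(abs(t), df)
    return t, df, p


# Same samples as in the example above.
pts1 = scipy.stats.norm(300, 100).rvs(25, random_state=1)
pts2 = scipy.stats.norm(400, 100).rvs(26, random_state=2)

print(two_sample_t(pts1, pts2))        # H0: the means are equal
print(two_sample_t(pts1, pts2, -100))  # H0: mu1 + 100 = mu2

# Cross-check: adding 100 to pts1 turns H0: mu1 + 100 = mu2 into a test
# of equal means, which SciPy's Welch test handles directly.
welch = scipy.stats.ttest_ind(pts1 + 100, pts2, equal_var=False)
print(welch.statistic, welch.pvalue)
```

Using min(N1-1, N2-1) tends to give a slightly larger p-value than the Welch approximation, which errs on the side of not rejecting the null hypothesis.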