Basic Charts
We are going to use NumPy for much of the work in these lessons; it is fast and full of useful functionality. Let’s first create an array of survey answers in which students rated this course on a scale from 0 to 4.
```python
import numpy as np

# Survey ratings of this course, each an integer from 0 to 4
answers = np.array([0, 2, 1, 0, 4, 2, 3, 1, 4, 1, 4, 2, 3])
```
The first NumPy function we will use is bincount(). It returns how many times each integer from 0 up to the largest value in the array occurs; integers in that range that never appear get a count of 0.
```python
print(np.bincount(answers))  # [2 3 3 2 3]
```
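To make the behavior of bincount() concrete, here is a minimal pure-Python sketch of the same counting. The helper name count_each_value is hypothetical (it is not part of NumPy), and it assumes a non-empty list of non-negative integers. It also shows NumPy's real minlength parameter, which pads the result with zeros past the largest value.

```python
import numpy as np

def count_each_value(values, minlength=0):
    """Hypothetical helper that mimics np.bincount for non-empty lists of non-negative integers."""
    size = max(max(values) + 1, minlength)   # one slot per integer 0..max, at least minlength slots
    counts = [0] * size
    for v in values:
        if v < 0:
            raise ValueError("only non-negative integers can be counted")
        counts[v] += 1                       # tally this value in its own slot
    return counts

answers = [0, 2, 1, 0, 4, 2, 3, 1, 4, 1, 4, 2, 3]
print(count_each_value(answers))             # [2, 3, 3, 2, 3]
print(np.bincount(answers).tolist())         # same counts from NumPy

# A value below the maximum that never occurs still gets a slot with count 0;
# minlength adds zero-count slots beyond the maximum.
print(count_each_value([0, 0, 3]))                    # [2, 0, 0, 1]
print(np.bincount([0, 0, 3], minlength=5).tolist())   # [2, 0, 0, 1, 0]
```

The minlength option matters for survey data: if nobody had chosen a rating of 4, a plain bincount() would return only four counts, and the bar for 4 would silently disappear from the chart.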
To plot a histogram of this data, we can place bars at the x-axis positions 0 through 4 (0 up to the largest rating) and set each bar's height to the bin count we just found.
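```python
import matplotlib.pyplot as plt

counts = np.bincount(answers)     # bar heights: how often each rating occurs
vals = list(range(len(counts)))   # bar positions: ratings 0 through 4
plt.bar(vals, counts)
plt.show()
```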
This data set can be described as roughly symmetric: the left and right sides of the chart are about the same height. It could also be called approximately uniform, since every bar is about the same height (each rating was chosen 2 or 3 times).
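The symmetric and uniform judgments above are made by eye. A rough numeric version of the same check, a heuristic of our own rather than a formal statistical test, compares each bar with its mirror image and with the count that a perfectly even spread of answers would give. The variable names mirror_gap and expected_per_bar are illustrative.

```python
import numpy as np

answers = np.array([0, 2, 1, 0, 4, 2, 3, 1, 4, 1, 4, 2, 3])
counts = np.bincount(answers, minlength=5)   # [2 3 3 2 3]

# Symmetry: compare each bar with its mirror image (0 vs 4, 1 vs 3, 2 vs itself)
mirror_gap = np.abs(counts - counts[::-1])

# Uniformity: if the answers were spread perfectly evenly,
# each of the 5 bars would hold len(answers) / 5 answers
expected_per_bar = len(answers) / len(counts)
max_deviation = np.max(np.abs(counts - expected_per_bar))

print("counts:", counts)
print("mirror gaps:", mirror_gap)
print("expected per bar:", expected_per_bar, "largest deviation:", max_deviation)
```

Every mirrored pair differs by at most one answer, and no bar is more than one answer away from the even-spread count, which matches the visual impression of a roughly symmetric, roughly uniform chart.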
What if we instead had this distribution?
```python
answers = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4])
vals = list(range(0, 5))
counts = np.bincount(answers)
plt.bar(vals, counts)
plt.show()
```
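This chart is bell-shaped: it peaks in the middle and tapers off on both sides. In statistics, a bell-shaped distribution like this is called a normal distribution; with only 19 whole-number ratings, the data can at best be approximately normal.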
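One way to back up the "bell-shaped" description is sketched below, under our own assumptions: first check that the bars climb to a single peak and then fall, and then compare the bars with a normal curve that has the same mean and standard deviation as the data. The sketch uses NumPy's mean() and std() (std() defaults to the population standard deviation, ddof=0) and scales the normal density by the number of answers, treating each bar as a bin of width 1. This is an informal visual comparison, not a formal test of normality.

```python
import numpy as np

answers = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4])
counts = np.bincount(answers, minlength=5)
peak = int(np.argmax(counts))                # position of the tallest bar

# Bell shape: bars climb up to the peak, then fall away after it
rises = np.all(np.diff(counts[: peak + 1]) > 0)
falls = np.all(np.diff(counts[peak:]) < 0)
bell_shaped = bool(rises and falls)

# Normal curve with the same mean and (population) standard deviation as the data,
# scaled by the number of answers so it is on the same scale as the bars (bin width 1)
mu, sigma = answers.mean(), answers.std()
x = np.arange(len(counts))
expected = len(answers) * np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

print("counts:", counts, "peak at:", peak, "bell-shaped:", bell_shaped)
print("normal-curve heights:", np.round(expected, 1))
```

Both the observed bars and the fitted normal curve peak at a rating of 2 and shrink toward 0 and 4, although with so few answers the match is only rough.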