Basic Charts Part 2
When we look at distributions, we will often talk about the skew.
Skew describes which side of a distribution has the longer tail. A distribution is skewed right when most of the data sits toward the left and the tail is longer on the right. Skewed left is the opposite. Here is an example of a left-skewed distribution:
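import numpy as np
import matplotlib.pyplot as plt

answers = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])
vals = list(range(0, 5))  # the possible answer values, 0 through 4
counts = np.bincount(answers)  # counts[i] is how many times the value i appears
plt.bar(vals, counts)
plt.show()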
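To check the direction of the skew numerically rather than by eye, one option is the moment-based skewness coefficient: the third central moment divided by the standard deviation cubed. The following is only a minimal sketch, and the helper name skewness is hypothetical rather than part of NumPy. A negative result suggests a left skew and a positive result a right skew.

```python
import numpy as np

def skewness(data):
    """Moment-based skewness: negative = skewed left, positive = skewed right."""
    data = np.asarray(data, dtype=float)
    deviations = data - data.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (population variance)
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5

answers = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4])
left = skewness(answers)       # the left-skewed data from the bar chart
right = skewness(4 - answers)  # mirroring the data flips the direction of the skew
print(left, right)
```

Mirroring the values with 4 - answers gives a right-skewed version of the same data, so the two results should have opposite signs.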
Another common measure in statistics is the mode, the most common value. To find it, NumPy provides the function argmax(), which returns the index of the maximum value in an array. Because counts[i] holds the number of times the value i appears, the index of the largest count is the most common number:
answers = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4])
vals = list(range(0,5))
counts = np.bincount(answers)
plt.bar(vals,counts)
plt.show()
print("The mode is " + str(np.argmax(counts)))
Something important to note: bincount() only works on non-negative integers, so it raises an error if the data contains negative values.
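If the data does contain negative values, one possible workaround is np.unique() with return_counts=True, which returns the distinct values alongside their counts. This is a sketch using made-up sample data (readings is a hypothetical array, not from the example above):

```python
import numpy as np

# Hypothetical sample data that includes negative values
readings = np.array([-2, -1, -1, 0, 0, 0, 1])

# values holds the sorted distinct values, counts holds how often each appears
values, counts = np.unique(readings, return_counts=True)

# argmax gives the position of the largest count, so look up the value there
mode = values[np.argmax(counts)]
print("The mode is " + str(mode))
```

Unlike with bincount(), the index returned by argmax() is not the value itself here, which is why the result is looked up in values.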
I am also going to introduce another module from the standard library: collections. From collections, we can import Counter, which counts how many times each value appears. Once we create a Counter object, we can call its most_common() method.
from collections import Counter
data = Counter(answers)
print(data.most_common(1))
As you can see, we get back [(2, 9)], a list containing one (value, count) tuple (newer NumPy versions may display the value as np.int64(2)). This means that the most common value is 2 and it appears 9 times. If we change the argument from 1 to 2, we get the two most common values.
print(data.most_common(2))
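Because most_common() returns a list of (value, count) tuples, the pieces can be unpacked into separate variables. The sketch below uses a plain Python list instead of a NumPy array so the printed values are ordinary ints; the names mode_value, mode_count, and top_two are only illustrative.

```python
from collections import Counter

answers = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4]
data = Counter(answers)

# most_common(1) returns a one-element list, so take its first tuple and unpack it
mode_value, mode_count = data.most_common(1)[0]
print("The mode is " + str(mode_value) + ", appearing " + str(mode_count) + " times")

# most_common(2) returns the two highest counts, largest first
top_two = data.most_common(2)
print(top_two)
```

When two values have the same count, most_common() lists them in the order they were first encountered.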