Basics
First, let’s see what a basic plot would look like. Let’s get 10,000 random points between 0 and 1, and then get a second set of points which is just equal to those points times 2.
import numpy as np
rs = np.random.RandomState(1)
pts1 = rs.uniform(size=10000)
pts2 = pts1*2
We seed the random state so that we get the same results every time the code runs. rs.uniform(size=n) returns n random points drawn uniformly from the interval [0, 1).
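Here is a quick check of both claims, assuming only NumPy. Two generators seeded with the same value produce identical draws, and every value from uniform() falls in [0, 1). The names rs_a, rs_b, a and b are just illustrative.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rs_a = np.random.RandomState(1)
rs_b = np.random.RandomState(1)
a = rs_a.uniform(size=10000)
b = rs_b.uniform(size=10000)

print(np.array_equal(a, b))           # True: same seed, identical draws
print(a.shape)                        # (10000,): size=n sets how many points we get
print(a.min() >= 0.0, a.max() < 1.0)  # True True: samples come from [0, 1)
```

Changing the seed (for example RandomState(2)) gives a different, but equally reproducible, set of points.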
Let’s plot our first hexbin!
import seaborn as sns
import matplotlib.pyplot as plt
# pass x and y as keywords; recent seaborn versions reject them as positional arguments
sns.jointplot(x=pts1, y=pts2, kind="hex")
plt.show()
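As you can see, both marginal histograms show an even distribution across their ranges, which is what we expect from uniform samples (pts2 simply spans 0 to 2 instead of 0 to 1). In the center, the points fall on a straight line: there is a perfect correlation, and as x goes up, y goes up twice as fast. The Pearson correlation coefficient r measures this linear relationship and equals 1 here; older versions of seaborn printed it on the plot, while newer versions no longer do. What if we had two uniform distributions that were independent?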
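To compute the Pearson r directly rather than reading it off the plot, a minimal approach is NumPy's np.corrcoef, which returns a 2x2 correlation matrix whose off-diagonal entry is r. As an extra sanity check, this sketch also fits a straight line with np.polyfit to recover the factor of 2.

```python
import numpy as np

rs = np.random.RandomState(1)
pts1 = rs.uniform(size=10000)
pts2 = pts1 * 2  # perfectly linear relationship: y = 2x

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is Pearson's r.
r = np.corrcoef(pts1, pts2)[0, 1]
print(round(r, 6))  # 1.0, up to floating-point rounding

# The slope of the least-squares line recovers the factor of 2.
slope, intercept = np.polyfit(pts1, pts2, 1)
print(round(slope, 6))
```

Scaling one variable by a positive constant never changes r, which is why doubling the points still gives a correlation of exactly 1.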
rs = np.random.RandomState(1)
pts1 = rs.uniform(size=10000)
pts2 = rs.uniform(size=10000)
sns.jointplot(x=pts1, y=pts2, kind="hex")
plt.show()
Now that the two samples are independent, there is no longer a relationship: the hexagons cover the square roughly evenly and no line or trend appears.
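To put numbers on "no relationship", the sketch below computes Pearson's r for the two independent uniform samples and counts points in a coarse grid with np.histogram2d. Square bins stand in for hexagons here (a simplifying assumption; the hex kind of jointplot uses hexagonal bins), but the idea is the same: with independent uniforms, every cell receives roughly the same count.

```python
import numpy as np

rs = np.random.RandomState(1)
pts1 = rs.uniform(size=10000)
pts2 = rs.uniform(size=10000)

# Independent samples: Pearson's r should be close to 0.
r = np.corrcoef(pts1, pts2)[0, 1]
print(f"r = {r:.3f}")

# Count points in a 5x5 grid of square bins over the unit square.
# With independent uniforms each of the 25 cells expects 10000 / 25 = 400 points.
counts, _, _ = np.histogram2d(pts1, pts2, bins=5, range=[[0, 1], [0, 1]])
print(counts.astype(int))
print(counts.min(), counts.max())
```

The spread of the cell counts around 400 is ordinary sampling noise; the even coloring of the hexbin plot is the visual version of this table.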
Let’s try this with two independent normal distributions.
rs = np.random.RandomState(1)
pts1 = rs.normal(size=10000)
pts2 = rs.normal(size=10000)
sns.jointplot(x=pts1, y=pts2, kind="hex")
plt.show()
In this case, there is again no relationship, but the points cluster toward the center because both samples are normally distributed. You can think about it in terms of standard deviations. A value lands more than three standard deviations above the mean with a probability of only about 0.15%. Because the two samples are independent, the chance that both the first and the second sample land there at once is 0.15% × 0.15% = 0.000225%, or roughly 2 in a million. On the other hand, the chance that both points are within one standard deviation of their means is 68% × 68% ≈ 46%.
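The arithmetic above can be checked against the standard normal distribution using math.erf and math.erfc from Python's standard library, plus an empirical count on the same seeded samples. The helper names upper_tail and within are illustrative. Note that the exact tail probability beyond +3 standard deviations is about 0.135% (0.15% is the rule-of-thumb figure), so the exact joint probability comes out slightly smaller, about 0.00018%.

```python
import math
import numpy as np

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def within(k):
    """P(|Z| < k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))

# Independence lets us multiply the single-variable probabilities.
p_tail = upper_tail(3)
p_both_tail = p_tail ** 2
p_within = within(1)
p_both_within = p_within ** 2
print(f"P(Z > 3)         = {p_tail:.4%}")
print(f"both > 3 SD      = {p_both_tail:.6%}")
print(f"P(|Z| < 1)       = {p_within:.2%}")
print(f"both within 1 SD = {p_both_within:.2%}")

# Empirical check with the same seeded samples used for the plot.
# rs.normal() defaults to mean 0 and standard deviation 1.
rs = np.random.RandomState(1)
pts1 = rs.normal(size=10000)
pts2 = rs.normal(size=10000)
frac_within = np.mean((np.abs(pts1) < 1) & (np.abs(pts2) < 1))
print(f"observed fraction within 1 SD on both axes: {frac_within:.2%}")
```

The observed fraction should sit close to the theoretical value of roughly 46.6%, which is why the center of the hexbin plot is so much darker than the corners.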