Grouping Data
Solution
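import matplotlib.pyplot as plt
import seaborn as sns
# Recent seaborn versions require x and y to be passed as keyword arguments.
sns.jointplot(x=df["FEDFUNDS"], y=df["CPIAUCSL"], kind="hex")
plt.show()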
Next, we are going to split each data series into ten groups, one for each decile. The reason I want to do this is so that we can create a heatmap showing how the deciles of the two series interact.
We can use rank() on our dataframe to get the rankings for each data series; if we use rank(pct=True) we get the percentiles.
df.rank(pct=True)
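To see what rank() and rank(pct=True) return, here is a minimal sketch on a small made-up dataframe. The name toy and its values are assumptions for illustration only, not the FRED data used in this lesson.

```python
import pandas as pd

# Hypothetical toy data, not the lesson's FEDFUNDS/CPIAUCSL series.
toy = pd.DataFrame({"a": [10, 30, 20, 40], "b": [4.0, 1.0, 3.0, 2.0]})

ranks = toy.rank()        # ordinal rank within each column, 1 = smallest value
pct = toy.rank(pct=True)  # the same ranks divided by the number of rows

print(ranks)
print(pct)
```

Each column is ranked independently, and the largest value in every column always gets a percentile of exactly 1.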
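Let’s create a function which converts a percentile to a group number from 1 to 10.
def decile(x):
    # Scale the percentile to 0-10, truncate, and shift to 1-11.
    x = int(x*10)+1
    # Only a percentile of exactly 1 lands on 11; fold it into group 10.
    if x==11:
        x=10
    return x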
Multiplying by 10, truncating with int() and adding 1 converts anything from 0 up to (but not including) .1 to 1, anything from .1 up to (but not including) .2 to 2, and so on. The max percentile is exactly 1, which this formula converts to 11, so we convert 11 to 10 at the end to correct this.
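As a quick check of that mapping, the sketch below runs the same function on a few hand-picked percentiles. The sample values are assumptions chosen for illustration.

```python
def decile(x):
    # Scale to 0-10, truncate toward zero, shift to 1-11.
    x = int(x * 10) + 1
    # A percentile of exactly 1.0 gives 11, so fold it into group 10.
    if x == 11:
        x = 10
    return x

samples = [0.05, 0.15, 0.5, 0.95, 1.0]
groups = [decile(p) for p in samples]
print(groups)
```

Note that .95 and 1.0 both end up in group 10, which is the correction at work.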
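If we want to apply a function to every cell rather than to whole columns or rows, we can use applymap(), like so. (From pandas 2.1 onward, applymap() is deprecated in favor of the equivalent DataFrame.map().)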
rankings = df.rank(pct=True).applymap(decile)
print(rankings)
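Here is a minimal sketch of the same cell-by-cell conversion on made-up data. The dataframe toy and the names elementwise and toy_rankings are assumptions for illustration; the hasattr check is just one way to run on both older and newer pandas versions.

```python
import pandas as pd

def decile(x):
    x = int(x * 10) + 1
    return 10 if x == 11 else x

# Hypothetical toy data: 20 rows, two columns in opposite order.
toy = pd.DataFrame({"a": range(1, 21), "b": range(20, 0, -1)})
pct = toy.rank(pct=True)

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = pct.map if hasattr(pct, "map") else pct.applymap
toy_rankings = elementwise(decile)

print(toy_rankings.head())
```

The result has the same shape as the input, with every percentile replaced by its group number.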
A pandas Series (a single column of a dataframe) has a very useful method, value_counts(), which returns the count of each unique element in the data series.
rankings["FEDFUNDS"].value_counts()
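A small sketch of what value_counts() returns, using a made-up series (the values are assumptions for illustration). By default the result is sorted by count, so chaining sort_index() is a handy way to get the groups back in numeric order.

```python
import pandas as pd

# Hypothetical group labels, not the lesson's rankings.
s = pd.Series([3, 1, 3, 2, 3, 1], name="group")

counts = s.value_counts()       # most frequent value first
by_group = counts.sort_index()  # ordered by group label instead

print(counts)
print(by_group)
```

Sorting by index before plotting keeps the bars in decile order rather than frequency order.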
Let’s do a sanity check to make sure the groups are roughly the same size for both series. We can visualize the counts with plot(kind="bar").
rankings["FEDFUNDS"].value_counts().plot(kind="bar")
plt.show()
rankings["CPIAUCSL"].value_counts().plot(kind="bar")
plt.show()
Pivot tables are common in Excel, and you can also create one in pandas. The idea is that you set a row, a column, and what the value at each row/column intersection should be. Run this code.
rankings.pivot_table(index='FEDFUNDS', columns='CPIAUCSL', aggfunc=len)
Index is what we want running along the left, columns is what we want running across the top, and aggfunc is what we want the values inside the cells to be. We use len because we want to see how many records are in each intersection.
Let’s set a variable pivot equal to this.
pivot = rankings.pivot_table(index='FEDFUNDS', columns='CPIAUCSL', aggfunc=len)
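To make the counting idea concrete, here is a sketch on made-up data showing two ways to count how many records fall in each row/column intersection. Everything here is an assumption for illustration: the toy dataframe, the helper column n, and the use of pd.crosstab, which is a different pandas function from the pivot_table call above. One thing worth checking on your own data: if the dataframe holds only the two columns used as index and columns, pivot_table has no remaining column to pass to len, so supplying an explicit values column (or using crosstab) may be needed.

```python
import pandas as pd

# Hypothetical decile-style labels for two series.
toy = pd.DataFrame({"x": [1, 1, 2, 2, 2], "y": [1, 2, 1, 1, 2]})

# Option 1: crosstab counts co-occurrences of x and y directly.
counts = pd.crosstab(toy["x"], toy["y"])

# Option 2: pivot_table with an explicit helper column to aggregate.
pivot_counts = toy.assign(n=1).pivot_table(
    index="x", columns="y", values="n", aggfunc=len, fill_value=0
)

print(counts)
print(pivot_counts)
```

Both tables have x running down the left and y across the top, with each cell holding the number of matching rows.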