Swarmplot

Let’s reset the index so that we can get country and year as columns.

df.reset_index(inplace=True)
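To see what the reset does, here is a minimal sketch on a made-up dataframe. The name toy, the gdp column, its values, and the index level names country and year are all assumptions for illustration, not the data pulled in this tutorial.

```python
import pandas as pd

# Hypothetical toy data: one made-up indicator indexed by (country, year).
toy = pd.DataFrame(
    {"gdp": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", 2000), ("A", 2001), ("B", 2000), ("B", 2001)],
        names=["country", "year"],
    ),
)

# reset_index() moves the index levels into ordinary columns
# and replaces the index with a plain 0..n-1 range.
flat = toy.reset_index()
print(flat)
```

After the reset, country and year sit next to gdp as regular columns, which is the shape we need for the steps that follow.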

For our swarmplot, we don’t need the date so let’s get rid of it.

del df["date"]

Next, we are going to use pd.melt(). This function is useful when we have multiple columns and want to collapse them into two: a variable column that holds the original column names and a value column that holds the corresponding values. We give pd.melt() a dataframe, a column to use as the identifier, and a name for the variable column. We will use country as our identifier because we want to observe country differences, and we will call the variable column indicator because it holds the indicator names. The value column keeps its default name, value.

df = pd.melt(df, "country", var_name="indicator")
print(df)
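If the reshaping is hard to picture, the following sketch applies the same call to a tiny made-up dataframe. The names wide and long, the columns gdp and population, and all of the numbers are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical wide-format data: one row per country, one column per indicator.
wide = pd.DataFrame(
    {
        "country": ["A", "B"],
        "gdp": [100.0, 200.0],
        "population": [10.0, 20.0],
    }
)

# Keep country as the identifier; every other column is stacked into
# an (indicator, value) pair, so 2 rows x 2 indicators become 4 rows.
long = pd.melt(wide, "country", var_name="indicator")
print(long)
```

Each original column name ends up as an entry in the indicator column, and the number that used to sit under it moves to the value column.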

Now, to draw a swarmplot we give it a column for the x-axis (we are going to use the four indicators), a column for the y-axis (our value column), and a hue to separate our data by color (we use country).

import seaborn as sns
import matplotlib.pyplot as plt
sns.swarmplot(x="indicator", y="value", hue="country", data=df)
plt.show()

Because the GDP values are so much larger than the other indicators, they dominate the y-axis and we cannot see the rest of the values. To fix this, we need to scale the data. There are many ways to scale data, but we will use a simple 0-1 scale in which 0 represents the minimum of a column and 1 represents its maximum. The equation for scaling is:

Equation

$$ \text{Scaled X} = \frac{x_{i}-\min(x)}{\max(x)-\min(x)}$$
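As a quick check of the formula, here is a sketch that applies it by hand to a small made-up dataframe. The names toy and scaled, the columns gdp and cpi, and the numbers are assumptions for illustration, not the tutorial's data.

```python
import pandas as pd

# Hypothetical toy data with two columns on very different scales.
toy = pd.DataFrame(
    {
        "gdp": [100.0, 300.0, 500.0],
        "cpi": [1.0, 2.0, 5.0],
    }
)

# min() and max() are computed per column, so each column is
# scaled independently: (x_i - min(x)) / (max(x) - min(x)).
scaled = (toy - toy.min()) / (toy.max() - toy.min())
print(scaled)
```

Every column now runs from 0 to 1 regardless of its original units. Note that a column whose values are all identical would divide by zero here, so this sketch assumes each column has at least two distinct values.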

Challenge

Pull the data again and apply scaling to each column. Hint: The library sklearn has preprocessing.MinMaxScaler() which can be used for scaling.
