More Plots
Let’s see what happens when two variables are correlated, but not perfectly. We will draw a sample from a normal distribution, then create a second sample by adding independent normal noise to the first.
rs = np.random.RandomState(1)
pts1 = rs.normal(size=10000)
pts2 = pts1+rs.normal(size=10000)
sns.jointplot(x=pts1, y=pts2, kind="hex")
plt.show()
Now we see that there is a positive correlation, but it isn’t perfect.
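To put a number on "positive but not perfect", here is a minimal sketch that computes the Pearson correlation for the same two samples with np.corrcoef. Because the second sample is the first plus independent noise of equal variance, the value should land near 1/sqrt(2), or about 0.71. The variable name r is only for illustration.

```python
import numpy as np

# Same seed and construction as the example above.
rs = np.random.RandomState(1)
pts1 = rs.normal(size=10000)
pts2 = pts1 + rs.normal(size=10000)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two samples.
r = np.corrcoef(pts1, pts2)[0, 1]
print(f"Pearson r = {r:.3f}")
```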
Now we will see why you need to visualize data before doing things like running a regression. Let’s create our data.
rs = np.random.RandomState(1)
pts1 = rs.normal(size=10000)
pts2 = abs(pts1)
So our first sample is drawn from a normal distribution, and the second is the absolute value of the first. There must be a correlation, right? Actually, the classic correlation measure (Pearson’s r) reports almost no correlation. Check it out.
sns.jointplot(x=pts1, y=pts2, kind="hex")
plt.show()
This is why we visualize data. If we had not, we would have thought there was no relationship, when in fact the second variable is fully determined by the first through a transformation.
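The same point can be checked numerically. The sketch below, which assumes Pearson’s r is the correlation measure in question, computes it for the raw pair and then again after applying the absolute value to the first sample. The names r_raw and r_abs are illustrative only.

```python
import numpy as np

# Same seed and construction as the example above.
rs = np.random.RandomState(1)
pts1 = rs.normal(size=10000)
pts2 = abs(pts1)

# Pearson correlation on the raw pair: the V-shaped relationship
# cancels out, so the value is close to zero.
r_raw = np.corrcoef(pts1, pts2)[0, 1]

# Undo the mismatch by transforming the first sample the same way:
# the relationship is now exactly linear.
r_abs = np.corrcoef(np.abs(pts1), pts2)[0, 1]

print(f"raw: {r_raw:.3f}, after abs: {r_abs:.3f}")
```

Pearson’s r only measures linear association, which is why a plot reveals structure that the single number hides.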
Let’s import our libraries and set up our indicators.
import wbdata
import pandas as pd
indicators = {}
indicators["NY.GDP.MKTP.KD.ZG"] = "GDP Rate"
indicators["SL.UEM.TOTL.ZS"] = "Unemployment"
Calling wbdata.get_country(display=False) gets all the countries. We pass display=False so that the results are returned rather than printed. Let’s iterate through them.
for i in wbdata.get_country(display=False):
    print(i)
What we get back is a dictionary for each country. One of the keys, “id”, gives us the ID representing each country. Let’s iterate through just those.
for i in wbdata.get_country(display=False):
    print(i["id"])
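If you want to keep the IDs rather than just print them, a list comprehension is one option. The sketch below runs without a network connection by using a small hand-written list in place of the real country records. That list, and the "name" key in it, are assumptions for illustration; only the "id" key comes from the text above.

```python
# Hypothetical stand-in for the country records returned above;
# only the "id" key is taken from the text, "name" is an assumption.
countries = [
    {"id": "USA", "name": "United States"},
    {"id": "BRA", "name": "Brazil"},
    {"id": "IND", "name": "India"},
]

# Collect just the ID from each dictionary into a list.
country_ids = [c["id"] for c in countries]
print(country_ids)
```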
Challenge