Joins¶
Joins combine two dataframes based on common keys. If you have never used joins before, it may be worth working through a more in-depth resource first, as this one gives only a quick review of each type.
The options for joins are:
Left Join¶
The left join keeps every index value from the left dataframe, matches records from the right dataframe on that index, and merges the two. Any left index value with no match in the right dataframe gets null values in the new columns.
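As a minimal sketch of the left join, assuming two small hypothetical dataframes named left and right (these names and values are for illustration only), label 'A' appears in both, 'C' only on the left, and 'B' only on the right:

```python
import pandas as pd

# Hypothetical frames: 'A' is shared, 'C' is left-only, 'B' is right-only
left = pd.DataFrame({'x': [1, 2]}, index=['A', 'C'])
right = pd.DataFrame({'y': [10, 20]}, index=['A', 'B'])

# Keep every row of `left`; 'C' has no match, so its 'y' value is null (NaN)
left_joined = left.join(right, how='left')
print(left_joined)
```

The right-only label 'B' does not appear in the result at all.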
Right Join¶
The right join keeps every index value from the right dataframe, matches records from the left dataframe on that index, and merges the two. Any right index value with no match in the left dataframe gets null values in the new columns.
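The mirror-image case can be sketched the same way. The frames below are hypothetical and exist only to show which rows survive a right join:

```python
import pandas as pd

# Hypothetical frames: 'A' is shared, 'C' is left-only, 'B' is right-only
left = pd.DataFrame({'x': [1, 2]}, index=['A', 'C'])
right = pd.DataFrame({'y': [10, 20]}, index=['A', 'B'])

# Keep every row of `right`; 'B' has no match, so its 'x' value is null (NaN)
right_joined = left.join(right, how='right')
print(right_joined)
```

Here the left-only label 'C' is dropped, and the column order still lists the left dataframe's columns first.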
Outer¶
The outer join takes the union of the two indices, matches records on it, and merges the dataframes. Any index value that appears in only one dataframe gets null values in the columns that come from the other dataframe.
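A short illustration of the union behavior, again using made-up frames (left, right, and their values are assumptions, not part of the tutorial's data):

```python
import pandas as pd

# Hypothetical frames: 'A' is shared, 'C' is left-only, 'B' is right-only
left = pd.DataFrame({'x': [1, 2]}, index=['A', 'C'])
right = pd.DataFrame({'y': [10, 20]}, index=['A', 'B'])

# Keep every label found in either frame; unmatched cells become null (NaN)
outer_joined = left.join(right, how='outer')
print(outer_joined)
```

Only the shared label 'A' has values in both columns; 'B' is missing x and 'C' is missing y.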
Inner¶
The inner join takes the intersection of the two indices for matching and merging. Because only index values present in both dataframes are kept, the join itself introduces no missing values.
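The intersection behavior might look like this in practice. The frames are hypothetical and chosen so that exactly one label is shared:

```python
import pandas as pd

# Hypothetical frames: 'A' is shared, 'C' is left-only, 'B' is right-only
left = pd.DataFrame({'x': [1, 2]}, index=['A', 'C'])
right = pd.DataFrame({'y': [10, 20]}, index=['A', 'B'])

# Keep only labels present in both frames, so no nulls are introduced
inner_joined = left.join(right, how='inner')
print(inner_joined)
```

Note that nulls already present in the input data would still carry through; the join simply does not create new ones.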
By default, join matches on the index of each dataframe. A join is called like so: df1.join(df2), where df1 is the left and df2 is the right dataframe. The default is a left join, but the how argument changes the type of join.

Step 1 is to create our dummy data. Each dataframe has two columns that do not appear in the other, one index value that appears in both, and one index value that is unique to that dataframe.
import pandas as pd

# Create the data
df1 = pd.DataFrame([[95, 100],
                    [77, 34]],
                   index=['A', 'C'],
                   columns=['Col 1', 'Col 2'])
df2 = pd.DataFrame([[10, 20],
                    [20, 30]],
                   index=['A', 'B'],
                   columns=['Col 3', 'Col 4'])
print(df1)
print()
print(df2)
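With the dummy data in place, the following sketch checks the two claims made about the join call: that the default is a left join, and that the how argument selects which index values are kept. The variable names default_join, explicit_left, and row_labels are introduced here for illustration only.

```python
import pandas as pd

# Same dummy data as above, repeated so this block runs on its own
df1 = pd.DataFrame([[95, 100], [77, 34]],
                   index=['A', 'C'], columns=['Col 1', 'Col 2'])
df2 = pd.DataFrame([[10, 20], [20, 30]],
                   index=['A', 'B'], columns=['Col 3', 'Col 4'])

# Calling join without `how` should match an explicit left join
default_join = df1.join(df2)
explicit_left = df1.join(df2, how='left')
print(default_join.equals(explicit_left))

# Collect the index values kept by each join type
row_labels = {how: sorted(df1.join(df2, how=how).index)
              for how in ['left', 'right', 'outer', 'inner']}
for how, labels in row_labels.items():
    print(how, labels)
```

Comparing the printed labels against the index of each input dataframe is a quick way to confirm which join type you have actually run.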