Column Collisions
The last thing to cover with regard to joins is column collisions. You will often find that you are joining two dataframes whose columns hold different data but share the same name. For our example, imagine there are two dataframes that each have a Sales column, but one holds sales from a physical store and the other holds sales from an online store. We can use suffixes to tell the matching column names apart. Let's start with an example.
In [16]:
#Create the data
instore_sales = pd.DataFrame([[100], [200], [300]], index=[1, 2, 3], columns=['Sales'])
print(instore_sales)
print()
online_sales = pd.DataFrame([[150], [100], [200]], index=[1, 2, 3], columns=['Sales'])
print(online_sales)
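Notice what happens when you try to join these. Because both dataframes have a column named Sales, pandas raises a ValueError saying that the columns overlap but no suffix was specified.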
In [17]:
#Join the sales data
sales = instore_sales.join(online_sales)
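If you want to see the error without stopping the notebook, one option is to wrap the join in a try/except. This is only a sketch: it rebuilds the same two dataframes so it can run on its own, and the variable name error_message is just an illustrative choice.

```python
import pandas as pd

# Same data as above, rebuilt so this cell runs on its own
instore_sales = pd.DataFrame([[100], [200], [300]], index=[1, 2, 3], columns=['Sales'])
online_sales = pd.DataFrame([[150], [100], [200]], index=[1, 2, 3], columns=['Sales'])

error_message = None  # illustrative name, filled in if the join fails
try:
    sales = instore_sales.join(online_sales)
except ValueError as err:
    # Both frames have a 'Sales' column and no suffix was given,
    # so pandas refuses to guess how to rename them
    error_message = str(err)
    print(error_message)
```

The printed message names the colliding column, which is handy when the dataframes are wide and the overlap is not obvious.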
The way to work around this is to use lsuffix and rsuffix, which set the suffix appended to each overlapping column name from the left and right dataframes respectively. Columns that do not collide keep their original names.
In [18]:
#Join after using a suffix
sales = instore_sales.join(online_sales, lsuffix=' Instore', rsuffix=' Online')
print(sales)
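To see that the suffixes only touch the colliding columns, here is a small sketch with one extra column on each side. The Returns and Clicks columns and their values are made up for illustration. It also shows what appears to be the closest equivalent with merge, which takes a single suffixes tuple instead of lsuffix and rsuffix.

```python
import pandas as pd

# Hypothetical data: 'Sales' collides, 'Returns' and 'Clicks' do not
instore = pd.DataFrame({'Sales': [100, 200, 300], 'Returns': [5, 10, 15]},
                       index=[1, 2, 3])
online = pd.DataFrame({'Sales': [150, 100, 200], 'Clicks': [900, 750, 1200]},
                      index=[1, 2, 3])

# Only the overlapping 'Sales' columns receive a suffix
joined = instore.join(online, lsuffix=' Instore', rsuffix=' Online')
print(list(joined.columns))

# merge on the index, passing both suffixes as one tuple
merged = instore.merge(online, left_index=True, right_index=True,
                       suffixes=(' Instore', ' Online'))
print(list(merged.columns))
```

Both calls should give the same column names here, with Returns and Clicks left untouched.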