Multi-index Joins
Multi-Index Joins¶
Imagine a hypothetical scenario where we have 2 products, measured over 2 days. We have prices and sales once again, but what if the price had changed between the two days? In this case, maybe the business owner charges more on a given day. How can we join these 4 records together? The answer lies in the multi-index. First, create the data below.
In [13]:
#Create the data
cost = pd.DataFrame([["Product 1", 100, 1],
["Product 1", 110, 2],
["Product 2", 55, 1],
["Product 2", 60, 2],], columns=['Product', 'Cost', 'Day'])
print(cost)
sales = pd.DataFrame([["Product 1", 1, 10],
["Product 1", 2, 10],
["Product 2", 1, 15],
["Product 2", 2, 20]], columns=['Product', 'Day', 'Volume'])
print(sales)
To set a multi-index, we can pass in a list of indices to use for our data. Let's set both to have a multi-index.
In [14]:
#Set the multi-index
cost = cost.set_index(['Product', 'Day'])
print(cost)
print()
sales = sales.set_index(['Product', 'Day'])
print(sales)
Now if we just do a simple left join on this, we are going to have no additional rows created, because each of the 4 rows is a 1 to 1 match as the index is unique.
In [15]:
#Join the data
df3 = cost.join(sales)
print(df3)