-
Pandas Basics 5
-
Lecture1.1
-
Lecture1.2
-
Lecture1.3
-
Lecture1.4
-
Lecture1.5
-
-
Data Transformations 6
-
Lecture2.1
-
Lecture2.2
-
Lecture2.3
-
Lecture2.4
-
Lecture2.5
-
Lecture2.6
-
-
Statistics 4
-
Lecture3.1
-
Lecture3.2
-
Lecture3.3
-
Lecture3.4
-
-
Reading and Writing Data 3
-
Lecture4.1
-
Lecture4.2
-
Lecture4.3
-
-
Joins 5
-
Lecture5.1
-
Lecture5.2
-
Lecture5.3
-
Lecture5.4
-
Lecture5.5
-
-
Grouping 4
-
Lecture6.1
-
Lecture6.2
-
Lecture6.3
-
Lecture6.4
-
-
Introduction to Numpy 4
-
Lecture7.1
-
Lecture7.2
-
Lecture7.3
-
Lecture7.4
-
-
Randomness 2
-
Lecture8.1
-
Lecture8.2
-
-
Numpy Data Functionality 1
-
Lecture9.1
-
Multi-index Joins
Multi-Index Joins¶
Imagine a hypothetical scenario where we have 2 products, measured over 2 days. We have prices and sales once again, but what if the price had changed between the two days? In this case, maybe the business owner charges more on a given day. How can we join these 4 records together? The answer lies in the multi-index. First, create the data below.
In [13]:
#Create the data
cost = pd.DataFrame([["Product 1", 100, 1],
["Product 1", 110, 2],
["Product 2", 55, 1],
["Product 2", 60, 2],], columns=['Product', 'Cost', 'Day'])
print(cost)
sales = pd.DataFrame([["Product 1", 1, 10],
["Product 1", 2, 10],
["Product 2", 1, 15],
["Product 2", 2, 20]], columns=['Product', 'Day', 'Volume'])
print(sales)
Product Cost Day
0 Product 1 100 1
1 Product 1 110 2
2 Product 2 55 1
3 Product 2 60 2
Product Day Volume
0 Product 1 1 10
1 Product 1 2 10
2 Product 2 1 15
3 Product 2 2 20
To set a multi-index, we can pass in a list of indices to use for our data. Let's set both to have a multi-index.
In [14]:
#Set the multi-index
cost = cost.set_index(['Product', 'Day'])
print(cost)
print()
sales = sales.set_index(['Product', 'Day'])
print(sales)
Cost
Product Day
Product 1 1 100
2 110
Product 2 1 55
2 60
Volume
Product Day
Product 1 1 10
2 10
Product 2 1 15
2 20
Now if we just do a simple left join on this, we are going to have no additional rows created, because each of the 4 rows is a 1 to 1 match as the index is unique.
In [15]:
#Join the data
df3 = cost.join(sales)
print(df3)
Cost Volume
Product Day
Product 1 1 100 10
2 110 10
Product 2 1 55 15
2 60 20
Next
Column Collisions