-
Pandas Basics 5
-
Lecture1.1
-
Lecture1.2
-
Lecture1.3
-
Lecture1.4
-
Lecture1.5
-
-
Data Transformations 6
-
Lecture2.1
-
Lecture2.2
-
Lecture2.3
-
Lecture2.4
-
Lecture2.5
-
Lecture2.6
-
-
Statistics 4
-
Lecture3.1
-
Lecture3.2
-
Lecture3.3
-
Lecture3.4
-
-
Reading and Writing Data 3
-
Lecture4.1
-
Lecture4.2
-
Lecture4.3
-
-
Joins 5
-
Lecture5.1
-
Lecture5.2
-
Lecture5.3
-
Lecture5.4
-
Lecture5.5
-
-
Grouping 4
-
Lecture6.1
-
Lecture6.2
-
Lecture6.3
-
Lecture6.4
-
-
Introduction to Numpy 4
-
Lecture7.1
-
Lecture7.2
-
Lecture7.3
-
Lecture7.4
-
-
Randomness 2
-
Lecture8.1
-
Lecture8.2
-
-
Numpy Data Functionality 1
-
Lecture9.1
-
Column Collisions
Column Collisions¶
The last thing to cover with regards to joins is collisions in columns. Often times you may find that you are joining two dataframes that have similar columns but they are named the same. For our example, imagine there are two dataframes that have sales as the column, but one actually is stores in a physical store and one is a store online. We can use a suffix to handle the matching column names. Let’s start with an example.
In [16]:
#Create the data
instore_sales = pd.DataFrame([[100],[200], [300]], index=[1, 2, 3], columns=['Sales'])
print(instore_sales)
print()
online_sales = pd.DataFrame([[150],[100], [200]], index=[1, 2, 3], columns=['Sales'])
print(online_sales)
Sales
1 100
2 200
3 300
Sales
1 150
2 100
3 200
Notice what happens when you try to join these.
In [17]:
#Join the sales data
sales = instore_sales.join(online_sales)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-8c6685b0d6a4> in <module>
1 #Join the sales data
----> 2 sales = instore_sales.join(online_sales)
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
7206 5 K5 A5 NaN
7207 """
-> 7208 return self._join_compat(
7209 other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
7210 )
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
7222
7223 if isinstance(other, DataFrame):
-> 7224 return merge(
7225 self,
7226 other,
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
86 validate=validate,
87 )
---> 88 return op.get_result()
89
90
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/reshape/merge.py in get_result(self)
646 lsuf, rsuf = self.suffixes
647
--> 648 llabels, rlabels = _items_overlap_with_suffix(
649 ldata.items, lsuf, rdata.items, rsuf
650 )
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/reshape/merge.py in _items_overlap_with_suffix(left, lsuffix, right, rsuffix)
2022
2023 if not lsuffix and not rsuffix:
-> 2024 raise ValueError(
2025 "columns overlap but no suffix specified: "
2026 "{rename}".format(rename=to_rename)
ValueError: columns overlap but no suffix specified: Index(['Sales'], dtype='object')
The way to work around this is to use lsuffix and rsuffix which represent what suffix to add to each column in the left and right.
In [18]:
#Join after using a suffix
sales = instore_sales.join(online_sales, lsuffix=' Instore', rsuffix=' Online')
print(sales)
Sales Instore Sales Online
1 100 150
2 200 100
3 300 200
Prev
Multi-index Joins
Next
Introduction