Column Collisions

Column Collisions¶

The last thing to cover with regards to joins is collisions in columns. Often times you may find that you are joining two dataframes that have similar columns but they are named the same. For our example, imagine there are two dataframes that have sales as the column, but one actually is stores in a physical store and one is a store online. We can use a suffix to handle the matching column names. Let’s start with an example.

In [16]:

#Create the data
instore_sales = pd.DataFrame([[100],[200], [300]], index=[1, 2, 3], columns=['Sales'])
print(instore_sales)
print()

online_sales = pd.DataFrame([[150],[100], [200]], index=[1, 2, 3], columns=['Sales'])
print(online_sales)

Notice what happens when you try to join these.

In [17]:

#Join the sales data
sales = instore_sales.join(online_sales)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-8c6685b0d6a4> in <module>
      1 #Join the sales data
----> 2 sales = instore_sales.join(online_sales)

~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
   7206         5  K5  A5  NaN
   7207         """
-> 7208         return self._join_compat(
   7209             other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
   7210         )

~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
   7222
   7223         if isinstance(other, DataFrame):
-> 7224             return merge(
   7225                 self,
   7226                 other,

~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     86         validate=validate,
     87     )
---> 88     return op.get_result()
     89
     90

~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/reshape/merge.py in get_result(self)
    646         lsuf, rsuf = self.suffixes
    647
--> 648         llabels, rlabels = _items_overlap_with_suffix(
    649             ldata.items, lsuf, rdata.items, rsuf
    650         )

~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/reshape/merge.py in _items_overlap_with_suffix(left, lsuffix, right, rsuffix)
   2022
   2023     if not lsuffix and not rsuffix:
-> 2024         raise ValueError(
   2025             "columns overlap but no suffix specified: "
   2026             "{rename}".format(rename=to_rename)

ValueError: columns overlap but no suffix specified: Index(['Sales'], dtype='object')

The way to work around this is to use lsuffix and rsuffix which represent what suffix to add to each column in the left and right.

In [18]:

#Join after using a suffix
sales = instore_sales.join(online_sales, lsuffix=' Instore', rsuffix=' Online')
print(sales)

   Sales Instore  Sales Online
1            100           150
2            200           100
3            300           200

Data Science

Data Science

Column Collisions

Column Collisions¶

Leave A Reply Cancel reply

Modal title