Merging Data
Solution
merge_key=['AV_TOTAL','AV_BLDG','AV_LAND']
print(Assess_2015[merge_key])
We don’t need any of the other data from 2015, so we are going to set Assess_2015 equal to the subset we just printed out.
Assess_2015 = Assess_2015[merge_key]
We can set the column names of a dataframe by assigning to its columns attribute. We just need to give it a list of strings, one for each column. Let’s rename the columns so that we can tell the 2015 values apart from the 2016 values.
Assess_2015.columns = ["AV_TOTAL_PRE","AV_BLDG_PRE","AV_LAND_PRE"]
print(Assess_2015)
You’ll notice the columns have changed.
And now it’s time to combine the two datasets! Run the code below, and it will return a dataframe of the two dataframes put together. Just kidding: the code below raises an error, but I wanted to include it so you could see what you might run into when working with datasets.
Assess_Combined = pd.concat([Assess_2016, Assess_2015], axis=1)
First, let me explain how putting two dataframes together works. The way we do it is through pd.concat(). You need to give this function a list of the dataframes you want to put together, and we also specify axis=1, which means the dataframes are placed side by side instead of stacked on top of one another.
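To make the axis argument concrete, here is a minimal sketch on two tiny made-up dataframes (the names left and right and their values are invented for illustration and are not part of the assessment data). It only compares the shapes you get from stacking versus placing side by side.

```python
import pandas as pd

# Two tiny made-up dataframes that share the same index labels
left = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"B": [3, 4]}, index=["x", "y"])

# axis=0 (the default) stacks the frames on top of one another
stacked = pd.concat([left, right], axis=0)

# axis=1 places them side by side, matching rows by index label
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)       # 4 rows, 2 columns (with NaN where a column is missing)
print(side_by_side.shape)  # 2 rows, 2 columns
```

With axis=0 each frame keeps its own rows, so the missing column is filled with NaN; with axis=1 the rows labelled x and y are lined up and each gets both columns.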
Now, the reason it did not work is that we have duplicate PIDs. When we concat side by side, the dataframes get put together by their indexes. If the same index value appears more than once, then pandas does not know which rows to pair together.
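The failure can be reproduced on a small scale. This is a sketch with made-up data, assuming the PID is used as the index; the frames toy_2016 and toy_2015 and their values are hypothetical. The exact exception type and message depend on your pandas version, so the sketch catches a general Exception rather than naming one.

```python
import pandas as pd

# Made-up frames indexed by a pretend PID; "p1" and "p2" each repeat in one frame
toy_2016 = pd.DataFrame({"AV_TOTAL": [100, 110, 200]}, index=["p1", "p1", "p2"])
toy_2015 = pd.DataFrame({"AV_TOTAL_PRE": [90, 180, 185]}, index=["p1", "p2", "p2"])

# Both indexes contain repeated labels
print(toy_2016.index.is_unique, toy_2015.index.is_unique)

error = None
try:
    # Side-by-side concat has to align rows by index label,
    # which is ambiguous when labels repeat and the indexes differ
    pd.concat([toy_2016, toy_2015], axis=1)
except Exception as exc:  # exception class varies across pandas versions
    error = exc

print(type(error).__name__, error)
```

Checking index.is_unique before concatenating is a quick way to spot this problem ahead of time.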
What we will do to deal with this is get rid of any duplicate PIDs in the data.
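One possible way to drop duplicates is sketched below on made-up data. It assumes the PID is the index and that keeping the first row for each PID is acceptable; both are assumptions for illustration, and the frames toy_2016 and toy_2015 are hypothetical. It uses index.duplicated(), which flags every repeat of a label after its first appearance.

```python
import pandas as pd

# Made-up frames indexed by a pretend PID, each with one repeated label
toy_2016 = pd.DataFrame({"AV_TOTAL": [100, 110, 200]}, index=["p1", "p1", "p2"])
toy_2015 = pd.DataFrame({"AV_TOTAL_PRE": [90, 180, 185]}, index=["p1", "p2", "p2"])

# Keep only the first row for each index label (assumption: the first row is the one we want)
toy_2016_unique = toy_2016[~toy_2016.index.duplicated(keep="first")]
toy_2015_unique = toy_2015[~toy_2015.index.duplicated(keep="first")]

# With unique indexes, the side-by-side concat can line the rows up
combined = pd.concat([toy_2016_unique, toy_2015_unique], axis=1)
print(combined)
```

Which duplicate to keep is a judgment call that depends on the data; keep="last" or dropping every duplicated PID entirely (keep=False) are equally valid choices in other situations.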