Merging Data Part 2

The groupby() function lets us apply functions to specific groups of rows. Its argument is the column we want to group by, and it forms one group for each distinct value in that column. For example, let's say we want to know the average value of houses based on how many rooms they have. We can group by the number of rooms and then call mean() to get the average for each group.

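# numeric_only=True skips text columns, which newer pandas versions refuse to average
print(Assess_2016.groupby("R_TOTAL_RMS").mean(numeric_only=True))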
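To see what the grouping does on a small scale, here is a minimal sketch with made-up data. Only R_TOTAL_RMS comes from the lesson; the toy table and the AV_TOTAL column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: R_TOTAL_RMS is from the lesson, AV_TOTAL is an assumed column name.
toy = pd.DataFrame(
    {
        "R_TOTAL_RMS": [4, 4, 6, 6, 6],
        "AV_TOTAL": [200000, 300000, 400000, 500000, 600000],
    }
)

# Group rows by room count, then average one chosen column within each group.
avg_by_rooms = toy.groupby("R_TOTAL_RMS")["AV_TOTAL"].mean()
print(avg_by_rooms)
```

Selecting a single column before calling mean() keeps the output focused on the value we care about instead of averaging every column in the table.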

Now, here is how we can keep only unique PIDs. We group by PID and then apply first(), which keeps the first entry for each PID. There is one caveat: because our PID column is the index, we access it through the index attribute. Assess_2016.index returns the index, which holds our PIDs.

Assess_2016 = Assess_2016.groupby(Assess_2016.index).first()
Assess_2015 = Assess_2015.groupby(Assess_2015.index).first()
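The deduplication step can be tried on a toy table first. This is a sketch under assumptions: the PIDs and values are invented, and the second approach (index.duplicated) is an alternative we are adding for comparison, not something the lesson uses.

```python
import pandas as pd

# Hypothetical toy table indexed by PID, with one PID appearing twice.
toy = pd.DataFrame(
    {"AV_TOTAL": [100, 150, 200], "R_TOTAL_RMS": [5, 5, 7]},
    index=pd.Index(["A1", "A1", "B2"], name="PID"),
)

# One row per PID: group on the index and keep the first entry of each group.
unique_rows = toy.groupby(toy.index).first()
print(unique_rows)

# Alternative: mask repeated index labels and keep only the first occurrence.
also_unique = toy[~toy.index.duplicated(keep="first")]
print(also_unique)
```

One difference worth knowing: first() takes the first non-missing value in each column, so when a group contains missing values the result may combine values from more than one row, while the duplicated() mask always keeps whole rows.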

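Now we can finally merge our datasets. Passing axis=1 to pd.concat() places the two tables side by side, matching rows by their index, which here is the PID.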

Assess_Combined = pd.concat([Assess_2016, Assess_2015], axis=1)
print(Assess_Combined)
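Here is a small sketch of how the side-by-side concat lines rows up. The two toy tables and their column names (AV_2016, AV_2015) are made up for illustration and are not the lesson's real columns.

```python
import pandas as pd

# Hypothetical toy frames indexed by PID; the column names are invented.
toy_2016 = pd.DataFrame({"AV_2016": [100, 200]}, index=pd.Index(["A1", "B2"], name="PID"))
toy_2015 = pd.DataFrame({"AV_2015": [90, 300]}, index=pd.Index(["A1", "C3"], name="PID"))

# axis=1 places the frames side by side and aligns rows on the index.
combined = pd.concat([toy_2016, toy_2015], axis=1)
print(combined)
```

A PID that appears in only one year still gets a row, with NaN filling the columns from the other year. Our understanding is that this alignment needs unique index labels when the two indexes differ, which is why the duplicate PIDs had to go first.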

The final step in this lesson is to save our dataset as a CSV file.

Assess_Combined.to_csv("Assess_Combined.csv", encoding="UTF-8")

The first argument is the file name (or path) to save the CSV under. The encoding argument sets the character encoding used to write the file.
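To check that a saved file reads back the way we expect, a quick round trip can help. This sketch uses a toy table and a temporary folder; the file name toy_combined.csv and the data are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy table indexed by PID.
toy = pd.DataFrame({"AV_TOTAL": [100, 200]}, index=pd.Index(["A1", "B2"], name="PID"))

# Write to a temporary folder so the example does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), "toy_combined.csv")
toy.to_csv(path, encoding="UTF-8")

# index_col=0 restores the first column (the PIDs) as the index when reading back.
round_trip = pd.read_csv(path, index_col=0, encoding="UTF-8")
print(round_trip)
```

By default to_csv() writes the index as the first column, so the PIDs are kept in the file. Reading it back without index_col=0 would turn them into an ordinary column instead.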
