The Data

We are going to use a library called gmplot, but we can't import it right away because it has to be installed first. In a notebook, the line below does that; the leading ! passes the command to the shell.

! pip install gmplot

Going back to the data, we are going to import it with a library called pandas, which lets us manage large data sets. We downloaded the data as csv files, so we bring each one into Python with pd.read_csv(). We also specify which column is the index with the argument index_col='PID'. When you run the code below, you will likely get a warning, because we did not set dtype and pandas finds columns with mixed types. You could specify each variable's type to avoid the warning and make the load faster, but with 50 variables that would be tedious.

import pandas as pd
Assess_2015 = pd.read_csv("property-assessment-fy2015.csv", index_col='PID')
Assess_2016 = pd.read_csv("property-assessment-fy2016.csv", index_col='PID')
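If the warning bothers you, you do not have to type all 50 columns. As a minimal sketch, you can set dtype only for the columns you care about. The tiny CSV below is made up for illustration (its rows are not from the real files), and treating ST_NUM as text is an assumption about how you want to handle street numbers such as "14A".

```python
import io
import pandas as pd

# Made-up stand-in for the assessment CSV; the values are illustrative only.
sample_csv = io.StringIO(
    "PID,ST_NUM,ST_NAME,AV_TOTAL\n"
    "100,12,MAIN,350000\n"
    "101,14A,MAIN,420000\n"
)

# dtype takes a dict of column name -> type, so only some columns need listing.
# low_memory=False makes pandas read the file in one pass when inferring
# the types of the remaining columns.
sample = pd.read_csv(
    sample_csv,
    index_col="PID",
    dtype={"ST_NUM": str, "ST_NAME": str},
    low_memory=False,
)

print(sample.dtypes)
```

The same dtype and low_memory arguments can be passed when reading the real files.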

To view the dataframe, all we need to do is print it.

print(Assess_2015)

What if we wanted to view only one column? We would index the dataframe by adding [name], where name is the string for the column we want. So to view the street names, we would do this:

print(Assess_2015['ST_NAME'])

Now if we wanted multiple columns, we would index with a list of column names instead of a single string, which is why the brackets are doubled: [[name1, name2]]. If we wanted the names and numbers for streets, we would do this:

print(Assess_2015[['ST_NAME','ST_NUM']])
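The two kinds of indexing return different things, which is easy to miss when printing. Here is a small sketch on a made-up dataframe (the rows are invented, only the column names match the real data):

```python
import pandas as pd

# Invented rows that reuse the real column names.
demo = pd.DataFrame(
    {
        "ST_NUM": ["12", "14A"],
        "ST_NAME": ["MAIN", "ELM"],
        "AV_TOTAL": [350000, 420000],
    },
    index=pd.Index([100, 101], name="PID"),
)

one_column = demo["ST_NAME"]               # a single string gives a Series
two_columns = demo[["ST_NAME", "ST_NUM"]]  # a list of strings gives a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```

The columns come back in the order given in the list, so you can also use this to reorder them.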

What we are going to do is find the land value, the building value, and the total value in 2015, then merge them into our 2016 data to see how the values have changed from one year to the next.
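To give a rough idea of where this is heading, here is one possible way to do such a merge, sketched on two made-up dataframes indexed by PID. The numbers are invented, the _2015 suffix and the AV_TOTAL_CHANGE column are names chosen for this sketch, and a left join on the index is only one of several options pandas offers.

```python
import pandas as pd

value_cols = ["AV_LAND", "AV_BLDG", "AV_TOTAL"]

# Invented values; PID 3 exists only in the 2016 data.
demo_2015 = pd.DataFrame(
    {"AV_LAND": [100, 200], "AV_BLDG": [300, 400], "AV_TOTAL": [400, 600]},
    index=pd.Index([1, 2], name="PID"),
)
demo_2016 = pd.DataFrame(
    {"AV_LAND": [120, 90], "AV_BLDG": [330, 210], "AV_TOTAL": [450, 300]},
    index=pd.Index([1, 3], name="PID"),
)

# Rename the 2015 columns so they do not clash with the 2016 ones,
# then attach them to the 2016 rows that share the same PID.
merged = demo_2016.join(demo_2015[value_cols].add_suffix("_2015"), how="left")

# Change in total value; NaN where the property has no 2015 record.
merged["AV_TOTAL_CHANGE"] = merged["AV_TOTAL"] - merged["AV_TOTAL_2015"]

print(merged)
```

With how="left", every 2016 row is kept and properties missing from 2015 get empty values rather than being dropped.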

Challenge

The columns AV_TOTAL, AV_BLDG, and AV_LAND hold the total, building, and land values. Print out these three columns from the 2015 data.
