Pandas DataFrames
A pandas dataframe is a structure that holds two-dimensional data along with an index and columns. You can think of it as similar to a spreadsheet. One common way to create one is to pass in a list of tuples, a list of lists, or (not covered here) a dictionary. For our first dataframe, let's use the following data. Each entry in the list below holds four attributes: a name, a height, a weight, and a string for offense or defense. For now, these are all strings.
import pandas as pd

#Now, let's create some data; each element of the list corresponds to a row
#Each tuple holds the data for that row
data = []
data.append(("Ray Lewis","6'1","250","Defense"))
data.append(("Tom Brady","6'4","225","Offense"))
data.append(("Julio Jones","6'3","220","Offense"))
data.append(("Richard Sherman","6'3","194","Defense"))
print(data)
#We can initialize a dataframe with a list of data
df = pd.DataFrame(data)
print(df)
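As a quick check on the claim that a list of lists works as well as a list of tuples, the sketch below builds the same four rows both ways and compares the results. It also shows what happens when no labels are given: pandas numbers both the columns and the rows starting at 0. The variable names `rows_as_lists` and `rows_as_tuples` are just illustrative.

```python
import pandas as pd

# The same four rows as above, written as a list of lists instead of tuples
rows_as_lists = [
    ["Ray Lewis", "6'1", "250", "Defense"],
    ["Tom Brady", "6'4", "225", "Offense"],
    ["Julio Jones", "6'3", "220", "Offense"],
    ["Richard Sherman", "6'3", "194", "Defense"],
]
rows_as_tuples = [tuple(row) for row in rows_as_lists]

df_lists = pd.DataFrame(rows_as_lists)
df_tuples = pd.DataFrame(rows_as_tuples)

# Both inputs produce identical dataframes
print(df_lists.equals(df_tuples))  # True

# With no labels given, columns and rows are numbered 0, 1, 2, ...
print(list(df_lists.columns))  # [0, 1, 2, 3]
print(list(df_lists.index))    # [0, 1, 2, 3]
print(df_lists.shape)          # (4, 4)
```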
You can optionally pass in a list of labels for the columns, the index, or both. For the example above, we can add column names.
#The columns argument lets us set the names for the columns in our dataframe
df = pd.DataFrame(data,columns=["Name","Height","Weight","Type"])
print(df)
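The same constructor also accepts an `index` argument for row labels. The sketch below uses two of the rows with made-up row labels (`"lewis"`, `"brady"`, chosen only for illustration) and then looks up a value by row and column label with `.loc`. Each label list must match the number of rows or columns it describes.

```python
import pandas as pd

data = [
    ("Ray Lewis", "6'1", "250", "Defense"),
    ("Tom Brady", "6'4", "225", "Offense"),
]

# columns= labels the columns, index= labels the rows
df = pd.DataFrame(
    data,
    columns=["Name", "Height", "Weight", "Type"],
    index=["lewis", "brady"],  # hypothetical row labels for illustration
)
print(df)

# Row and column labels can be combined with .loc to fetch a single value
print(df.loc["brady", "Name"])  # Tom Brady
```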
Selecting a column works the same way you might grab a value from a dictionary: put the name of the column you want in brackets. For example, the following gets the names.
#To get a specific column, we index with the column name.
print(df["Name"])
#Likewise we can get height
print(df["Height"])
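The dictionary comparison goes a bit further than bracket syntax. In the sketch below, `in` checks the column labels the way it checks dict keys, and asking for a column that does not exist raises a `KeyError`. The `"Salary"` column is a hypothetical name used only to show the missing case. A single selected column comes back as a pandas Series.

```python
import pandas as pd

data = [
    ("Ray Lewis", "6'1", "250", "Defense"),
    ("Tom Brady", "6'4", "225", "Offense"),
    ("Julio Jones", "6'3", "220", "Offense"),
    ("Richard Sherman", "6'3", "194", "Defense"),
]
df = pd.DataFrame(data, columns=["Name", "Height", "Weight", "Type"])

names = df["Name"]
# A single column comes back as a Series
print(type(names).__name__)  # Series

# Like dict keys, `in` checks the column labels
print("Name" in df)    # True
print("Salary" in df)  # False ("Salary" is a hypothetical column)

# A missing column raises KeyError, as a missing dict key would
try:
    df["Salary"]
except KeyError as err:
    print("KeyError:", err)

print(names.tolist())
```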
If we put a list of column names inside the brackets (hence the double brackets), we can select multiple columns. This gives back a dataframe containing only the columns we asked for.
#Grab the name and weight columns
print(df[["Name","Weight"]])
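The number of brackets decides the return type. The sketch below contrasts a single column name (a Series) with a list of names (a dataframe), including a one-element list. It also shows that the selected columns come back in the order you list them, not in their original order.

```python
import pandas as pd

data = [
    ("Ray Lewis", "6'1", "250", "Defense"),
    ("Tom Brady", "6'4", "225", "Offense"),
    ("Julio Jones", "6'3", "220", "Offense"),
    ("Richard Sherman", "6'3", "194", "Defense"),
]
df = pd.DataFrame(data, columns=["Name", "Height", "Weight", "Type"])

one = df["Weight"]                # single brackets: a Series
many = df[["Name", "Weight"]]     # a list of names: a DataFrame
single_as_frame = df[["Weight"]]  # a one-element list is still a DataFrame

print(type(one).__name__)              # Series
print(type(many).__name__)             # DataFrame
print(type(single_as_frame).__name__)  # DataFrame

# Columns come back in the order they are listed
print(list(df[["Weight", "Name"]].columns))  # ['Weight', 'Name']
```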
By indexing with a column name and assigning values to it, we can either overwrite an existing column or create a new one. Below, we create a Retired column.
#We can create a new column this way, as long as the list has one value per row of the dataframe
df["Retired"] = [True,False,False,False]
print(df)
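The same assignment syntax covers the "overwrite" case. The sketch below replaces the Weight strings with integers via `astype(int)`, derives a new boolean column from it, and shows that a list of the wrong length is rejected with a `ValueError`. The `Heavy` column and its 230 threshold are hypothetical, chosen only for illustration.

```python
import pandas as pd

data = [
    ("Ray Lewis", "6'1", "250", "Defense"),
    ("Tom Brady", "6'4", "225", "Offense"),
    ("Julio Jones", "6'3", "220", "Offense"),
    ("Richard Sherman", "6'3", "194", "Defense"),
]
df = pd.DataFrame(data, columns=["Name", "Height", "Weight", "Type"])

# Assigning to an existing column overwrites it: strings become integers
df["Weight"] = df["Weight"].astype(int)

# A new column can be computed from an existing one
# ("Heavy" and the 230 threshold are hypothetical)
df["Heavy"] = df["Weight"] > 230

# A list whose length differs from the number of rows is rejected
try:
    df["Retired"] = [True, False]
except ValueError as err:
    print("ValueError:", err)

print(df)
```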