Pandas DataFrames


A pandas dataframe is a structure that holds two-dimensional data along with an index and columns. You can think of it as similar to a spreadsheet. To create one, you pass in a list of tuples, a list of lists or (we won’t cover it here) a dictionary. For our first dataframe, let’s use the following data. Each entry of the list below holds four attributes: a name, a height, a weight and a string for offense or defense. Right now these are all strings.

In [9]:

#Now, let's create some data, each element corresponds to a row
#Each tuple corresponds to the data for the row
data = []
data.append(("Ray Lewis","6'1","250","Defense"))
data.append(("Tom Brady","6'4","225","Offense"))
data.append(("Julio Jones","6'3","220","Offense"))
data.append(("Richard Sherman","6'3","194","Defense"))
print(data)

[('Ray Lewis', "6'1", '250', 'Defense'), ('Tom Brady', "6'4", '225', 'Offense'), ('Julio Jones', "6'3", '220', 'Offense'), ('Richard Sherman', "6'3", '194', 'Defense')]

In [10]:

#Import pandas (skip this if an earlier cell already did)
import pandas as pd
#We can initialize a dataframe with a list of data
df = pd.DataFrame(data)
print(df)

                 0    1    2        3
0        Ray Lewis  6'1  250  Defense
1        Tom Brady  6'4  225  Offense
2      Julio Jones  6'3  220  Offense
3  Richard Sherman  6'3  194  Defense
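
The text mentions that a list of lists works as well as a list of tuples. The following is a minimal sketch to check that, assuming the same four rows as above; the names `rows_as_lists`, `rows_as_tuples`, `df_from_lists` and `df_from_tuples` are made up for this illustration.

```python
import pandas as pd

# Same four rows as above, but each row is a list instead of a tuple
rows_as_lists = [
    ["Ray Lewis", "6'1", "250", "Defense"],
    ["Tom Brady", "6'4", "225", "Offense"],
    ["Julio Jones", "6'3", "220", "Offense"],
    ["Richard Sherman", "6'3", "194", "Defense"],
]
# Build the tuple version from the same rows so the two inputs match exactly
rows_as_tuples = [tuple(row) for row in rows_as_lists]

df_from_lists = pd.DataFrame(rows_as_lists)
df_from_tuples = pd.DataFrame(rows_as_tuples)

# Compare the two dataframes and show the number of rows and columns
print(df_from_lists.equals(df_from_tuples))
print(df_from_lists.shape)
```

Both inputs should produce the same dataframe, with default integer labels for the rows and the columns.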

You can optionally pass in a list of labels for the columns, the index, or both. For our example from before, we can add column names.

In [11]:

#The columns argument lets us set the names for the columns in our dataframe
df = pd.DataFrame(data,columns=["Name","Height","Weight","Type"])
print(df)

              Name Height Weight     Type
0        Ray Lewis    6'1    250  Defense
1        Tom Brady    6'4    225  Offense
2      Julio Jones    6'3    220  Offense
3  Richard Sherman    6'3    194  Defense
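
The example above only sets column labels. As a rough sketch of the other option, the `index` argument of `pd.DataFrame` can be used to label the rows. The labels `"p1"` and `"p2"` and the name `df_labeled` are hypothetical, chosen only for this illustration, and only two of the rows are used to keep it short.

```python
import pandas as pd

# Two of the rows from the example above
data = [
    ("Ray Lewis", "6'1", "250", "Defense"),
    ("Tom Brady", "6'4", "225", "Offense"),
]

# index sets the row labels; "p1" and "p2" are hypothetical labels
df_labeled = pd.DataFrame(
    data,
    columns=["Name", "Height", "Weight", "Type"],
    index=["p1", "p2"],
)
print(df_labeled)
print(list(df_labeled.index))
```

The list passed to `index` needs one label per row, just as `columns` needs one label per column.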

Finding column values can be done in the same way that you might grab values from a dictionary. You pass the name of the column you want in brackets. For example, the following will get you the names.

In [12]:

#To get a specific column, we index with the column name.
print(df["Name"])

0          Ray Lewis
1          Tom Brady
2        Julio Jones
3    Richard Sherman
Name: Name, dtype: object

In [13]:

#Likewise we can get height
print(df["Height"])

0    6'1
1    6'4
2    6'3
3    6'3
Name: Height, dtype: object

If we pass a list of column names inside the brackets (so the brackets are doubled), we can select multiple columns. This gives us back a dataframe containing just the columns we asked for.

In [14]:

#Grab the name and weight columns
print(df[["Name","Weight"]])

              Name Weight
0        Ray Lewis    250
1        Tom Brady    225
2      Julio Jones    220
3  Richard Sherman    194
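
One way to see the difference between the two kinds of selection is to check the type of what comes back. This is a small sketch on a cut-down, two-row version of the data; the names `single` and `subset` are made up for the illustration.

```python
import pandas as pd

# A cut-down version of the dataframe from above
df = pd.DataFrame(
    [("Ray Lewis", "250"), ("Tom Brady", "225")],
    columns=["Name", "Weight"],
)

single = df["Name"]    # one column name: a single column (a Series)
subset = df[["Name"]]  # a list of names, even just one: a dataframe

print(type(single).__name__)
print(type(subset).__name__)
```

So the doubled brackets are what decide whether you get a single column or a smaller dataframe, not the number of names in the list.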

By indexing into a column and assigning values, we can either overwrite an existing column or create a new one. Below we create the Retired column.

In [15]:

#We can also assign a new column this way if we give a list with the same length as the number of rows in the dataframe
df["Retired"] = [True,False,False,False]
print(df)

              Name Height Weight     Type  Retired
0        Ray Lewis    6'1    250  Defense     True
1        Tom Brady    6'4    225  Offense    False
2      Julio Jones    6'3    220  Offense    False
3  Richard Sherman    6'3    194  Defense    False
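
The same assignment also overwrites a column that already exists. As a sketch, assuming we want the weights as numbers rather than strings, we could replace the Weight column with a list of integers. This uses a cut-down, two-row version of the data, and converting the weights is just one possible use of overwriting, not a step from the walkthrough above.

```python
import pandas as pd

# A cut-down version of the dataframe from above, weights stored as strings
df = pd.DataFrame(
    [("Ray Lewis", "250"), ("Tom Brady", "225")],
    columns=["Name", "Weight"],
)

# Assigning to an existing column name replaces that column's values
df["Weight"] = [int(w) for w in df["Weight"]]

print(df)
print(df["Weight"].sum())
```

Once the column holds integers, numeric operations such as `sum()` add the weights instead of joining strings together.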
