Pandas DataFrames
A pandas dataframe is a structure that holds two-dimensional data, along with an index (the row labels) and columns (the column labels). You can think of it as similar to a spreadsheet. To create one, you pass in a list of tuples, a list of lists, or a dictionary (which we won't cover here). For our first dataframe, let's use the following data. Each entry of the list below holds four attributes: a name, a height, a weight, and a string for offense or defense. Right now these are all strings.
#Import pandas so that pd is available below
import pandas as pd
#Now, let's create some data; each element of the list corresponds to a row
#Each tuple holds the data for that row
data = []
data.append(("Ray Lewis","6'1","250","Defense"))
data.append(("Tom Brady","6'4","225","Offense"))
data.append(("Julio Jones","6'3","220","Offense"))
data.append(("Richard Sherman","6'3","194","Defense"))
print(data)
[('Ray Lewis', "6'1", '250', 'Defense'), ('Tom Brady', "6'4", '225', 'Offense'), ('Julio Jones', "6'3", '220', 'Offense'), ('Richard Sherman', "6'3", '194', 'Defense')]
#We can initialize a dataframe with a list of data
df = pd.DataFrame(data)
print(df)
                 0    1    2        3
0        Ray Lewis  6'1  250  Defense
1        Tom Brady  6'4  225  Offense
2      Julio Jones  6'3  220  Offense
3  Richard Sherman  6'3  194  Defense
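The same constructor also accepts a list of lists or a dictionary. As a minimal sketch (the variable names `rows_as_lists` and `columns_as_dict` are illustrative and not part of the lesson), the following builds small frames each way. With a dictionary, each key becomes a column label and each value holds that column's entries.

```python
import pandas as pd

# Two rows from the example, written as a list of lists instead of tuples
rows_as_lists = [
    ["Ray Lewis", "6'1", "250", "Defense"],
    ["Tom Brady", "6'4", "225", "Offense"],
]
df_from_lists = pd.DataFrame(rows_as_lists)

# The same rows given as tuples produce an identical dataframe
df_from_tuples = pd.DataFrame([tuple(row) for row in rows_as_lists])
print(df_from_lists.equals(df_from_tuples))

# With a dictionary, keys become column labels and values become the columns
columns_as_dict = {
    "Name": ["Ray Lewis", "Tom Brady"],
    "Height": ["6'1", "6'4"],
}
df_from_dict = pd.DataFrame(columns_as_dict)
print(df_from_dict)
```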
You can pass in a list of labels for the columns, the index, or both. For the example above, we can add column names.
#The columns argument lets us set the names for the columns in our dataframe
df = pd.DataFrame(data,columns=["Name","Height","Weight","Type"])
print(df)
              Name Height Weight     Type
0        Ray Lewis    6'1    250  Defense
1        Tom Brady    6'4    225  Offense
2      Julio Jones    6'3    220  Offense
3  Richard Sherman    6'3    194  Defense
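Row labels can be set in the same way through the `index` argument. The sketch below assumes made-up labels (`player_a`, `player_b`) purely for illustration. They are not part of the lesson's data.

```python
import pandas as pd

data = [
    ("Ray Lewis", "6'1", "250", "Defense"),
    ("Tom Brady", "6'4", "225", "Offense"),
]

# Hypothetical row labels, chosen only for illustration
labels = ["player_a", "player_b"]
df_labeled = pd.DataFrame(
    data,
    columns=["Name", "Height", "Weight", "Type"],
    index=labels,
)
print(df_labeled)

# Row labels live in .index and column labels in .columns
print(list(df_labeled.index))
print(list(df_labeled.columns))
```

If `index` is left out, pandas numbers the rows 0, 1, 2, and so on, which is what the printed frames in this lesson show.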
Finding column values works the same way as grabbing values from a dictionary: you index with brackets containing the name of the column you want. For example, the following will get you the names.
#To get a specific column, we index with the column name.
print(df["Name"])
0          Ray Lewis
1          Tom Brady
2        Julio Jones
3    Richard Sherman
Name: Name, dtype: object
#Likewise we can get height
print(df["Height"])
0    6'1
1    6'4
2    6'3
3    6'3
Name: Height, dtype: object
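The object returned by single-bracket indexing is a pandas Series, which is why the printed output ends with a `Name:` and `dtype:` line. A minimal sketch on a two-row frame (the missing column name `"Age"` is an assumption used only to trigger the error):

```python
import pandas as pd

df = pd.DataFrame(
    [("Ray Lewis", "6'1"), ("Tom Brady", "6'4")],
    columns=["Name", "Height"],
)

names = df["Name"]
# A single column comes back as a pandas Series
print(isinstance(names, pd.Series))
# Convert the column to a plain Python list
print(names.tolist())

# Asking for a column that does not exist raises KeyError, as with a dictionary
try:
    df["Age"]
except KeyError:
    print("no such column")
```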
If we index with a list of column names (note the double brackets), we can select multiple columns. This gives us back a dataframe containing only the columns we asked for.
#Grab the name and weight columns
print(df[["Name","Weight"]])
              Name Weight
0        Ray Lewis    250
1        Tom Brady    225
2      Julio Jones    220
3  Richard Sherman    194
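The difference between single and double brackets is easy to miss, so here is a small sketch comparing the two on a shortened version of the data. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [("Ray Lewis", "6'1", "250"), ("Tom Brady", "6'4", "225")],
    columns=["Name", "Height", "Weight"],
)

as_series = df["Name"]    # single brackets: a Series
as_frame = df[["Name"]]   # double brackets: a one-column dataframe
print(type(as_series).__name__, type(as_frame).__name__)

# The order of the list controls the order of the resulting columns
reordered = df[["Weight", "Name"]]
print(list(reordered.columns))
```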
By indexing with a column name and assigning values, we can either overwrite an existing column or create a new one. Below we create the Retired column.
#We can also assign a new column this way if we give a list with the same length as the number of rows in the dataframe
df["Retired"] = [True,False,False,False]
print(df)
              Name Height Weight     Type Retired
0        Ray Lewis    6'1    250  Defense    True
1        Tom Brady    6'4    225  Offense   False
2      Julio Jones    6'3    220  Offense   False
3  Richard Sherman    6'3    194  Defense   False
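The same assignment syntax overwrites a column that already exists. As a sketch on a two-row frame, the following replaces the string weights with integers and then shows what happens when the list length does not match the number of rows. The replacement values are simply the same weights written as numbers.

```python
import pandas as pd

df = pd.DataFrame(
    [("Ray Lewis", "250"), ("Tom Brady", "225")],
    columns=["Name", "Weight"],
)

# Assigning to an existing column name replaces its values
df["Weight"] = [250, 225]
print(df["Weight"].tolist())

# A list whose length does not match the number of rows is rejected
try:
    df["Retired"] = [True, False, False]
except ValueError:
    print("length mismatch")

# The failed assignment did not add the column
print("Retired" in df.columns)
```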