Null Values
Working with Null Values
When you come across null values, you have a few options. One is to simply drop any rows that contain null values. The dropna() function does this by removing every row that has at least one null value. It does not modify the dataframe in place; it returns a new dataframe instead.
#The dropna() function gets rid of any rows with missing values
print(df_final.dropna())
Name Height Weight Type Retired
0 Ray Lewis 73.0 250.0 Defense True
1 Tom Brady 76.0 225.0 Offense False
2 Julio Jones 75.0 220.0 Offense False
3 Richard Sherman 75.0 194.0 Defense False
4 Allen Robinson 75.0 250.0 Offense False
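Because df_final is built earlier in the course, here is a minimal self-contained sketch of the same behavior. The small dataframe and the names df and cleaned are made up for illustration and are not the lesson's data. It shows that dropna() returns a new dataframe and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only (not the lesson's df_final)
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Height": [73.0, np.nan, 71.0],
    "Weight": [250.0, 225.0, np.nan],
})

# dropna() returns a new dataframe without the rows that contain a null value
cleaned = df.dropna()

print(len(df))              # 3, the original still has every row
print(len(cleaned))         # 1, only row 0 has no missing values
print(list(cleaned.index))  # [0], the surviving row keeps its original label
```

Note that the surviving rows keep their original index labels, which is why the printed output can skip numbers.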
If you only want to consider certain columns when dropping null values, pass the subset keyword. Only rows with null values in those columns are dropped.
#With the subset argument, only rows with missing values in the listed columns are dropped
print(df_final.dropna(subset=["Height","Type"]))
Name Height Weight Type Retired
0 Ray Lewis 73.0 250.0 Defense True
1 Tom Brady 76.0 225.0 Offense False
2 Julio Jones 75.0 220.0 Offense False
3 Richard Sherman 75.0 194.0 Defense False
4 Allen Robinson 75.0 250.0 Offense False
6 Christian McCaffrey 71.0 NaN Offense False
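To see why a row with a missing Weight survives, the sketch below runs the same call on a tiny made-up dataframe. The names players and by_subset and the placeholder values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical roster: P2 is missing Height, P3 is missing Weight
players = pd.DataFrame({
    "Name": ["P1", "P2", "P3"],
    "Height": [73.0, np.nan, 71.0],
    "Weight": [250.0, 225.0, np.nan],
    "Type": ["Defense", "Offense", "Offense"],
})

# Only Height and Type are checked for nulls; Weight is ignored
by_subset = players.dropna(subset=["Height", "Type"])

print(by_subset["Name"].tolist())          # ['P1', 'P3']
print(by_subset["Weight"].isna().sum())    # 1, the null Weight for P3 remains
```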
If you would rather modify the dataframe you are calling the function on than get a new one back, you can pass the argument inplace=True. The modification is then applied to the dataframe itself, and the call returns None.
#If we use inplace=True then the dropna function happens in place
df_final.dropna(inplace=True,subset=["Height","Type"])
print(df_final)
Name Height Weight Type Retired
0 Ray Lewis 73.0 250.0 Defense True
1 Tom Brady 76.0 225.0 Offense False
2 Julio Jones 75.0 220.0 Offense False
3 Richard Sherman 75.0 194.0 Defense False
4 Allen Robinson 75.0 250.0 Offense False
6 Christian McCaffrey 71.0 NaN Offense False
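One common slip with inplace=True is assigning the result back to a variable. The short sketch below, using a made-up dataframe named roster, shows that the call returns None while the dataframe itself is changed.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row dataframe for illustration
roster = pd.DataFrame({
    "Height": [73.0, np.nan],
    "Type": ["Defense", "Offense"],
})

# With inplace=True the dataframe is modified and nothing is returned
result = roster.dropna(subset=["Height"], inplace=True)

print(result)       # None
print(len(roster))  # 1, the row with the missing Height is gone
```

Writing roster = roster.dropna(inplace=True) would therefore replace the dataframe with None, so use either assignment or inplace=True, not both.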
We may also prefer a more descriptive index, such as the player's name, to refer to the rows. Calling set_index returns a new dataframe whose index is the column you passed in.
#Setting the index to Name turns that column into the index
df_final = df_final.set_index("Name")
print(df_final)
Height Weight Type Retired
Name
Ray Lewis 73.0 250.0 Defense True
Tom Brady 76.0 225.0 Offense False
Julio Jones 75.0 220.0 Offense False
Richard Sherman 75.0 194.0 Defense False
Allen Robinson 75.0 250.0 Offense False
Christian McCaffrey 71.0 NaN Offense False
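As a rough sketch of what set_index changes, the example below uses a made-up dataframe (teams and indexed are hypothetical names). The chosen column leaves the list of columns, and rows can then be looked up by label.

```python
import pandas as pd

# Hypothetical data for illustration
teams = pd.DataFrame({
    "Name": ["P1", "P2"],
    "Height": [73.0, 76.0],
})

# set_index returns a new dataframe; teams itself is unchanged
indexed = teams.set_index("Name")

print("Name" in teams.columns)      # True
print("Name" in indexed.columns)    # False, Name is now the index
print(indexed.loc["P2", "Height"])  # 76.0, looked up by the player's label
```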
If new data comes to light, you can overwrite existing values using loc. For example, the code below sets the Weight value in the row for Christian McCaffrey.
#Using loc we can overwrite data
df_final.loc["Christian McCaffrey","Weight"] = 205
print(df_final)
Height Weight Type Retired
Name
Ray Lewis 73.0 250.0 Defense True
Tom Brady 76.0 225.0 Offense False
Julio Jones 75.0 220.0 Offense False
Richard Sherman 75.0 194.0 Defense False
Allen Robinson 75.0 250.0 Offense False
Christian McCaffrey 71.0 205.0 Offense False
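The same pattern can be checked on a small made-up dataframe. In this sketch, stats and the labels P1 and P2 are placeholders; it fills one missing value with loc and then confirms that no nulls remain in the column.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe indexed by name, with one missing Weight
stats = pd.DataFrame(
    {"Weight": [250.0, np.nan]},
    index=pd.Index(["P1", "P2"], name="Name"),
)

# loc takes a row label and a column label and assigns to that single cell
stats.loc["P2", "Weight"] = 205

print(stats.loc["P2", "Weight"])     # 205.0, stored as a float like the rest of the column
print(stats["Weight"].isna().any())  # False, no missing weights are left
```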