Null Values

Working with Null Values

When you come across null values, you have a few options. One is to simply drop every row that contains a null value. The dropna() function does this by checking each row and keeping only the rows with no null values. It does not modify the dataframe in place; instead, it returns a new dataframe.

In [38]:

# dropna() removes every row that has at least one missing value
print(df_final.dropna())

              Name  Height  Weight     Type  Retired
0        Ray Lewis    73.0   250.0  Defense     True
1        Tom Brady    76.0   225.0  Offense    False
2      Julio Jones    75.0   220.0  Offense    False
3  Richard Sherman    75.0   194.0  Defense    False
4   Allen Robinson    75.0   250.0  Offense    False
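As a minimal sketch of the point that dropna() returns a new dataframe, the example below uses a small hypothetical table (the names `players` and `cleaned` and the data are made up for illustration and are not the tutorial's df_final).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only (not the tutorial's df_final).
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Height": [73.0, np.nan, 75.0],
    "Weight": [250.0, 225.0, np.nan],
})

# dropna() returns a new dataframe; `players` itself is left unchanged.
cleaned = players.dropna()

print(len(players))  # 3 rows: the original still contains the incomplete rows
print(len(cleaned))  # 1 row: only Player A has no missing values

# Counting missing values per column helps decide whether dropping is reasonable.
print(players.isna().sum())
```

Checking isna().sum() first is one way to see how much data a plain dropna() would discard.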

If you only want to consider certain columns when dropping null values, pass the subset keyword. Only rows with null values in those columns are then dropped; null values in other columns are ignored.

In [39]:

# With subset, only rows with missing values in the listed columns are dropped
print(df_final.dropna(subset=["Height","Type"]))

                  Name  Height  Weight     Type  Retired
0            Ray Lewis    73.0   250.0  Defense     True
1            Tom Brady    76.0   225.0  Offense    False
2          Julio Jones    75.0   220.0  Offense    False
3      Richard Sherman    75.0   194.0  Defense    False
4       Allen Robinson    75.0   250.0  Offense    False
6  Christian McCaffrey    71.0     NaN  Offense    False
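The effect of subset can be sketched on a small hypothetical table (again, `players` and its values are invented for illustration): a row missing only Weight survives, while a row missing Height is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Height": [73.0, np.nan, 71.0],
    "Weight": [250.0, 225.0, np.nan],
    "Type": ["Defense", "Offense", "Offense"],
})

# Only Height and Type are checked; the missing Weight for Player C is ignored.
by_subset = players.dropna(subset=["Height", "Type"])

print(by_subset["Name"].tolist())  # ['Player A', 'Player C']
print(by_subset.index.tolist())    # [0, 2]
```

The surviving rows keep their original index labels, which is why the output above jumps from 4 to 6.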

If you would rather modify the dataframe you are calling these functions on than get a new one back, you can pass the argument inplace=True. The modification is then applied directly to the dataframe the method was called on.

In [40]:

# With inplace=True, dropna modifies df_final directly
df_final.dropna(inplace=True,subset=["Height","Type"])
print(df_final)

                  Name  Height  Weight     Type  Retired
0            Ray Lewis    73.0   250.0  Defense     True
1            Tom Brady    76.0   225.0  Offense    False
2          Julio Jones    75.0   220.0  Offense    False
3      Richard Sherman    75.0   194.0  Defense    False
4       Allen Robinson    75.0   250.0  Offense    False
6  Christian McCaffrey    71.0     NaN  Offense    False
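The two styles, reassigning the returned dataframe and using inplace=True, should leave you with the same data. The sketch below compares them on hypothetical data; `make_players` is a helper invented here, not part of pandas or the tutorial.

```python
import numpy as np
import pandas as pd


def make_players():
    """Build a fresh copy of the hypothetical example data."""
    return pd.DataFrame({
        "Name": ["Player A", "Player B", "Player C"],
        "Height": [73.0, np.nan, 71.0],
        "Weight": [250.0, 225.0, np.nan],
    })


# Style 1: keep the returned dataframe by reassigning the variable.
reassigned = make_players()
reassigned = reassigned.dropna(subset=["Height"])

# Style 2: modify the existing object; the call is used only for its side effect,
# so its return value is not assigned to anything.
in_place = make_players()
in_place.dropna(subset=["Height"], inplace=True)

print(reassigned.equals(in_place))  # True
```

Whichever style you pick, avoid mixing them: assigning the result of an inplace=True call back to the variable is a common source of bugs.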

We may also prefer a more descriptive index, such as the player's name, to refer to the rows. Calling set_index returns a new dataframe whose index is the column you passed to it.

In [41]:

# Setting the index to "Name" turns that column into the index
df_final = df_final.set_index("Name")
print(df_final)

                     Height  Weight     Type  Retired
Name
Ray Lewis              73.0   250.0  Defense     True
Tom Brady              76.0   225.0  Offense    False
Julio Jones            75.0   220.0  Offense    False
Richard Sherman        75.0   194.0  Defense    False
Allen Robinson         75.0   250.0  Offense    False
Christian McCaffrey    71.0     NaN  Offense    False
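A short sketch of what set_index changes, using hypothetical data (`players`, `indexed`, and `restored` are illustrative names): the column moves into the index, rows can be looked up by label, and reset_index moves it back.

```python
import pandas as pd

# Hypothetical data for illustration only.
players = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Height": [73.0, 76.0],
})

# set_index returns a new dataframe; `players` keeps its "Name" column.
indexed = players.set_index("Name")

print("Name" in indexed.columns)           # False: "Name" is now the index
print(indexed.loc["Player B", "Height"])   # 76.0, looked up by row label

# reset_index turns the index back into an ordinary column.
restored = indexed.reset_index()
print(list(restored.columns))              # ['Name', 'Height']
```

Label-based lookups like this assume the names are unique; with duplicate names, loc would return more than one row.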

If new data comes to light, you can overwrite existing values using loc. For example, the code below overwrites the Weight value in Christian McCaffrey's row.

In [42]:

# loc[row label, column label] selects a single cell, which we can overwrite
df_final.loc["Christian McCaffrey","Weight"] = 205
print(df_final)

                     Height  Weight     Type  Retired
Name
Ray Lewis              73.0   250.0  Defense     True
Tom Brady              76.0   225.0  Offense    False
Julio Jones            75.0   220.0  Offense    False
Richard Sherman        75.0   194.0  Defense    False
Allen Robinson         75.0   250.0  Offense    False
Christian McCaffrey    71.0   205.0  Offense    False
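The same pattern can be sketched on hypothetical data (the `indexed` dataframe below is invented for illustration): loc takes a row label and a column label, and assigning to that cell replaces the missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only, already indexed by name.
indexed = pd.DataFrame(
    {"Height": [73.0, 71.0], "Weight": [250.0, np.nan]},
    index=pd.Index(["Player A", "Player B"], name="Name"),
)

print(indexed["Weight"].isna().any())  # True: Player B's weight is missing

# Overwrite a single cell, addressed by row label and column label.
indexed.loc["Player B", "Weight"] = 205

print(indexed["Weight"].isna().any())  # False: no missing weights remain
print(indexed.loc["Player B", "Weight"])  # 205.0 (the column stays float)
```

Filling a value in this way is an alternative to dropping the row when the missing data can be looked up.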
