Apply
With apply, you can apply a function to each element of a column, or to every row or every column of a DataFrame. To begin, let's define a function that deals with the Height column. The issue is that height is currently stored as a string, and we might want an actual numeric representation. Let's do it piece by piece, starting with the height of the first player.
#Get the first player's height
height = df.loc[0, "Height"]
#Print the value and the type
print(height)
print(type(height))
The first thing to do is grab the two pieces of information that this string provides: the feet and the inches. If we call split with "'" as the separator, we get the two pieces as a list.
#Split in two
height = height.split("'")
print(height)
We still have strings in both of these cases, so we are going to want to convert them both to integers.
#Convert to integers
height = [int(x) for x in height]
print(height)
Now the value on the left is the number of feet and the value on the right is the number of inches. Let's convert the feet to inches by multiplying the left value by 12.
#Convert the feet to inches
height[0] = height[0] * 12
print(height)
Finally, sum these two numbers and we have the height in terms of total inches.
#Get the total height in inches
height = sum(height)
print(height)
Now that we know all the steps, we can convert this into a function.
def height_convert(height):
    #Split in two
    height = height.split("'")
    #Convert to integers
    height = [int(x) for x in height]
    #Convert the feet to inches
    height[0] = height[0] * 12
    #Get the total height in inches
    height = sum(height)
    return height
print(height_convert("6'1"))
When using apply on a column, you index into the column, call apply, and pass in the function you want to apply. This assumes the function needs only one argument. The result is a new Series whose values are the function's return values for each element of the column.
#Apply our new function
print(df["Height"].apply(height_convert))
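The same pattern can be tried on a small stand-alone frame. The sketch below is illustrative only: the sample DataFrame and the two-argument helper height_convert_units are hypothetical and not part of the original data set. It also shows one way to handle a function that needs more than one argument, by passing the extra values through the args parameter of Series.apply.

```python
import pandas as pd

# Hypothetical sample data standing in for the tutorial's df
sample = pd.DataFrame({"Height": ["6'1", "5'11", "7'0"]})

def height_convert(height):
    # Same logic as the tutorial function, written compactly
    feet, inches = height.split("'")
    return int(feet) * 12 + int(inches)

def height_convert_units(height, per_inch):
    # Hypothetical two-argument variant: scale total inches by a factor
    return height_convert(height) * per_inch

# One-argument function: pass the function itself to apply
inches = sample["Height"].apply(height_convert)

# Extra positional arguments go in the args tuple
centimeters = sample["Height"].apply(height_convert_units, args=(2.54,))

print(inches.tolist())
print(centimeters.tolist())
```

Each element of the column is passed as the first argument, and the values in args follow it in order.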
Let's overwrite our prior height column with this.
#Modify the values
df["Height"] = df["Height"].apply(height_convert)
print(df)
You might want to see weight divided by height, but trying that will cause an error because weight is still stored as a string.
#If we try to divide here we will run into an issue because weight is still a string
print(df["Weight"]/df["Height"])
To see what types we have for each column, we can call dtypes.
#The dtypes attribute lets us see the type of each column
print(df.dtypes)
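To see the failure and the dtype check side by side, here is a minimal sketch on a hypothetical two-row frame. The Weight column is built with dtype=object on purpose, to mimic numbers that arrived as text; the real df may differ.

```python
import pandas as pd

# Hypothetical data: Weight holds strings, Height is already numeric
sample = pd.DataFrame({
    "Weight": pd.Series(["200", "185"], dtype=object),
    "Height": [73, 71],
})

# dtypes reports one type per column
print(sample.dtypes)

# Dividing a column of strings by a numeric column raises a TypeError
try:
    sample["Weight"] / sample["Height"]
    division_failed = False
except TypeError as exc:
    division_failed = True
    print("Division failed:", exc)
```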
For conversion to numeric values, there are two options. One is to call astype() and pass in a type. If we pass in int, the column is converted to integers.
#Convert to integer
df["Weight"] = df["Weight"].astype(int)
print(df.dtypes)
The other option is to call pd.to_numeric and pass in the Series. If the values were floating-point numbers, it would convert to floats automatically, but in this case they are all integers.
#Also converts to numeric values
df["Weight"] = pd.to_numeric(df["Weight"])
print(df.dtypes)
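The automatic choice between integer and float can be checked on small hypothetical Series, as in the sketch below. It also uses the errors="coerce" option of pd.to_numeric, which turns values that cannot be parsed into missing values instead of raising; whether that is appropriate depends on the data, so treat it as one possible approach.

```python
import pandas as pd

# Hypothetical inputs, not taken from the tutorial's data set
whole = pd.to_numeric(pd.Series(["200", "185"]))
fractional = pd.to_numeric(pd.Series(["200.5", "185"]))

# Unparseable entries become missing values when errors="coerce"
messy = pd.to_numeric(pd.Series(["200", "n/a"]), errors="coerce")

print(whole.dtype)
print(fractional.dtype)
print(messy)
```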
Now that we have numeric types, we can do mathematical operations with the columns. To get the ratio of weight to height, we divide one column by the other.
#Find the ratio of weight and height
print(df["Weight"]/df["Height"])
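The opening of this section mentioned that apply can also work on every column or every row of a DataFrame. A minimal sketch of that, assuming a hypothetical frame with numeric Weight and Height columns: with the default axis=0 the function receives each column as a Series, and with axis=1 it receives each row.

```python
import pandas as pd

# Hypothetical numeric data standing in for the converted df
sample = pd.DataFrame({"Weight": [200, 185], "Height": [73, 71]})

# axis=0 (the default): the function is called once per column
column_max = sample.apply(lambda col: col.max())

# axis=1: the function is called once per row
ratio = sample.apply(lambda row: row["Weight"] / row["Height"], axis=1)

print(column_max)
print(ratio)
```

For a simple ratio like this, dividing the columns directly gives the same values and is usually faster; row-wise apply is more useful when the logic cannot be expressed as column arithmetic.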