Apply
With apply, you can apply a function to each element of a column, or to every row or every column of a DataFrame. To begin, let's define a function to deal with the Height column. The issue is that each height is currently stored as a string, and we might want an actual numeric representation. Let's do it piece by piece, starting with the height of the first player.
#Get the first player's height
height = df.loc[0, "Height"]
#Print the value and the type
print(height)
print(type(height))
6'1
<class 'str'>
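The snippet above assumes a DataFrame named df is already loaded. For readers following along without the original data, here is a minimal sketch that rebuilds a stand-in by hand. The height strings after the first are inferred from the converted values printed later in this section (76 and 75 inches), and storing Weight as strings is an assumption chosen to match the object dtype reported further down.

```python
import pandas as pd

# Hand-built stand-in for the tutorial's data (values inferred from the printed output)
df = pd.DataFrame({
    "Name": ["Ray Lewis", "Tom Brady", "Julio Jones", "Richard Sherman"],
    "Height": ["6'1", "6'4", "6'3", "6'3"],   # feet'inches stored as strings
    "Weight": ["250", "225", "220", "194"],   # numbers stored as strings (assumption)
    "Type": ["Defense", "Offense", "Offense", "Defense"],
    "Retired": [True, False, False, False],
})

# Same lookup as above: row label 0, column "Height"
height = df.loc[0, "Height"]
print(height)
print(type(height))
```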
The first thing to do would be to grab the two pieces of information that this string provides, the feet and inches. If we use split and split on "'", we can grab the two pieces as a list.
#Split in two
height = height.split("'")
print(height)
['6', '1']
We still have strings in both of these cases, so we are going to want to convert them both to integers.
#Convert to integers
height = [int(x) for x in height]
print(height)
[6, 1]
Now the value on the left is the number of feet and the value on the right is the number of inches. Let's convert the feet to inches by multiplying by 12.
#Convert the feet to inches
height[0] = height[0] * 12
print(height)
[72, 1]
Finally, sum these two numbers and we have the height in terms of total inches.
#Get the total height in inches
height = sum(height)
print(height)
73
Now that we know all the steps, we can convert this into a function.
def height_convert(height):
    #Split in two
    height = height.split("'")
    #Convert to integers
    height = [int(x) for x in height]
    #Convert the feet to inches
    height[0] = height[0] * 12
    #Get the total height in inches
    height = sum(height)
    return height
print(height_convert("6'1"))
73
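As an alternative sketch (not the version used in the rest of this section), the same conversion can be written more compactly with tuple unpacking. The name height_convert_compact is made up for this illustration. It assumes every input has exactly one apostrophe with digits on both sides, so a value like "6'" would raise an error.

```python
def height_convert_compact(height):
    """Convert a feet'inches string such as "6'1" to total inches."""
    # Unpack the two pieces directly instead of indexing into a list
    feet, inches = height.split("'")
    return int(feet) * 12 + int(inches)

# Check against a few hand-computed values
for text, expected in [("6'1", 73), ("6'4", 76), ("5'11", 71)]:
    assert height_convert_compact(text) == expected

print(height_convert_compact("6'1"))
```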
To use apply on a column, index into the column, call apply, and pass in the function you want applied. Used this way, the function must work with a single argument, because each value in the column is passed to it one at a time. The result is a new Series holding the function's return value for every value in the column.
#Apply our new function
print(df["Height"].apply(height_convert))
0 73
1 76
2 75
3 75
Name: Height, dtype: int64
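If the function needs more than one argument, Series.apply can forward the extras. The sketch below uses a hypothetical helper, height_convert_scaled, that multiplies the height in inches by a factor (2.54 to get centimeters). The helper and the sample Series are illustrative assumptions, not part of the original data set.

```python
import pandas as pd

def height_convert_scaled(height, factor):
    """Convert a feet'inches string to inches, then multiply by factor."""
    feet, inches = height.split("'")
    return (int(feet) * 12 + int(inches)) * factor

heights = pd.Series(["6'1", "6'4", "6'3", "6'3"], name="Height")

# Extra positional arguments go in the args tuple
in_cm = heights.apply(height_convert_scaled, args=(2.54,))

# Keyword arguments are forwarded to the function as well
in_cm_kw = heights.apply(height_convert_scaled, factor=2.54)

print(in_cm)
```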
Let's overwrite our prior height column with this.
#Modify the values
df["Height"] = df["Height"].apply(height_convert)
print(df)
Name Height Weight Type Retired
0 Ray Lewis 73 250 Defense True
1 Tom Brady 76 225 Offense False
2 Julio Jones 75 220 Offense False
3 Richard Sherman 75 194 Defense False
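For a simple pattern like this one, the conversion can also be done without apply by using the Series string methods. This is a sketch of a different technique, shown on a standalone Series so that it does not depend on the state of df.

```python
import pandas as pd

heights = pd.Series(["6'1", "6'4", "6'3", "6'3"], name="Height")

# Split every string at once; expand=True returns one column per piece
parts = heights.str.split("'", expand=True).astype(int)

# Column 0 holds feet, column 1 holds inches
total_inches = parts[0] * 12 + parts[1]
print(total_inches.tolist())
```

The apply version is easier to extend with custom logic, while the string-method version keeps the work inside pandas.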
You might want to see weight divided by height, but trying that raises an error because the Weight column is still stored as strings.
#If we try to divide here we will run into an issue, weight is still a string representation
print(df["Weight"]/df["Height"])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, str_rep)
148 try:
--> 149 result = expressions.evaluate(op, str_rep, left, right)
150 except TypeError:
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/computation/expressions.py in evaluate(op, op_str, a, b, use_numexpr)
207 if use_numexpr:
--> 208 return _evaluate(op, op_str, a, b)
209 return _evaluate_standard(op, op_str, a, b)
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/computation/expressions.py in _evaluate_standard(op, op_str, a, b)
69 with np.errstate(all="ignore"):
---> 70 return op(a, b)
71
TypeError: unsupported operand type(s) for /: 'str' and 'int'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-24-7d5eb8d3736d> in <module>
1 #If we try to divide here we will run into an issue, weight is still a string representation
----> 2 print(df["Weight"]/df["Height"])
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
62 other = item_from_zerodim(other)
63
---> 64 return method(self, other)
65
66 return new_method
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
501 lvalues = extract_array(left, extract_numpy=True)
502 rvalues = extract_array(right, extract_numpy=True)
--> 503 result = arithmetic_op(lvalues, rvalues, op, str_rep)
504
505 return _construct_result(left, result, index=left.index, name=res_name)
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op, str_rep)
195 else:
196 with np.errstate(all="ignore"):
--> 197 res_values = na_arithmetic_op(lvalues, rvalues, op, str_rep)
198
199 return res_values
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, str_rep)
149 result = expressions.evaluate(op, str_rep, left, right)
150 except TypeError:
--> 151 result = masked_arith_op(left, right, op)
152
153 return missing.dispatch_fill_zeros(op, left, right, result)
~/anaconda/envs/FinanceAndPython/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in masked_arith_op(x, y, op)
92 if mask.any():
93 with np.errstate(all="ignore"):
---> 94 result[mask] = op(xrav[mask], yrav[mask])
95
96 else:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
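The long traceback boils down to its last line: Python cannot divide a str by an int. As a minimal sketch, the same message can be reproduced with plain Python values outside pandas. The values 250 and 73 are simply the first player's weight and height.

```python
# Dividing a string by an integer is not defined in Python
try:
    "250" / 73
except TypeError as err:
    message = str(err)
    print(message)
```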
To see what types we have for each column, we can call dtypes.
#The dtypes attribute lets us see the data type of each column
print(df.dtypes)
Name object
Height int64
Weight object
Type object
Retired bool
dtype: object
There are two options for converting to numeric values. The first is to call astype() and pass in a type. If we pass in int, the column is converted to integers.
#Convert to integer
df["Weight"] = df["Weight"].astype(int)
print(df.dtypes)
Name object
Height int64
Weight int64
Type object
Retired bool
dtype: object
The other option is to call pd.to_numeric and pass in the Series. If any of the values were floating point numbers, it would convert the column to float automatically, but in this case they are all integers.
#Also converts to numeric values
df["Weight"] = pd.to_numeric(df["Weight"])
print(df.dtypes)
Name object
Height int64
Weight int64
Type object
Retired bool
dtype: object
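To illustrate the automatic choice between integer and float, here is a small sketch on made-up Series (the values "225.5" and "unknown" are invented for the example). It also shows the errors="coerce" option of pd.to_numeric, which replaces entries that cannot be parsed with NaN instead of raising an error.

```python
import pandas as pd

# All whole numbers: the result is an integer column
whole = pd.to_numeric(pd.Series(["250", "225"], dtype=object))

# One decimal value is enough to make the whole column float
mixed = pd.to_numeric(pd.Series(["250", "225.5"], dtype=object))
print(whole.dtype, mixed.dtype)

# errors="coerce" turns unparseable entries into NaN instead of raising
messy = pd.to_numeric(pd.Series(["250", "unknown"], dtype=object), errors="coerce")
print(messy.isna().tolist())
```

By contrast, astype(int) on the last Series would raise a ValueError, so pd.to_numeric with errors="coerce" is one way to handle columns that may contain stray text.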
Now that both columns are numeric, we can do mathematical operations with them. To divide weight by height, use the division operator on the two columns.
#Find the ratio of weight and height
print(df["Weight"]/df["Height"])
0 3.424658
1 2.960526
2 2.933333
3 2.586667
dtype: float64
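The introduction mentioned that apply can also work on every row. As a sketch of that usage, the same ratio can be computed by calling DataFrame.apply with axis=1, which passes each row to the function as a Series. The function name weight_per_inch and the hand-built frame are assumptions made for this example.

```python
import pandas as pd

# Stand-in frame with the already-converted numeric columns
df = pd.DataFrame({
    "Name": ["Ray Lewis", "Tom Brady", "Julio Jones", "Richard Sherman"],
    "Height": [73, 76, 75, 75],
    "Weight": [250, 225, 220, 194],
})

def weight_per_inch(row):
    """Return weight divided by height for a single row."""
    return row["Weight"] / row["Height"]

# axis=1 applies the function to each row rather than each column
ratio = df.apply(weight_per_inch, axis=1)
print(ratio)
```

For plain arithmetic like this, dividing the columns directly is usually faster. Row-wise apply is more useful when the logic needs several columns and cannot be expressed with column operations.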