Extending to 3 Dimensions
Moving to 3+ Dimensions
NumPy supports arrays with more than two dimensions. Let's start with a basic example: suppose you sell two products in two stores, which gives a 2×2 array of sales figures. The two columns denote product 1 and product 2, and the two rows denote store 1 and store 2.
import numpy as np
#Create the sales array (rows are stores, columns are products)
sales = np.array([[100, 200],
                  [50, 100]])
print(sales)
[[100 200]
 [ 50 100]]
We can index into this array to pull out the sales for a single product (a column) or a single store (a row).
print("Sales for product 1: ")
print(sales[:,0])
print()
print("Sales for product 2: ")
print(sales[:,1])
print()
print("Sales at store 1: ")
print(sales[0])
print()
print("Sales at store 2: ")
print(sales[1])
print()
Sales for product 1: 
[100  50]

Sales for product 2: 
[200 100]

Sales at store 1: 
[100 200]

Sales at store 2: 
[ 50 100]
At this point, you may be thinking that this would be much easier with a pandas DataFrame, which carries labels. Here is the twist, though: what if we have a third dimension, the day? We saw one way of representing this with pandas, but it may not always be the most efficient way to store the data. NumPy allows us to have a third dimension (or more). All we need to do is add a third level of nesting, and NumPy will take care of the rest. I will label each array with its day to make it as clear as possible how this works. First, the three lists of sales data need to be built.
#Create sales data
sales_day1 = [[100, 200],
              [50, 100]]
sales_day2 = [[140, 300],
              [55, 40]]
sales_day3 = [[21, 33],
              [43, 53]]
We can combine these three lists into one larger list to hold them.
#Create the larger list object
sales = [sales_day1, sales_day2, sales_day3]
print(sales)
[[[100, 200], [50, 100]], [[140, 300], [55, 40]], [[21, 33], [43, 53]]]
Finally, this can be converted to a NumPy array by passing it to np.array.
#Convert to a numpy array
sales = np.array(sales)
print(sales)
[[[100 200]
  [ 50 100]]

 [[140 300]
  [ 55  40]]

 [[ 21  33]
  [ 43  53]]]
With our indexing, we can pick a specific 2x2 matrix by passing only the first index. For example, to get the second array (day 2)...
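A quick way to confirm how the nesting maps onto dimensions is to inspect the array's `ndim` and `shape` attributes. The sketch below rebuilds the same sales data; reading the axes as (day, store, product) follows the convention used in this lesson and is not something NumPy itself knows about.

```python
import numpy as np

# Same sales data as above: axis 0 = day, axis 1 = store, axis 2 = product
sales = np.array([
    [[100, 200], [50, 100]],   # day 1
    [[140, 300], [55, 40]],    # day 2
    [[21, 33], [43, 53]],      # day 3
])

print(sales.ndim)   # number of dimensions
print(sales.shape)  # (days, stores, products)
```

With three days, two stores, and two products, this should report 3 dimensions and a shape of (3, 2, 2).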
#Get the second array
print(sales[1])
[[140 300]
 [ 55  40]]
Here is where we really get the benefits of NumPy, however. What if we wanted to quickly see all sales for product 1 at all stores over all days? We can now do indexing along the different dimensions. Our first index is going to be ":" because we want all the days, the second will be ":" because we want all the stores, and our final index will be "0" because we want to get back only product 1.
print(sales[:,:,0])
[[100  50]
 [140  55]
 [ 21  43]]
In a similar way, we could get the product 1 sales for just the first and second days by changing the first index to a slice that ends at 2.
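In that result each row is a day and each column is a store. The same pattern works along any axis. As a small sketch on the same data (the variable names `product2`, `store2`, and `single` are only illustrative):

```python
import numpy as np

# Same sales data: axis 0 = day, axis 1 = store, axis 2 = product
sales = np.array([
    [[100, 200], [50, 100]],
    [[140, 300], [55, 40]],
    [[21, 33], [43, 53]],
])

# All days, all stores, product 2 only
product2 = sales[:, :, 1]
print(product2)

# All days, store 2 only, all products
store2 = sales[:, 1, :]
print(store2)

# A single value: day 2, store 1, product 2
single = sales[1, 0, 1]
print(single)
```

Fixing one axis with an integer and leaving the others as ":" always returns the remaining two axes in their original order.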
print(sales[:2,:,0])
[[100  50]
 [140  55]]
Let's say now that we also wanted to limit this to the first store only. We could switch that second index to be 0 as well. Notice how this changes the shape of the array: indexing with an integer drops that dimension, so the result is one dimensional.
print(sales[:2,0,0])
[100 140]
If you needed to preserve the two-dimensional shape of the array, one method would be to use the slice ":1" for the second index instead of the integer 0, as below.
print(sales[:2,:1,0])
[[100]
[140]]
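The difference between the two forms is easiest to see by comparing their `shape` attributes. A minimal sketch on the same data (the names `with_int` and `with_slice` are only illustrative):

```python
import numpy as np

# Same sales data: axis 0 = day, axis 1 = store, axis 2 = product
sales = np.array([
    [[100, 200], [50, 100]],
    [[140, 300], [55, 40]],
    [[21, 33], [43, 53]],
])

# An integer index removes the store axis entirely
with_int = sales[:2, 0, 0]
print(with_int.shape)

# A length-one slice keeps the store axis, with size 1
with_slice = sales[:2, :1, 0]
print(with_slice.shape)
```

The integer version has shape (2,), while the slice version has shape (2, 1).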
There is, however, some built-in functionality that can handle any sort of reshaping you want to do.
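The text does not name that functionality at this point. As a hedged sketch, assuming it refers to something like the array `reshape` method, the one-dimensional result from earlier can be turned back into a single column (the names `flat` and `column` are only illustrative):

```python
import numpy as np

# Same sales data: axis 0 = day, axis 1 = store, axis 2 = product
sales = np.array([
    [[100, 200], [50, 100]],
    [[140, 300], [55, 40]],
    [[21, 33], [43, 53]],
])

# Integer indexes on the store and product axes give a 1-D result
flat = sales[:2, 0, 0]

# reshape(-1, 1) asks for one column and lets NumPy infer the row count
column = flat.reshape(-1, 1)
print(column.shape)
print(column)
```

This gives the same values and shape as the ":1" slice approach, without having to know in advance how many rows there are.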