Building the Grid
Solution
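def LocationGroup(x,maxNum,minNum,slice_constant):
    # Scale x's position between minNum and maxNum up to the range 0..slice_constant,
    # then truncate to a whole-number group index.
    if x==maxNum:
        # The maximum itself would land in group slice_constant, so move it into the last group.
        loc = int(slice_constant*(x-minNum)/(maxNum-minNum))-1
    else:
        loc = int(slice_constant*(x-minNum)/(maxNum-minNum))
    return loc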
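The expression (x-minNum)/(maxNum-minNum) gives x's relative position between the minimum and the maximum, a value from 0 to 1. Multiplying it by the slice constant stretches that to the range 0 through slice_constant, because the slice constant is how many groups we are creating, and int() drops the decimal part to leave a whole-number group. Finally, we need the if statement because at the maximum the result would equal the slice constant itself, but we want the groups to run from 0 to slice constant - 1.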
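As a quick check of the arithmetic, here is the function run on a made-up range from 0 to 10 with a slice constant of 4 (illustrative numbers, not from the dataset), so each group is 2.5 wide:

```python
def LocationGroup(x, maxNum, minNum, slice_constant):
    # Same logic as above: scale x into 0..slice_constant, truncate,
    # and pull the maximum value down into the last group.
    if x == maxNum:
        loc = int(slice_constant * (x - minNum) / (maxNum - minNum)) - 1
    else:
        loc = int(slice_constant * (x - minNum) / (maxNum - minNum))
    return loc

# Illustrative range: minimum 0, maximum 10, split into 4 groups.
values = [0, 2.4, 2.5, 5, 9.99, 10]
groups = [LocationGroup(x, 10, 0, 4) for x in values]
print(groups)  # [0, 0, 1, 2, 3, 3]
```

A value sitting exactly on a group boundary, such as 2.5, goes into the upper group, and only the maximum needs the special case.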
Now there is another issue to deal with: some of the data has blank (missing) values, which pandas reads in as NaN. LocationGroup cannot handle these, because int() raises a ValueError when given NaN, so we want to keep only the rows whose values are filled in. Let’s first learn how to turn a column into a truth series. If we wanted to see which rows have a latitude greater than the midpoint of a and b, we could do this:
print(Assess_Combined["LATITUDE"]>((a+b)/2))
This returns a column of True and False values, one for each row. If you wanted only the rows above the midpoint, you would index the dataframe with your truth series; the result keeps only the rows where the series is True. Comparing the lengths before and after shows how many rows were dropped:
print(len(Assess_Combined))
print(len(Assess_Combined[Assess_Combined["LATITUDE"]>((a+b)/2)]))
print(Assess_Combined[Assess_Combined["LATITUDE"]>((a+b)/2)])
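A small self-contained version of the same idea, using a made-up four-row DataFrame in place of Assess_Combined and made-up values for a and b:

```python
import pandas as pd

# Toy stand-in for Assess_Combined; the values are made up for illustration.
df = pd.DataFrame({"LATITUDE": [42.23, 42.30, 42.35, 42.39]})
a, b = 42.39, 42.23  # hypothetical maximum and minimum latitude

mask = df["LATITUDE"] > (a + b) / 2  # one True/False value per row
print(mask.tolist())                 # [False, False, True, True]

above = df[mask]                     # keeps only the rows where mask is True
print(len(df), len(above))           # 4 2
```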
To get a truth series that says whether each value in a column is filled in (not null), we can do this:
print(Assess_Combined['LATITUDE'].notnull())
And to get our data without the null values, we index with this truth series:
Assess_Combined = Assess_Combined[Assess_Combined['LATITUDE'].notnull()]
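A sketch on made-up data with one blank latitude, showing both why the filter is needed (int() fails on NaN) and what notnull() produces:

```python
import pandas as pd

# Made-up data with one missing latitude, stored by pandas as NaN.
df = pd.DataFrame({"LATITUDE": [42.30, float("nan"), 42.35],
                   "LONGITUDE": [-71.10, -71.05, -71.08]})

print(df["LATITUDE"].notnull().tolist())  # [True, False, True]

# Without filtering, LocationGroup would hit this error on the NaN row.
try:
    int(float("nan"))
except ValueError as err:
    print("ValueError:", err)

df = df[df["LATITUDE"].notnull()]  # drop rows with a missing latitude
print(len(df))                     # 2
```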
And now it is time to apply our function. With .apply(), you can apply any function to a pandas dataframe or to a single column (a Series). On a column, apply calls the function once per value and passes that value as its only argument. That is why we first wrap LocationGroup in a lambda function: the lambda takes the single value x and plugs in the maximum, minimum and slice constant for the other three arguments.
For latitude, we plug in a and b as the maximum and minimum, and 100 as the slice constant.
print(Assess_Combined["LATITUDE"].apply((lambda x: LocationGroup(x,a,b,100))))
Of course, so far we have only printed the result. We need to assign it to a new column, which is easy to do:
Assess_Combined["Lat Group"] = Assess_Combined["LATITUDE"].apply((lambda x: LocationGroup(x,a,b,100)))
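A runnable sketch of the same apply-plus-lambda pattern on made-up latitudes, with a copy of LocationGroup so it runs on its own. It also shows an alternative the lesson does not use: Series.apply accepts an args= tuple of extra positional arguments, which pandas passes to the function after each value, so the lambda is not strictly required.

```python
import pandas as pd

def LocationGroup(x, maxNum, minNum, slice_constant):
    # Copy of the lesson's function so this block runs on its own.
    if x == maxNum:
        return int(slice_constant * (x - minNum) / (maxNum - minNum)) - 1
    return int(slice_constant * (x - minNum) / (maxNum - minNum))

# Made-up latitudes; a and b are hypothetical maximum and minimum values.
df = pd.DataFrame({"LATITUDE": [42.23, 42.30, 42.35, 42.39]})
a, b = 42.39, 42.23

# The lambda fixes maxNum, minNum and slice_constant; apply supplies x per row.
df["Lat Group"] = df["LATITUDE"].apply(lambda x: LocationGroup(x, a, b, 100))

# Equivalent: extra positional arguments passed after x via args=.
same = df["LATITUDE"].apply(LocationGroup, args=(a, b, 100))
print(df["Lat Group"].tolist() == same.tolist())  # True
```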
Let’s also create our longitude group.
Assess_Combined["Lon Group"] = Assess_Combined["LONGITUDE"].apply((lambda x: LocationGroup(x,c,d,100)))
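Because apply calls a Python function once per value, it is typically slower than arithmetic on whole columns when the table is large. As an alternative that is not part of the original lesson, the same grouping can be computed column-wide. The helper name location_group_vectorized is hypothetical, and this sketch assumes the column has no missing values and every value lies between the minimum and maximum; the longitudes and the values of c and d are made up.

```python
import pandas as pd

def location_group_vectorized(s, maxNum, minNum, slice_constant):
    # Hypothetical helper: same formula as LocationGroup, on a whole Series.
    # astype(int) truncates like int() for non-negative values, and clip()
    # moves the maximum (which scores slice_constant) into the last group.
    scaled = slice_constant * (s - minNum) / (maxNum - minNum)
    return scaled.astype(int).clip(upper=slice_constant - 1)

def LocationGroup(x, maxNum, minNum, slice_constant):
    # Row-by-row version from the lesson, for comparison.
    if x == maxNum:
        return int(slice_constant * (x - minNum) / (maxNum - minNum)) - 1
    return int(slice_constant * (x - minNum) / (maxNum - minNum))

# Made-up longitudes; c and d are hypothetical maximum and minimum values.
lon = pd.Series([-71.25, -71.2, -71.1, -71.0])
c, d = -71.0, -71.25

fast = location_group_vectorized(lon, c, d, 100)
slow = lon.apply(lambda x: LocationGroup(x, c, d, 100))
print(fast.tolist() == slow.tolist())  # True
```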
Finally, let’s throw out any row where AV_TOTAL or AV_TOTAL_PRE is not greater than 0.
Assess_Combined = Assess_Combined[Assess_Combined["AV_TOTAL"]>0]
Assess_Combined = Assess_Combined[Assess_Combined["AV_TOTAL_PRE"]>0]
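The two filters can also be written as one combined mask with &, where each comparison needs its own parentheses. This sketch uses made-up values; it also shows that a missing value (NaN) compares as False against 0, so such rows are dropped as well:

```python
import numpy as np
import pandas as pd

# Made-up assessed values: a zero, a negative, and a missing entry.
df = pd.DataFrame({"AV_TOTAL":     [250000, 0, 310000, np.nan],
                   "AV_TOTAL_PRE": [240000, 200000, -5, 180000]})

# Two filters one after the other, as in the lesson...
step = df[df["AV_TOTAL"] > 0]
step = step[step["AV_TOTAL_PRE"] > 0]

# ...or a single combined mask.
combined = df[(df["AV_TOTAL"] > 0) & (df["AV_TOTAL_PRE"] > 0)]

print(step.index.tolist(), combined.index.tolist())  # [0] [0]
```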
Now save, and on to the next lesson.
Assess_Combined.to_csv("Assess_Combined.csv",encoding="UTF-8")
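By default, to_csv also writes the dataframe's index as an unnamed first column. A sketch of reading such a file back, using made-up rows and a temporary folder so it runs anywhere: read_csv with index_col=0 turns that first column back into the index instead of leaving an extra "Unnamed: 0" column.

```python
import os
import tempfile
import pandas as pd

# Made-up rows standing in for Assess_Combined after filtering.
df = pd.DataFrame({"Lat Group": [3, 57], "Lon Group": [12, 80]}, index=[5, 9])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Assess_Combined.csv")
    df.to_csv(path, encoding="UTF-8")          # index written as the first column
    plain = pd.read_csv(path)                  # index comes back as "Unnamed: 0"
    restored = pd.read_csv(path, index_col=0)  # first column used as the index

print(plain.columns.tolist())   # ['Unnamed: 0', 'Lat Group', 'Lon Group']
print(restored.index.tolist())  # [5, 9]
```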