PDF and CDF
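The probability density function, pdf(x), describes how densely probability is concentrated around a point x. This gets complicated for continuous distributions, since there are infinitely many possible values of x (we could have .5, or .49, or .499, for example), so the probability of landing on any single exact value is zero. The pdf is therefore a relative likelihood rather than a probability: a higher pdf(x) means values near x are more likely, and the probability of falling within a range is the area under the pdf over that range.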
The cumulative distribution function, cdf(x), gives the fraction of a distribution that is expected to fall at or below a point x. So if cdf(2) is .3, we expect 30% of the distribution’s points to be at or below 2.
SciPy has a module, scipy.stats, which allows us to create a distribution object. For example, to create a normal distribution we would use scipy.stats.norm() and feed it a mean of 400 and a standard deviation of 100.
import scipy.stats
dist = scipy.stats.norm(400,100)
We can use the distribution's .pdf() method to get the pdf at a point.
print(dist.pdf(400))
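To make the point about density concrete, here is a small sketch that is not part of the original walkthrough. It compares dist.pdf(400) with the textbook peak of a normal density, 1 / (sigma * sqrt(2 * pi)), and then builds a second, much narrower distribution (named narrow here purely for illustration) whose density at its mean is greater than 1. A probability can never exceed 1, so this shows that the value returned by .pdf() is a density and not a probability.

```python
import math
import scipy.stats

dist = scipy.stats.norm(400, 100)

# At the mean, the normal pdf equals 1 / (sigma * sqrt(2 * pi)).
peak_formula = 1 / (100 * math.sqrt(2 * math.pi))
print(dist.pdf(400), peak_formula)

# Hypothetical narrower distribution (illustration only): mean 0, std 0.1.
narrow = scipy.stats.norm(0, 0.1)
print(narrow.pdf(0))  # a density greater than 1
```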
Let’s plot the pdf over the range 0 to 800.
import matplotlib.pyplot as plt
xVals = [x*10 for x in range(0,81)]
yVals = []
for x in xVals:
    yVals.append(dist.pdf(x))
plt.plot(xVals,yVals)
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.show()
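As an alternative to the loop above, .pdf() also accepts a NumPy array and returns an array of densities, so the y values can be computed in one call. The sketch below assumes NumPy is available (SciPy depends on it), and the names x_grid, y_grid, and area are illustrative. It also adds up the area under the sampled curve, which should come out close to 1 because the range 0 to 800 covers four standard deviations on each side of the mean.

```python
import numpy as np
import scipy.stats

dist = scipy.stats.norm(400, 100)

# 81 evenly spaced points from 0 to 800 (step of 10), the same grid as the loop.
x_grid = np.linspace(0, 800, 81)
y_grid = dist.pdf(x_grid)  # evaluates the pdf at every point at once

# Rough area under the curve: sum of density times step width.
area = float(np.sum(y_grid) * 10)
print(area)
```

x_grid and y_grid can be passed to plt.plot() in the same way as xVals and yVals.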
What about the cdf?
import matplotlib.pyplot as plt
xVals = [x*10 for x in range(0,81)]
yVals = []
for x in xVals:
    yVals.append(dist.cdf(x))
plt.plot(xVals,yVals)
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.show()
Notice that at the far right of the plot the cdf approaches 1, meaning nearly all of the distribution lies below that point.
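Because the cdf gives the share of the distribution at or below a point, subtracting two cdf values gives the share that falls between them. A brief sketch follows, with the range 300 to 500 chosen here only as an example (it is one standard deviation on each side of the mean):

```python
import scipy.stats

dist = scipy.stats.norm(400, 100)

# Half of a normal distribution lies at or below its mean.
below_mean = dist.cdf(400)

# Share of the distribution between 300 and 500.
between = dist.cdf(500) - dist.cdf(300)

print(below_mean)
print(between)  # roughly 0.68 for one standard deviation either side
```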
Challenge