PDF and CDF

The probability density function, pdf(x), describes how likely a randomly drawn point is to fall near x. This gets complicated for continuous distributions, since there are infinitely many possible values of x (we could have .5, or .49, or .499, for example), so the probability of landing on any one exact value is zero. The pdf therefore gives a density rather than a probability: the higher pdf(x) is, the more concentrated points are around x.

The cumulative distribution function, cdf(x), gives the fraction of a distribution expected to be at or below a point x. So if cdf(2) is .3, then we expect 30% of the distribution’s points to be below 2.

SciPy has a module, scipy.stats, which allows us to create a distribution object. For example, to create a normal distribution, we would use scipy.stats.norm() and feed it a mean of 400 and a standard deviation of 100.

import scipy.stats
dist = scipy.stats.norm(400,100)

We can use the distribution’s .pdf() method to get the pdf at a point.

print(dist.pdf(400))
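
The value printed above is a density, not a probability. As a rough sketch, reusing the same mean of 400 and standard deviation of 100, the snippet below checks it against the normal density at the mean, 1 / (sigma * sqrt(2 * pi)), and then gets an actual probability by taking the difference of two cdf values. The variable names are only for illustration.

```python
import math
import scipy.stats

dist = scipy.stats.norm(400, 100)

# Density at the mean: for a normal distribution this is 1 / (sigma * sqrt(2*pi)).
density_at_mean = dist.pdf(400)
expected_density = 1 / (100 * math.sqrt(2 * math.pi))
print(density_at_mean, expected_density)

# A probability comes from a range of values, not a single point:
# the chance of landing within one standard deviation of the mean.
prob_within_one_sd = dist.cdf(500) - dist.cdf(300)
print(prob_within_one_sd)  # roughly 0.68
```

The density itself is a small number (about 0.004) because it is spread over a wide range of values; only the area under the curve over an interval is a probability.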

Let’s plot the pdf over the range 0 to 800.

import matplotlib.pyplot as plt
xVals = [x*10 for x in range(0,81)]
yVals = []
for x in xVals:
    yVals.append(dist.pdf(x))
plt.plot(xVals,yVals)
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.show()
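
As an optional alternative to the loop, .pdf() also accepts a NumPy array and returns an array of densities in one call. This is a minimal sketch assuming NumPy is installed; the names x_vals and y_vals are just illustrative, and the resulting arrays could be passed to plt.plot() the same way as the lists above.

```python
import numpy as np
import scipy.stats

dist = scipy.stats.norm(400, 100)

# 81 evenly spaced points from 0 to 800 (the same grid as the loop version)
x_vals = np.linspace(0, 800, 81)

# pdf() evaluates every point in the array at once
y_vals = dist.pdf(x_vals)

# Show the first few points and their densities
print(x_vals[:3], y_vals[:3])
```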

What about the cdf?

import matplotlib.pyplot as plt
xVals = [x*10 for x in range(0,81)]
yVals = []
for x in xVals:
    yVals.append(dist.cdf(x))
plt.plot(xVals,yVals)
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.show()

Notice that at the far right of the plot the curve approaches 1, meaning nearly all of the distribution lies below that point.
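
To put numbers on this, here is a small sketch that evaluates the cdf at the two ends of the plotted range and at the mean, using the same distribution as before. The variable names are hypothetical.

```python
import scipy.stats

dist = scipy.stats.norm(400, 100)

left = dist.cdf(0)      # four standard deviations below the mean: close to 0
middle = dist.cdf(400)  # at the mean: half of the distribution lies below
right = dist.cdf(800)   # four standard deviations above the mean: close to, but not exactly, 1

print(left, middle, right)
```

The cdf never decreases as x grows, so the curve climbs from near 0 on the left to near 1 on the right.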

Challenge

Create a normal distribution that has the same standard deviation, but a mean 200 higher than the first one. Next, plot the pdfs of the two distributions together, and then their cdfs.
