PDF and CDF Part 2
Solution
import scipy.stats
import matplotlib.pyplot as plt

# dist carries over from Part 1: assumed here to be a normal distribution
# with mean 400 and standard deviation 100 (matching the comparisons below).
dist = scipy.stats.norm(400, 100)

# A second normal distribution with the mean shifted to 600, same standard deviation.
dist2 = scipy.stats.norm(600, 100)
xVals = list(range(1000))
yVals1 = [dist.pdf(x) for x in xVals]
yVals2 = [dist2.pdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
# Repeat the comparison using the cumulative distribution functions (CDFs).
dist2 = scipy.stats.norm(600, 100)
xVals = list(range(1000))
yVals1 = [dist.cdf(x) for x in xVals]
yVals2 = [dist2.cdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Cumulative Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
What if, instead of a shifted mean, we had a standard deviation that was half of the original?
# A normal distribution with the same mean (400) but half the standard deviation (50).
dist2 = scipy.stats.norm(400, 50)
xVals = list(range(1000))
yVals1 = [dist.pdf(x) for x in xVals]
yVals2 = [dist2.pdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
# CDFs for the same pair of distributions.
dist2 = scipy.stats.norm(400, 50)
xVals = list(range(1000))
yVals1 = [dist.cdf(x) for x in xVals]
yVals2 = [dist2.cdf(x) for x in xVals]
plt.plot(xVals, yVals1)
plt.plot(xVals, yVals2)
plt.xlabel("Value")
plt.ylabel("Cumulative Density")
plt.legend(["Distribution 1", "Distribution 2"])
plt.show()
Moving on, a helpful rule is the 68-95-99.7 rule: for a normal curve, about 68% of values fall within one standard deviation of the mean, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations.
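As a quick check, here is a small sketch that recovers these percentages from the cdf of the dist object defined above (mean 400, standard deviation 100); the printed values are approximations.
# Probability of falling within k standard deviations of the mean, from the CDF.
for k in [1, 2, 3]:
    prob = dist.cdf(400 + k * 100) - dist.cdf(400 - k * 100)
    print(f"Within {k} standard deviation(s): {prob:.4f}")
# Prints approximately 0.6827, 0.9545, and 0.9973.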
The z-score measures how many standard deviations away from the mean a point is.
z = (x - μ) / σ, where x is the point, μ is the mean, and σ is the standard deviation.
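For example, with a mean of 400 and a standard deviation of 100, a value of 470 has a z-score of (470 - 400) / 100 = 0.7, meaning it sits 0.7 standard deviations above the mean.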
The rvs function of the distribution lets us sample random points. Let's do that now, and also compute the mean and standard deviation of the sample to check it.
import numpy as np

# Draw 10,000 random values from the distribution, then check the sample
# mean and standard deviation against the true values (400 and 100).
sample = dist.rvs(size=10000)
print(np.mean(sample))
print(np.std(sample))
Let's plot a histogram of the data. You'll notice that, because it is a sample, it is not a perfect representation of the distribution.
plt.hist(sample,bins=100)
plt.show()
We can get an array of True and False values by applying a comparison to the entire array; this will be very important soon.
print(sample)
print(sample>470)
Notice it is True for values above 470 and False for values below. We can also index by this array; when we do, it returns only the values that are True.
print(sample[sample>470])
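As a small sketch of why this is useful, taking np.mean of a boolean array gives the fraction of True values, which we can compare against the theoretical probability from the cdf.
# Fraction of sampled values above 470 (the boolean array is treated as 0s and 1s).
print(np.mean(sample > 470))
# Theoretical probability of exceeding 470 for this distribution.
print(1 - dist.cdf(470))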
Let’s convert our sample into z-scores.
zScores = (sample-np.mean(sample))/np.std(sample)
print(zScores)
Challenge