Exponential Distribution
The exponential distribution has the following probability density function (PDF):
$ f(x) = \lambda e^{-\lambda x} \text{ for } x \ge 0$
The cumulative distribution function (CDF) is:
$ P(X \le x) = 1 - e^{-\lambda x}$
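As a quick sanity check on the two formulas, the sketch below integrates the PDF numerically and compares the running area with the closed-form CDF. The helper names `exp_pdf` and `exp_cdf`, the rate `lam = 2.0`, and the grid are illustrative choices, not part of the original notebook.

```python
import numpy as np

def exp_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

def exp_cdf(x, lam):
    """Closed-form CDF 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, 1.0 - np.exp(-lam * x), 0.0)

lam = 2.0  # illustrative rate (assumption)
grid = np.linspace(0.0, 5.0, 100001)
pdf_vals = exp_pdf(grid, lam)

# Trapezoid rule: running integral of the PDF from 0 up to each grid point
steps = np.diff(grid)
trapezoids = steps * (pdf_vals[1:] + pdf_vals[:-1]) / 2.0
running_area = np.concatenate(([0.0], np.cumsum(trapezoids)))

# Largest disagreement between the numerical integral and the closed form
max_gap = np.max(np.abs(running_area - exp_cdf(grid, lam)))
print(max_gap)
```

The printed gap should be tiny, which is what we expect if the CDF is the integral of the PDF.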
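#Imports used throughout this section
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

#Compare three different values of lambda for a distribution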
X = np.linspace(0,5,1001, endpoint=True)
lambda1 = 1
lambda2 = .5
lambda3 = 2
pdf1 = lambda1 * np.exp(X * -lambda1)
pdf2 = lambda2 * np.exp(X * -lambda2)
pdf3 = lambda3 * np.exp(X * -lambda3)
cdf1 = 1 - np.exp(X * -lambda1)
cdf2 = 1 - np.exp(X * -lambda2)
cdf3 = 1 - np.exp(X * -lambda3)
#Plot the PDFs
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(X, pdf1)
ax.plot(X, pdf2)
ax.plot(X, pdf3)
ax.set_xlabel("X")
ax.set_ylabel("PDF")
ax.set_title("Exponential Distribution PDFs")
plt.legend(["Lambda={}".format(x) for x in [lambda1, lambda2, lambda3]])
plt.show()
#Plot the CDFs
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(X, cdf1)
ax.plot(X, cdf2)
ax.plot(X, cdf3)
ax.set_xlabel("X")
ax.set_ylabel("CDF")
ax.set_title("Exponential Distribution CDFs")
ax.yaxis.set_major_formatter(PercentFormatter(1))
plt.legend(["Lambda={}".format(x) for x in [lambda1, lambda2, lambda3]])
plt.show()
Using the points we already have from the actual CDF curve, we can work out what value of lambda each one implies. Let's walk through an example first to build intuition. Say we know the CDF at x = 1.5 is 0.9502. What would the implied lambda be if the data came from an exponential distribution?
First, set up the problem:
$ P(X \le x) = 1 - e^{-\lambda x}$
$ P(X \le 1.5) = 1 - e^{-1.5\lambda}$
$ 0.9502 = 1 - e^{-1.5\lambda}$
$ 0.0498 = e^{-1.5\lambda}$
$ \ln(0.0498) = \ln(e^{-1.5\lambda})$
$ -3 \approx -1.5\lambda$
$ \lambda \approx 2$
Generalizing this to any point x with a known CDF value:
$$ \lambda = \frac{-\ln(1 - CDF)}{x} $$
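The general formula can be wrapped in a small helper and checked against the worked example. This is a minimal sketch; the function name `implied_lambda` and its input checks are illustrative and not part of the original notebook.

```python
import math

def implied_lambda(x, cdf):
    """Rate lambda such that 1 - exp(-lambda * x) equals cdf."""
    if x <= 0:
        raise ValueError("x must be positive")
    if not 0 <= cdf < 1:
        raise ValueError("cdf must be in [0, 1)")
    return -math.log(1.0 - cdf) / x

# Worked example from the text: CDF of 0.9502 at x = 1.5
lam = implied_lambda(1.5, 0.9502)
print(lam)
```

The result comes out very close to 2, matching the hand calculation. The input checks reject x = 0 and a CDF of 1, where the formula is not defined.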
For this analysis, we skip the end values (the multiples 0 and 100) by indexing with 1:-1, which means everything except the ends. At x = 0 the formula divides by zero, and a CDF of exactly 1 would make ln(1 - CDF) undefined.
#Go through all the points we have
for mult, cdf in zip(multiples[1:-1], actual_cdf[1:-1]):
    #Calculate what the implied lambda would have to be
    implied_lambda = -np.log(1-cdf)/mult
    #Print the information
    print("x: {}, CDF: {}, lambda: {}".format(mult, cdf, implied_lambda))
x: 1, CDF: 0.648, lambda: 1.04412410338404
x: 5, CDF: 0.901, lambda: 0.4625270857695095
x: 10, CDF: 0.96, lambda: 0.32188758248682
x: 20, CDF: 0.985, lambda: 0.2099852538939963
x: 50, CDF: 0.996, lambda: 0.1104292183572449
#Do a quick check to make sure the formula works with x: 5, CDF: 0.901, lambda: 0.4625270857695095
print(1-np.exp(-0.4625270857695095 * 5))
0.901
The implied lambda keeps falling as x grows, which tells us that the exponential distribution as it stands is not a good fit: a true exponential would imply the same lambda at every point. However, something we can do is a log transformation. A log transformation reshapes the data so that it is easier to model, and we can transform back after the prediction is done. Look at how the curve looks if we replace the x values with $\ln(1+x)$. Adding 1 inside the log keeps the transformed value defined, and equal to 0, at x = 0.
#Do the transformation
log_multiples = np.log([x+1 for x in multiples])
print(log_multiples)
[0. 0.69314718 1.79175947 2.39789527 3.04452244 3.93182563
4.61512052]
#Now it looks much more like an exponential distribution
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(log_multiples, actual_cdf,marker='o')
ax.set_xlabel("ln(Multiple + 1)")
ax.set_ylabel("Percent")
ax.set_title("Exit Multiples CDF")
ax.yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
With the log transformation, the formula becomes:
$$ \lambda = \frac{-\ln(1 - CDF)}{\ln(x+1)} $$
#If we transform the x variable with a log transformation, we get much closer values of lambda
for mult, cdf in zip(multiples[1:-1], actual_cdf[1:-1]):
    log_mult = np.log(mult+1)
    implied_lambda = -np.log(1-cdf)/log_mult
    print("x: {}, CDF: {}, lambda: {}".format(mult, cdf, implied_lambda))
x: 1, CDF: 0.648, lambda: 1.50635266602479
x: 5, CDF: 0.901, lambda: 1.2907064081787172
x: 10, CDF: 0.96, lambda: 1.3423754829424788
x: 20, CDF: 0.985, lambda: 1.3794298330152248
x: 50, CDF: 0.996, lambda: 1.404299537575494
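To put a number on "much closer", one option is to compare the relative spread of the implied lambdas with and without the transformation. The sketch below re-enters the five interior points from the printed output. The `relative_spread` helper (population standard deviation divided by the mean) is an illustrative choice of measure, not something the original notebook defines.

```python
import math
import statistics

# (multiple, CDF) pairs copied from the printed output above
points = [(1, 0.648), (5, 0.901), (10, 0.96), (20, 0.985), (50, 0.996)]

# Implied lambda using x directly, and using ln(x + 1)
raw_lambdas = [-math.log(1 - cdf) / x for x, cdf in points]
log_lambdas = [-math.log(1 - cdf) / math.log(x + 1) for x, cdf in points]

def relative_spread(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

raw_spread = relative_spread(raw_lambdas)
log_spread = relative_spread(log_lambdas)
print("raw x:   ", raw_spread)
print("ln(x+1): ", log_spread)
```

A smaller relative spread means the points agree more closely on a single lambda, which is what a good fit needs.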
Let's use a simple heuristic to decide on the value to use. To keep things simple, we are going to collect the implied lambdas and then take their average.
lambdas = []
for mult, cdf in zip(multiples[1:-1], actual_cdf[1:-1]):
    lambdas.append(-np.log(1-cdf)/np.log(mult+1))
print(lambdas)
print()
lambda_log = np.mean(lambdas)
print(lambda_log)
[1.50635266602479, 1.2907064081787172, 1.3423754829424788, 1.3794298330152248, 1.404299537575494]
1.384632785547341
Now we can predict the CDF for each multiple, but remember that we need to apply the log transform first!
cdf_pred = [1-np.exp(-lambda_log * np.log(x+1)) for x in multiples]
print(cdf_pred)
[0.0, 0.6170130300555561, 0.9163345277827373, 0.96385455752812, 0.9852357355713208, 0.9956784220539311, 0.9983221585734632]
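The prediction above can also be written without the explicit log and exponential, since $1 - e^{-\lambda \ln(x+1)} = 1 - (x+1)^{-\lambda}$. This has the form of a Lomax (Pareto type II) CDF with scale 1 and shape $\lambda$. The sketch below checks that the two forms agree. The `multiples` array is re-entered from the values implied by the printed outputs, and the variable names `cdf_exp_form` and `cdf_power_form` are illustrative.

```python
import numpy as np

lambda_log = 1.384632785547341  # average implied lambda printed above
# Multiples recovered from the printed outputs (assumption: 0, 1, 5, 10, 20, 50, 100)
multiples = np.array([0, 1, 5, 10, 20, 50, 100], dtype=float)

# Form used in the notebook: exponential CDF applied to ln(x + 1)
cdf_exp_form = 1 - np.exp(-lambda_log * np.log(multiples + 1))

# Equivalent power form: 1 - (x + 1)^(-lambda)
cdf_power_form = 1 - (multiples + 1) ** (-lambda_log)

print(np.allclose(cdf_exp_form, cdf_power_form))
```

Seeing the power form makes it clear that the fitted curve decays polynomially in x rather than exponentially.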
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(multiples, actual_cdf,marker='o')
ax.plot(multiples, cdf_pred)
ax.set_xlabel("Multiple")
ax.set_ylabel("Percent CDF")
ax.set_title("Exponential Distribution (Log Transformed) CDF")
ax.yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
The fitted curve tracks the actual CDF closely. The next thing we want to do is create a function that does the inverse, so that given a value of the CDF, we can get the value of x. The equation that we have been using is:
$ \lambda = \frac{-\ln(1 - CDF)}{\ln(x+1)} $
which we can rearrange to:
$ \ln(x+1) = \frac{-\ln(1 - CDF)}{\lambda} $
Taking the exponential of each side:
$ e^{\ln(x+1)} = e^{\frac{-\ln(1 - CDF)}{\lambda}} $
$ x+1 = e^{\frac{-\ln(1 - CDF)}{\lambda}} $
$ x = e^{\frac{-\ln(1 - CDF)}{\lambda}} - 1$
This is our inverse transformation function!
#Build the function and try it out with a cdf value
cdf_val = 0.96385455752812
def inverse_transform(cdf_val, lambda_log):
    #x = exp(-ln(1 - CDF) / lambda) - 1, matching the derivation above
    return np.exp(-np.log(1 - cdf_val) / lambda_log) - 1
inverse_transform(cdf_val, lambda_log)
10.000000000000002
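A round trip is a simple way to test an inverse: push a multiple through the fitted CDF, then through the inverse, and confirm we get the original multiple back. The sketch below is a scalar restatement using only the standard library. The names `fitted_cdf` and `fitted_inverse_cdf` and the test multiples are illustrative and not part of the original notebook.

```python
import math

lambda_log = 1.384632785547341  # fitted value printed above

def fitted_cdf(x, lam):
    """CDF of the log-transformed fit: 1 - exp(-lam * ln(x + 1))."""
    return 1.0 - math.exp(-lam * math.log(x + 1.0))

def fitted_inverse_cdf(cdf_val, lam):
    """Inverse of fitted_cdf: exp(-ln(1 - cdf) / lam) - 1."""
    return math.exp(-math.log(1.0 - cdf_val) / lam) - 1.0

# Forward then inverse should return each starting multiple
test_multiples = [0.5, 1, 5, 10, 50]
recovered = [fitted_inverse_cdf(fitted_cdf(x, lambda_log), lambda_log)
             for x in test_multiples]
for x, r in zip(test_multiples, recovered):
    print(x, r)
```

Note that a CDF value of exactly 1 has no finite inverse, because ln(0) is undefined. `np.random.uniform(0, 1)` draws from the half-open interval [0, 1), so that case does not arise when sampling.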
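We can now use this to simulate random draws like before: draw uniform random values between 0 and 1, treat each one as a CDF value, and map it through the inverse transformation to get a multiple.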
np.random.seed(0)
returns1 = []
returns2 = []
returns3 = []
for _ in range(10000):
    #Fund with 10 investments: draw CDF values, map them to multiples, annualize the average
    cdf_vals = np.random.uniform(0,1, 10)
    valuations = [inverse_transform(x, lambda_log) for x in cdf_vals]
    returns1.append(np.mean(valuations) ** (1/8) - 1)
    #Fund with 25 investments
    cdf_vals = np.random.uniform(0,1, 25)
    valuations = [inverse_transform(x, lambda_log) for x in cdf_vals]
    returns2.append(np.mean(valuations) ** (1/8) - 1)
    #Fund with 50 investments
    cdf_vals = np.random.uniform(0,1, 50)
    valuations = [inverse_transform(x, lambda_log) for x in cdf_vals]
    returns3.append(np.mean(valuations) ** (1/8) - 1)
#Plot the 3 together
fig, axs = plt.subplots(3, 1, sharex=True, sharey=True,figsize=(5,10))
ax = axs[0]
ax.hist(returns1, bins=30)
ax.xaxis.set_major_formatter(PercentFormatter(1))
ax.set_xlabel("Fund CAGR")
ax.set_ylabel("Frequency")
ax.set_title("Exponential Simulated Funds N=10")
ax = axs[1]
ax.hist(returns2, bins=30)
ax.xaxis.set_major_formatter(PercentFormatter(1))
ax.set_xlabel("Fund CAGR")
ax.set_ylabel("Frequency")
ax.set_title("Exponential Simulated Funds N=25")
ax = axs[2]
ax.hist(returns3, bins=30)
ax.xaxis.set_major_formatter(PercentFormatter(1))
ax.set_xlabel("Fund CAGR")
ax.set_ylabel("Frequency")
ax.set_title("Exponential Simulated Funds N=50")
plt.show()
#Get the stats for the simulations
table = pd.DataFrame([[np.mean(r), np.std(r)] for r in [returns1, returns2, returns3]],
                     index=['N=10', 'N=25', 'N=50'],
                     columns=['Mean', 'STD'])
print(table)
          Mean       STD
N=10  0.068759  0.105827
N=25  0.089040  0.089313
N=50  0.097952  0.072393
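The simulation loop can also be written without a Python-level loop by drawing all the uniform values for one fund size at once. The following is a sketch under a few assumptions: the function name `simulate_fund_cagrs` is hypothetical, `years = 8` is read off the `1/8` exponent in the loop, and it uses `np.random.default_rng(0)`, which is a different generator from `np.random.seed(0)`, so the numbers will not match the table exactly.

```python
import numpy as np

lambda_log = 1.384632785547341  # fitted value printed earlier
years = 8  # assumption: holding period implied by the 1/8 exponent

def simulate_fund_cagrs(n_investments, n_funds, lam, rng):
    """Simulate fund CAGRs by inverse transform sampling of multiples."""
    # One row per fund, one column per investment, uniform on [0, 1)
    u = rng.uniform(0.0, 1.0, size=(n_funds, n_investments))
    # Inverse CDF of the log-transformed fit; log1p(-u) is ln(1 - u)
    multiples = np.exp(-np.log1p(-u) / lam) - 1.0
    # Average multiple per fund, converted to an annualized return
    return multiples.mean(axis=1) ** (1.0 / years) - 1.0

rng = np.random.default_rng(0)
results = {}
for n in (10, 25, 50):
    results[n] = simulate_fund_cagrs(n, 10_000, lambda_log, rng)
    print(n, results[n].mean(), results[n].std())
```

Each entry in `results` can be passed to `ax.hist` or summarized with `pd.DataFrame` in the same way as `returns1`, `returns2`, and `returns3`.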