Understanding the Gamma Distribution.
The Gamma Distribution is the distribution describing the time it takes to for the n’th event to happen. Sound familiar? The exponential distribution describes the time until the the very first time an event occurs. More technically, the exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.
In fact you could say the exponential distribution is the gamma distribution for the case n = 1. The PDF for the distribution looks like this:
Now that is a lot of symbols. Here is a key -
α = The number of events for which you are waiting.
ß = The poisson rate parameter. The rate of events following the poisson distribution. You may remember this as λ.
For integers this looks very familiar -
The gamma function is a generalization of the factorial to the real numbers (and even complex numbers).
Of course, how do we get from the definition of the Gamma distribution to its PDF? We start by looking at the CDF of the function —
A good sanity check is that if we plug in α = 1. The equation becomes
Which is the pdf for the exponential distribution. Now we know what the distribution looks like and that it is parametrized by the number of events we are looking for and also rate of the poisson process we are sampling. If you have previously encountered this distribution you may remember the distribution having a shape and scale parameter.
The shape parameter is the same as the number of events parameter. So α is the shape. The scale parameter is the inverse of the rate parameter. So if the average rate of occurrences is modeled by the ß parameter, the scale parameter models the mean time between said occurrences. That means for θ = scale and shape = k, the pdf looks like:
Both of these are similar and the differences are up to convenience. I think the rate/number of events parametrization is more intuitive so thats the one I am going to be using.
So what does the gamma distribution look like? Here is some python code to generate it. I will use scipy.stats which conveniently has the gamma distribution already, do note that the parameters it asks for are shape and scale, so you want to put in 1/β for scale.
def plot_gamma_alpha():
"""
α : number of events
β : poisson rate parameter.
"""
x = np.linspace(0, 25, 500)
α = 1
mean, var, skew, kurt = gamma.stats(α, moments='mvsk')
y1 = gamma.pdf(x, α)
α = 5
mean, var, skew, kurt = gamma.stats(α, moments='mvsk')
y2 = gamma.pdf(x, α)
α = 10
mean, var, skew, kurt = gamma.stats(α, moments='mvsk')
y3 = gamma.pdf(x, α)
plt.title("PDF of Gamma Distribution")
plt.xlabel("X")
plt.ylabel("Probability Density")
plt.plot(x, y1, label="α = 1", color='red')
plt.plot(x, y2, label="α = 5", color='orange')
plt.plot(x, y3, label="α = 10", color='blue')
plt.legend(bbox_to_anchor=(1, 1), loc='upper right',
borderaxespad=1, fontsize=12)
plt.ylim([0, 0.40])
plt.xlim([0, 20])
plt.show()
This will generate the following figure:
We see that as the number of events we are looking for goes up, the average amount of time we have to wait goes up. In fact the expected value is exactly a/β.
Similarly we vary the rate -
def plot_gamma_beta():
α = 10
x = np.linspace(0, 50, 1000)
β = 1
mean, var, skew, kurt = gamma.stats(α, scale=1/β, moments='mvsk')
y1 = gamma.pdf(x, α, scale=1/β)
β = 2
mean, var, skew, kurt = gamma.stats(α, scale=1/β, moments='mvsk')
y2 = gamma.pdf(x, α, scale=1/β)
β = 3
mean, var, skew, kurt = gamma.stats(α, scale=1/β, moments='mvsk')
y3 = gamma.pdf(x, α, scale=1/β)
plt.title("PDF of Gamma Distribution (α = 10)")
plt.xlabel("T") plt.ylabel("Probability Density")
plt.plot(x, y1, label="β = 1", color='blue')
plt.plot(x, y2, label="β = 2", color='orange')
plt.plot(x, y3, label="β = 3", color='red')
plt.legend(bbox_to_anchor=(1, 1), loc='upper right',
borderaxespad=1, fontsize=12)
plt.ylim([0, 0.40])
plt.xlim([0, 20])
plt.show()
And we get the following figure —
So the more often an event occurs the less one has to wait till they get their required amount of events.
Let us try and run an experiment. To simulate a poisson process, I am going to use the inverse CDF trick on the exponential CDF to get time between events.
Let us first set our parameters:
α = 5 # alpha
n = 100 # number of points we want in our poisson process
event_num = [] # An array of indices to plot our process
event_times = [] # An array where we will put time of event occurring
event_time = 0
We repeat the process n times:
for i in range(n):
Now first we sample from the uniform distribution:
event_num.append(i)
n = random.random()
Then we generate the inter-event time from the inverse of the exponential CDF and add it to the event time:
inter_event_time = -math.log(1.0 - n) / β
inter_event_times.append(inter_event_time)
event_time = event_time + inter_event_time
event_times.append(event_time)
Then we plot-
plt.scatter(event_num, event_times, s =2)
Now if we wrap all of this up and return the time of the n’th event, we have ourselves a pretty fair simulation of a gamma event. The following code is pretty rough and could definitely be made more efficient (You could stop the experiment once you hit the required number of event occurrences for example.)
import random
import math
import statistics
import matplotlib.pyplot as pltdef simulate(α, β):
β = β
n = 100
inter_event_times = []
event_times = []
event_time = 0 for i in range(n):
n = random.random()
inter_event_time = -math.log(1.0 - n) / β
event_time = event_time + inter_event_time
event_times.append(event_time)
return(event_times[α])
Now to see if this works let us plot from the gamma distribution in scipy for α = 10, β = 2.
α = 10
x = np.linspace(0, 50, 1000)
β = 2
mean, var, skew, kurt = gamma.stats(α, scale=1/β, moments='mvsk')
y2 = gamma.pdf(x, α, scale=1/β)
plt.title("PDF of Gamma Distribution (α = 10)")
plt.xlabel("T")
plt.ylabel("Probability Density")
#plt.hist(hist_values, bins = 100, normalize = True)
plt.plot(x, y2, label="β = 2", color='orange')
plt.xlim([0,15])
plt.show
We get the following figure:
And now from our simulation, to get a plot we run the simulation function about 100,000 times and then plot a histogram.
hist_values = [simulate(10, 2) for _ in range(100000)]
plt.hist(hist_values, bins = 100)
plt.show
And the shape of these graphs looks pretty similar! We have successfully modeled a gamma process.
Uses in real life:
Since the gamma distribution is a generalization of the exponential distribution it is technically used in every application of the exponential distribution -
- Wait time modeling
- Reliability modeling
But special cases where we use gamma processes are
- Aggregate insurance claims
- Size of rainfalls
- The age distribution of cancer incidence often follows the gamma distribution