import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white')
color = '#87ceeb'
# helper function to return an optionally truncated normal distribution
def normal(mu, sigma=.75, nvals=11, lbound=0, ubound=1):
    dist = []
    for x in np.linspace(lbound, ubound, nvals):
        dist.append(np.exp(-np.power(x - mu, 2.) / (2 * np.power(sigma, 2.))))
    return dist
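For reference, at each grid point $x$ in $[\mathrm{lbound}, \mathrm{ubound}]$ the helper evaluates the unnormalized Gaussian kernel

$$\exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$

The missing normalizing constant is harmless here, because any prior built from this helper is normalized over the grid below.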
Here, we create a vector of values that $\theta$ can take on. This is the "grid" laid down on the parameter space. It's a simple grid (only 1 dimension), but that's why what follows is appropriately called a grid approximation. By tweaking n_theta_vals, this grid can be coarse-grained (small values of n_theta_vals) or very fine-grained (large values of n_theta_vals). Start with small values to get an understanding of how the approximation works, then crank it up to approach an exact solution.
n_theta_vals = 11
#n_theta_vals = 1001
theta = np.linspace(0, 1, n_theta_vals)
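# e.g., with n_theta_vals = 11, theta runs from 0 to 1 in steps of 0.1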
p_theta is the vector of prior probabilities, each associated with the corresponding value of $\theta$ that we just created. There are several alternative priors specified here; try out each of them. Tweak the mean and standard deviation of the normal distribution to see how they influence the posterior.
# uniform prior
p_theta = np.ones_like(theta)
# triangle prior
#p_theta = np.minimum(theta, 1-theta)
# (truncated) normal priors
# wide prior
#p_theta = normal(.5, 10, n_theta_vals)
# narrow prior
#p_theta = normal(.5, .04, n_theta_vals)
# normalize the vector so that priors sum to 1 (making them proper probabilities)
p_theta = p_theta / np.sum(p_theta)
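An optional sanity check confirms that the normalized prior is a proper probability vector:

# the prior should now sum to 1 (up to floating-point error)
assert np.isclose(np.sum(p_theta), 1.0)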
This constructs a set of flip outcomes. Specify the number of heads (i.e., n_heads) and the number of tails (i.e., n_tails). There are three scenarios prepared:
# example 1
n_heads = 1
n_tails = 0
# example 2
#n_heads = 1
#n_tails = 3
# example 3
#n_heads = 10
#n_tails = 30
data = np.repeat([1, 0], [n_heads, n_tails])
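# e.g., example 1 (n_heads = 1, n_tails = 0) gives data == array([1])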
Here, we calculate the likelihood of the data we just constructed (i.e., $p(data|\theta)$). We will use a Bernoulli distribution and apply it to each of the values of $\theta$ we are entertaining. We then multiply each likelihood by the prior associated with the corresponding value of $\theta$ and sum all of these values to arrive at what is sometimes called the evidence (i.e., $p(data)$, or the denominator of Bayes' rule).
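In symbols, for $n_H$ heads and $n_T$ tails observed:

$$p(D|\theta) = \theta^{n_H}\,(1-\theta)^{n_T}, \qquad p(D) = \sum_i p(D|\theta_i)\,p(\theta_i).$$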
p_data_given_theta = (theta**n_heads) * ( (1-theta)**n_tails )
# calculate the evidence (P(D), the marginal probability of the data)
p_data = np.sum(p_data_given_theta * p_theta)
Here, we put it all together by applying Bayes' rule to each prior/likelihood pair to generate a corresponding posterior for each value of $\theta$ we are entertaining.
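That is, for each grid value $\theta_i$:

$$p(\theta_i|D) = \frac{p(D|\theta_i)\,p(\theta_i)}{p(D)}.$$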
p_theta_given_data = p_data_given_theta * p_theta / p_data
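Because $p(D)$ is exactly the sum of the numerators over the grid, the posterior also sums to 1. An optional check:

# the posterior should sum to 1 over the grid (up to floating-point error)
assert np.isclose(np.sum(p_theta_given_data), 1.0)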
Plot the prior, the likelihood, and the posterior.
fig = plt.figure(figsize=(8, 8))
plt.subplots_adjust(hspace = 1.1)
# Plot the prior, the likelihood, and the posterior:
for i,dist in enumerate([p_theta, p_data_given_theta, p_theta_given_data]):
    plt.subplot(3, 1, i+1)
    markerline, stemlines, baseline = plt.stem(theta, dist, markerfmt=' ', basefmt=' ')
    plt.setp(stemlines, 'linewidth', 3)
    plt.xlim(0, 1)
    plt.xlabel('$\\theta$')
# prior
plt.sca(fig.axes[0])
plt.xlabel(r'$\theta$')
plt.ylabel('$P(\\theta)$')
plt.title('Prior', weight='bold')
# likelihood
plt.sca(fig.axes[1])
plt.xlabel(r'$\theta$')
plt.ylabel('$P(D|\\theta)$')
plt.title('Likelihood', weight='bold')
plt.text(0.1, np.max(p_data_given_theta)/2, 'D = %sH,%sT' % (n_heads, n_tails))
# posterior
plt.sca(fig.axes[2])
plt.xlabel(r'$\theta$')
plt.ylabel('$P(\\theta|D)$')
plt.title('Posterior', weight='bold')
plt.text(0.1, np.max(p_theta_given_data)/2, 'P(D) = %g' % p_data)
plt.show()