In Statistical Rethinking, 2nd ed, there is problem 15M2:
Reconsider the primate milk missing data example from the chapter. This time, assign B a distribution that is properly bounded between zero and 1. A beta distribution, for example, is a good choice."
I have been able to replicate the code from the original problem (where B used a normal distribution). However, it fails when I use a beta distribution in pymc
, even though a colleague (@MarcoGorelli) was able to do this in numpyro
.
df_primates["neocortex_prop"] = df_primates["neocortex.perc"] / 100
df_primates["k_std"] = standardize(df_primates["kcal.per.g"])
df_primates["logmass"] = np.log(df_primates["mass"])
df_primates["logmass_std"] = standardize(df_primates["logmass"])
with pm.Model() as m15_5b:
# priors
a = pm.Normal("a", 0, 0.5)
bB = pm.Normal("bB", 0, 1)
bM = pm.Normal("bM", 0, 0.5)
sigma = pm.Exponential("sigma", 1)
# obs/imputed B with beta distribution (avoid standardizing)
Bi = pm.Beta("Bi", alpha=2, beta=2, observed=df_primates["neocortex_prop"])
# linear model
mu = a + bB * (Bi - 0.67) + bM * df_primates["logmass_std"]
# likelihood
K = pm.Normal("K", mu, sigma, observed=df_primates["k_std"])
# sample
trace_m15_5b = pm.sample(draws=1000, random_seed=19, return_inferencedata=True)
Here is the error message I get:
---------------------------------------------------------------------------
SamplingError Traceback (most recent call last)
<ipython-input-27-6bcd320dd5ad> in <module>
23
24 # sample
---> 25 trace_m15_5b = pm.sample(draws=1000, random_seed=19, return_inferencedata=True)
~/opt/anaconda3/envs/stats_rethinking/lib/python3.8/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
504 if start is None:
505 start = start_
--> 506 check_start_vals(start, model)
507 except (AttributeError, NotImplementedError, tg.NullTypeGradError):
508 # gradient computation failed
~/opt/anaconda3/envs/stats_rethinking/lib/python3.8/site-packages/pymc3/util.py in check_start_vals(start, model)
233
234 if not np.all(np.isfinite(initial_eval)):
--> 235 raise SamplingError(
236 "Initial evaluation of model at starting point failed!\n"
237 "Starting values:\n{}\n\n"
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'a': array(-0.80057862), 'bB': array(0.31003719), 'bM': array(0.06270603), 'sigma_log__': array(-1.31484483), 'Bi_missing': array([ 1.42181326, 1.29176305, 0.89386098, 1.39275087, 0.14458384,
-0.39778831, 0.79556126, -0.09505109, 0.84795039, 0.27948074,
0.02310535, 0.13706759])}
Initial evaluation results:
a -1.51
bB -0.97
bM -0.23
sigma_log__ -1.58
Bi_missing 0.00
Bi -inf
K -342.50
Name: Log-probability of test_point, dtype: float64
Any ideas on what I can be doing wrong?