Computing Bayes Factor with Bernoulli Distribution

Hi, I’m new to the community and Bayesian modeling in general.
I am converting some example code from PyMC v3 to PyMC v5.

The original code comes from this notebook.
The relevant section is this:

with pm.Model() as model:
    
    # Index to true model
    prior_model_prob = 0.5
    #tau = pm.DiscreteUniform('tau', lower=0, upper=1)
    tau = pm.Bernoulli('tau', prior_model_prob)
    
    # Poisson parameters
    mu_p = pm.Uniform('mu_p', 0, 60)

    # Negative Binomial parameters
    alpha = pm.Exponential('alpha', lam=0.2)
    mu_nb = pm.Uniform('mu_nb', lower=0, upper=60)

    y_like = pm.DensityDist('y_like',
             lambda value: pm.math.switch(tau, 
                 pm.Poisson.dist(mu_p).logp(value),
                 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
             ),
             observed=messages['time_delay_seconds'].values)
    
    start = pm.find_MAP()
    step1 = pm.Metropolis([mu_p, alpha, mu_nb])
    step2 = pm.ElemwiseCategorical(vars=[tau], values=[0,1])
    trace = pm.sample(200000, step=[step1, step2], start=start)

_ = pm.traceplot(trace[burnin:], varnames=['tau'])

My attempt at converting it looks like this:

def dist(
    tau: TensorVariable,
    mu_p: TensorVariable,
    mu_nb: TensorVariable,
    alpha: TensorVariable,
    size: TensorVariable
) -> TensorVariable:

    poisson_ = pm.Poisson.dist(mu_p, size=size)
    negative_binomial_ = pm.NegativeBinomial.dist(mu_nb, alpha, size=size)
    
    # tau == 0 selects the negative binomial, tau == 1 the Poisson,
    # matching the original switch(tau, Poisson, NegativeBinomial)
    return pm.math.switch(
        pm.math.eq(tau, 0),
        negative_binomial_,
        poisson_
    )

with pm.Model() as model:
    # Index to true model
    prior_model_prob = 0.5
    tau = pm.Bernoulli('tau', p=prior_model_prob)
    
    # Poisson parameters
    mu_p = pm.Uniform('mu_p', lower=0, upper=60)
    
    # Negative Binomial parameters
    alpha = pm.Exponential('alpha', lam=0.2)
    mu_nb = pm.Uniform('mu_nb', lower=0, upper=60)
    
    y_like = pm.CustomDist(
        'y_like',
        tau,
        mu_p,
        mu_nb,
        alpha,
        logp=logp,
        observed=messages['time_delay_seconds'].values
    )

    start = pm.find_MAP()
    step1 = pm.Metropolis([mu_p, alpha, mu_nb])
    step2 = pm.BinaryGibbsMetropolis([tau])
    trace = pm.sample(
        draws=200000, 
        step=[step1, step2], 
        start=start,
        progressbar=False
    )

When comparing the mean of the posterior values of tau, I get 0 instead of a value between 0 and 1. What am I missing?

MAP estimates are generally a poor place to start a Metropolis-style algorithm: because they maximize density, they are hard to move away from. For example, in a standard normal, the mode is about the worst place to initialize. If you have the mode computed, you’re better off randomizing around it. But you have to be careful, because modes don’t make sense in a lot of hierarchical/multilevel Bayesian models.

Can you include the code you’re using for comparison?

The post mentions Bayes factors. I’d suggest reading Gelman et al.'s BDA3 on Bayes factors and how they can be problematic (free pdf on the BDA3 home page). The general issue is that they evaluate prior predictive distributions rather than posterior predictive distributions, so they can be very, very sensitive to priors. You may have better luck with full or approximate cross-validation. But I’d recommend posterior predictive checks on quantities of interest first.


Hi Bob, thanks for the reply.

If you have the mode computed, you’re better off randomizing around the mode. But you have to be careful, because the modes don’t make sense in a lot of hierarchical/multilevel Bayesian models.

Good to know; I will keep this in mind when I start writing my own scripts.

Can you include the code you’re using for comparison?

Yes, my code was included in the original post above; the `dist` function and model block there are exactly what I’m running, so I won’t paste them again.

Here is what I was expecting in terms of visual output: a trace for tau that moves between 0 and 1.

Here is what I am observing instead: tau stuck at 0 for the entire trace.

I’d suggest reading Gelman et al.'s BDA3 on Bayes factors and how they can be problematic (free pdf on the BDA3 home page)

Thanks for the link, I will add this to my reading list. I did read through this notebook, which also mentions the same problem. For now, I am interested in reproducing the example in pymc v5.

Where is your definition of logp?

Also consider the use of pm.Mixture here instead of a custom distribution.

I tried this with pm.Mixture and also found that it got stuck at zero. The notebook you reference suggests messing with prior_model_prob, but that didn’t seem to change things unless I set it to 1.0! Another approach is to marginalize out the discrete variable and use this model:

import arviz as az
import pymc as pm
import pandas as pd

url = "https://raw.githubusercontent.com/markdregan/Bayesian-Modelling-in-Python/refs/heads/master/data/hangout_chat_data.csv"

messages = pd.read_csv(url)

with pm.Model() as model:

    #prior_model_prob = 1.0
    #tau = pm.Bernoulli('tau', p=prior_model_prob)
    p = pm.Beta('p', alpha=1, beta=1)

    mu_p = pm.Uniform('mu_p', lower=0, upper=60)

    alpha = pm.Exponential('alpha', lam=0.2)
    mu_nb = pm.Uniform('mu_nb', lower=0, upper=60)

    poisson_dist = pm.Poisson.dist(mu=mu_p)
    nb_dist = pm.NegativeBinomial.dist(mu=mu_nb, alpha=alpha)

    likelihood = pm.Mixture(
        'likelihood',
        w=[p, 1 - p],
        comp_dists=[poisson_dist, nb_dist],
        observed=messages['time_delay_seconds'].values
    )

    idata = pm.sample()


az.plot_trace(idata)

The posterior on p that I found has most of its density near 0.06 (94% HDI [0.01, 0.11]). Note also that the priors here could be improved; I did not investigate much further.

Hi DrEntropy, thanks for your response

Where is your definition of logp?

Here is the code; I just realized I had shared my custom dist function instead earlier:

def logp(
    value: TensorVariable, 
    tau: TensorVariable,
    mu_p: TensorVariable,
    mu_nb: TensorVariable,
    alpha: TensorVariable
) -> TensorVariable:

    poisson_ = pm.Poisson.dist(mu_p)
    negative_binomial_ = pm.NegativeBinomial.dist(mu_nb, alpha)
    
    # tau == 0 selects the negative binomial, tau == 1 the Poisson,
    # matching the original switch(tau, Poisson, NegativeBinomial)
    return pm.math.switch(
        pm.math.eq(tau, 0),
        pm.logp(rv=negative_binomial_, value=value),
        pm.logp(rv=poisson_, value=value)
    )

Also consider the use of pm.Mixture here instead of a custom distribution

Could you share more info on why pm.Mixture would be preferred instead of CustomDist?
I am still very new to pymc and would like to learn more.
Is a mixture model usually preferred for hierarchical modeling?

The notebook you reference suggests messing with prior_model_prob but that didn’t seem to change things unless I set it to 1.0.

In that case, is it not expected behavior that the negative binomial model was always selected over the Poisson model in PyMC v5? I also tried hardcoding random parameter values for both models and noticed the same behavior.

The posterior on p I found has most of its density near 0.06 (94% HDI [.01 .11] ) .

Thank you for the code for the mixture model, really helpful!
In regard to the variable p, can I use it to calculate the Bayes factor like so?

p(poisson_model | data) = 0.06
p(negative_binomial_model | data) = 0.94

BF = [p(negative_binomial_model | data) / p(poisson_model | data)]
     * [p(poisson_model) / p(negative_binomial_model)]
   = (0.94 / 0.06) * (0.5 / 0.5) ≈ 15.67
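Spelling out that arithmetic as a quick check (the 0.06 figure is taken from the mixture fit above; whether the mixture weight can be read as a posterior model probability is questioned later in the thread):

```python
# Posterior weights read off the mixture fit (0.06 for the Poisson component)
p_poisson_given_data = 0.06
p_nb_given_data = 1 - p_poisson_given_data  # 0.94

# Equal prior model probabilities, so the prior odds cancel
prior_poisson = prior_nb = 0.5

bayes_factor = (p_nb_given_data / p_poisson_given_data) * (prior_poisson / prior_nb)
print(round(bayes_factor, 3))  # 15.667
```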

Finally, I am puzzled why mu for the Poisson model comes out around 45 with the code you shared.
Previously, following the notebook instructions, I fit the Poisson model to the data separately and found the value to be around 18.2 instead.

with pm.Model() as model:
    mu = pm.Uniform('mu', lower=0, upper=60)
    likelihood = pm.Poisson(
        'likelihood',
        mu=mu,
        observed=messages['time_delay_seconds'].values
    )

    start = pm.find_MAP()  # find the posterior mode as a starting point
    step = pm.Metropolis()

    trace = pm.sample(draws=200000, start=start, step=step, progressbar=True)

Upon further reflection, the mixture is not the same as what you are trying to do: it assumes each point could come from either distribution. What you want is a model where all the points come from one distribution or the other (we just don’t know which).

I don’t know offhand how to marginalize in that case!

As for the Bayes factor business, I agree with @bob-carpenter that one should tread carefully. See also Bayes Factors and Marginal Likelihood — PyMC example gallery. I happened to be at lecture 9 as I work through the BDA3 course, and this topic is discussed there as well; very illuminating!

BTW, instead of Bayes factors for model comparison, consider things like elpd_loo with LOO-CV (see Model comparison — PyMC 5.23.0 documentation, for example, as well as [2011.01808] Bayesian Workflow). But before you even get that far, you can see immediately that there is a problem with the posterior predictions for the Poisson model:

with pm.Model() as model:
    mu_p = pm.HalfNormal('mu_p', sigma=10)
    _ = pm.Poisson('observed', mu=mu_p,
                   observed=messages['time_delay_seconds'].values)
    trace = pm.sample()

with model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

az.plot_ppc(trace)

Compare this to the same kind of plot for negative binomial:

So you probably would not consider the Poisson model further. But if you want to make this more concrete, you could use az.compare, for example, as discussed in the link above.

If you really are mixing the whole population and there’s just a single population, then you don’t have enough information to fit a whole-population mixture when the mixture component parameters and the mixture rate are all unknown.


instead of Bayes factors for model comparison, consider things like elpd_loo with LOO-CV.

Good to know; I will check it out later.

But before you even get that far, you can see immediately that there is a problem with the posterior predictions for the poisson model

Agreed. The goal of the exercise in the notebook was to demonstrate, for the sake of completeness, that the Poisson model is the less likely one. They picked the Poisson model to make the difference more apparent, which should have been reflected in the computation of the Bayes factor.

Ah, I see! It is like trying to estimate the probability of a coin flipping heads from observing just one data point, and then being asked to estimate other properties of the unseen side!

Before reading your post, I put together what I think is the correct marginalized version of what was in the notebook the OP posted, and the posterior for p (given a uniform prior) looked suspiciously like a Beta(2, 1), which now makes perfect sense. (EDIT: the recovered posterior for tau is also 100% negative binomial.)
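The Beta(2, 1) shape drops out of Beta-Bernoulli conjugacy: a uniform Beta(1, 1) prior on p updated with the single population-level "success" (the one indicator draw) gives Beta(1 + 1, 1). A quick arithmetic check of its mean:

```python
# Beta(a, b) prior on p, updated with one Bernoulli "success"
a, b = 1, 1                  # Beta(1, 1) is the uniform prior
a_post, b_post = a + 1, b    # conjugate update for a single success

posterior_mean = a_post / (a_post + b_post)  # mean of Beta(2, 1)
print(posterior_mean)  # 0.6666666666666666
```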

As far as I can determine, the results you are getting (100% negative binomial) are consistent with what I am getting from the model you provided, and they make sense because the data are so clearly more consistent with the negative binomial. I also cannot reproduce the results in the original notebook.

If you want to continue down this path, the suggestions in Bayes Factors and Marginal Likelihood — PyMC example gallery might be helpful; it briefly discusses this approach.