Design for bound, truncated and censored distributions

There have been a couple of issues about these, and I thought it might help to collect them and try to find a good design for truncated and censored distributions.

#1843 #1847 #1833 #1861 #1860

This got a bit longer and less concrete than I wanted, but here it is:

Just to get everyone on the same page about the status quo:
- "Bound" distributions: These are implemented as `pm.Bound`, which maps a distribution to a different distribution that can't take values outside some interval, but has the same logp in this interval. They aren't strictly speaking probability distributions anymore (the norm is less than 1).
- Truncated distributions: Same as "Bound", but with a correction to the logp to make the norm 1. We do not have a build-in method of defining those (it's possible with a potential or a custom distribution).
- Censored distributions: Values in a subset of the support are reported, values outside are only counted. We might want to support right/left/interval censoring. There is no current direct support for it.

## cdf / ccdf

We need `lcdf` and/or `lccdf` functions for our distributions for both truncation and censoring. #1861 is a start for that. I would propose to add stubs to `distribution.Distribution` that raise a `NotImplementedError` and then overload those in the actual distributions. We might also want to add something like `linterval` and maybe `lcinterval` – sometimes there are more stable ways to compute the log of the difference of cdfs.

## Bound distributions
I'm actually not sure that we should keep those. It seems to me that they could be useful for defining new distributions, but in those cases an interval transform seems more straight forward. There are cases where we can use a bound distribution instead of a truncated one and get the same result, (if the difference of the lpdfs does not depend on any of the parameters), and we might save a bit of time in those cases. But I'm not sure this is worth the additional confusion (see #1833 for an example where it went wrong; and I did something like that in the past myself). To prevent the loss of performance, we could add a parameter `propto` to `Distribution.logp`, and drop terms that do not depend on the parameters if it is set to `True` to get the same effect. (stan does this, and it might save a bit of time in a lot of other circumstances, too).
Are there other use-cases that I'm not aware of? If not, I think deprecating them in favour of truncated distributions and removing them in a later release would be fine. Instead, we could also only deprecate the logp function of `Bound`, I think that would prevent most misusage. A third option would be to change their behaviour to the truncated distribution.

## Truncated distributions
Implementing those seems rather strait forward once we have the lcdfs: Do the same as in Bound, but correct the logp function with the difference of the cdfs at the boundaries and fix the random function. We could follow the same design as `Bound`, but maybe we could use the opportunity to make that a bit more ergonomic. Bound seems a bit confusing to new users right now. Maybe it would be better to predefine distributions `TruncatedNormal` et al or put them in a submodule `truncated.Normal`. They could then have additional parameters `lower_trunc` and `upper_trunc`. This would also make it much easier to test. I doubt all conceivably useful combinations involving `Bound` are working at the moment (eg #1847)

## Censored distributions
They are tricky. Supporting a couple of simple use cases seems easy enough: Say we analyse a survival experiment involving mice, where the experiment ends after 6 months. We could model that with something like that:
```python
Survival = pm.Censored(pm.Weibull, upper=6)
Survival('months', ..., observed=data)
```
and code censored events by setting them to 6 exactly (which is a null set in the original distribution), or add an additional parameter `is_censored`. 

Complications:

- We need to be careful to not break posterior sampling with something like that.
- Censoring can be much more complicated if we allow the censoring events themselves to be variables. Say we want to model a cancer treatment, where no patients drop out, but from time to time a patient is hit by a stray meteorite – not related to the treatment at all (and sorry for the example, that's what happens if you spend time with biologists and astronomers). Then the time to a meteorite hit – a censoring event – follows an exponential distribution. But since some patients die before we observe that, we end up with a censored distribution for the meteor hits *and* the normal deaths. And in real experiments the censoring events might not even be independent from the actual event of interest.
- There are different forms of censoring: right / left / interval and combinations thereof.
- Censored distributions are not absolutely continuous. Maybe it helps to look at them as a mixture of a truncated distribution and a discrete distribution.

## tldr
- Deprecate `Bound`
- add submodule `distributions.tuncated` containing truncated (and tested) versions of most distributions. Each with two additional parameters `lower_trunc` and `upper_trunc`.
- Implement lcdf, lccdf, linterval, lcinterval for most distributions
- Add `propto` argument to logp of distribution and drop summands that do not depend on the parameters.
- Think about a good design for censored

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Design for bound, truncated and censored distributions #1864

cdf / ccdf

Bound distributions

Truncated distributions

Censored distributions

tldr

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Design for bound, truncated and censored distributions #1864

Description

cdf / ccdf

Bound distributions

Truncated distributions

Censored distributions

tldr

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions