|
| 1 | +# Architecture |
| 2 | +This document describes the high-level architecture of PyMC3. |
| 3 | + |
| 4 | +# Bird's Eye View |
| 5 | +[comment]: <> (https://drive.google.com/file/d/1lfEzokkNUJr_JIeSDQfha5a57pokz0qI)
| 6 | + |
| 7 | +PyMC3 lets you define probabilistic graphs or models that can be used to compute log probabilities for posterior
| 8 | +inference or to draw random samples for prior and posterior prediction.
| 9 | + |
| 10 | +PyMC3 includes a few inference techniques, in particular: |
| 11 | +* Markov chain Monte Carlo |
| 12 | +* Variational Inference |
| 13 | +* Sequential Monte Carlo |
| 14 | + |
| 15 | +It also contains numerous other pieces of functionality, such as Graphviz model visualization tools,
| 16 | +as well as various mathematical helper functions.
| 17 | + |
| 18 | +The most central pieces of functionality of PyMC3 are shown visually below, along with their
| 19 | +relation to other major packages. Not all modules are shown, either because
| 20 | +they are smaller or self-explanatory in scope, or because they are pending
| 21 | +deprecation.
| 22 | + |
| 23 | +## Functionality not in PyMC3 |
| 24 | +It is easier to start with functionality that is not present in PyMC3 but
| 25 | +is instead deferred to outside libraries. To understand any
| 26 | +of the topics below, refer to that specific library.
| 27 | + |
| 28 | +### Aesara |
| 29 | +* Gradient computation |
| 30 | +* Random number generation |
| 31 | +* Low-level tensor operation definition
| 32 | +* Low-level operation graphs
| 33 | + |
| 34 | +### ArviZ |
| 35 | +* Plotting, e.g. trace plots, rank plots, posterior plots
| 36 | +* MCMC sampling diagnostics, e.g. Rhat, effective sample size
| 37 | +* Model comparison, particularly efficient leave-one-out cross-validation approximation
| 38 | +* The `InferenceData` data structure
| 39 | + |
| 40 | + |
| 41 | +# Modules |
| 42 | +The codebase of PyMC3 is split among single Python file modules at the root
| 43 | +level, as well as directories with Python code for logical groups of functionality.
| 44 | +Admittedly, the split between a single `.py` module and a directory is not defined by a strict
| 45 | +criterion but tends to occur when a single `.py` file would be "too big".
| 46 | +We will start with the modules needed to implement the "simple MCMC" model shown below
| 47 | +before detailing the remaining modules, such as Variational Inference, Ordinary Differential Equations,
| 48 | +or Sequential Monte Carlo.
| 49 | + |
| 50 | +```python
| 51 | +import pymc3 as pm
| 52 | +with pm.Model() as model:
| 53 | +    theta = pm.Beta("theta", alpha=1, beta=2)  # prior on the success probability
| 54 | +    y = pm.Binomial("y", p=theta, n=2, observed=[1, 2])  # likelihood of the observed counts
| 55 | +    inf_data = pm.sample()  # draw posterior samples
| 56 | +
| 57 | +```
| 58 | + |
| 59 | +## {mod}`pymc3.model` |
| 60 | +Contains primitives related to model definition and methods used for evaluating the model.
| 61 | +In no particular order, they are:
| 62 | + |
| 63 | +* `ContextMeta`: The metaclass that enables the `with pm.Model() as model` context manager syntax
| 64 | +* {class}`~pymc3.Factor`: Defines the methods for the various logprobs for models
| 65 | +* `ValueGrad`: Handles the value and gradient, and is the main connection point to Aesara
| 66 | +* `Deterministic` and `Potential`: Definitions for two pieces of functionality useful in some model definitions
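
The `with pm.Model() as model` pattern can be illustrated with a small, self-contained sketch. This is not PyMC3's implementation: `ToyModel`, `ToyVariable`, and `_context_stack` are hypothetical names, and the sketch uses a plain class-level stack rather than a metaclass. It only shows how a context manager can let variables find their enclosing model without the model being passed explicitly.

```python
class ToyModel:
    """Hypothetical stand-in for a model container (not the PyMC3 class)."""

    _context_stack = []  # the innermost active model is the last element

    def __init__(self):
        self.variables = {}

    def __enter__(self):
        # Entering the `with` block makes this model the current context.
        ToyModel._context_stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block restores the previous context.
        ToyModel._context_stack.pop()
        return False  # do not swallow exceptions

    @classmethod
    def get_context(cls):
        if not cls._context_stack:
            raise TypeError("No model on context stack")
        return cls._context_stack[-1]


class ToyVariable:
    """Hypothetical variable that registers itself with the active model."""

    def __init__(self, name):
        self.name = name
        ToyModel.get_context().variables[name] = self


with ToyModel() as model:
    theta = ToyVariable("theta")

print(list(model.variables))
```

Creating a `ToyVariable` outside of any `with ToyModel()` block raises a `TypeError` in this sketch, because there is no active model to register with.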
| 67 | + |
| 68 | +## distributions/ |
| 69 | +Contains multiple submodules that define distributions, as well as logic that aids in using distributions.
| 70 | +Important modules to note are:
| 71 | + |
| 72 | +* `distribution.py`: Contains the parent class for all PyMC3 distributions.
| 73 | +  Notably, the `distribution.Distribution` class contains the `observed` argument, which in PyMC3 differentiates
| 74 | +  a random variable distribution from a likelihood distribution.
| 75 | + |
| 76 | +* `logprob.py`: Contains the log probability logic for the distributions themselves.
| 77 | +  The log probability calculation is deferred to Aesara.
| 78 | + |
| 79 | +* `dist_math.py`: Various convenience operators for distributions.
| 80 | +  This includes mathematical operators such as the `logpower` or `all_true` methods.
| 81 | +  It also contains a suite of lognormal methods and transformation methods.
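
To make the idea of a model log probability concrete, the sketch below evaluates the joint log probability of a Beta(1, 2) prior on `theta` and a Binomial likelihood with `n=2` and observations `[1, 2]`, mirroring the parameters of the "simple MCMC" example. It uses only the Python standard library. The function names `beta_logp`, `binomial_logp`, and `model_logp` are hypothetical and are not PyMC3 functions; in PyMC3 the equivalent computation is expressed as an Aesara graph.

```python
import math


def beta_logp(x, alpha, beta):
    """Log density of Beta(alpha, beta) at x; -inf outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return -math.inf
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x)


def binomial_logp(k, n, p):
    """Log probability mass of Binomial(n, p) at k, assuming 0 < p < 1."""
    if not 0 <= k <= n:
        return -math.inf
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + k * math.log(p) + (n - k) * math.log(1 - p)


def model_logp(theta, observed, n=2, alpha=1.0, beta=2.0):
    """Joint log probability: log prior plus the summed log likelihood."""
    prior = beta_logp(theta, alpha, beta)
    if prior == -math.inf:
        return prior  # theta is outside the support of the prior
    return prior + sum(binomial_logp(k, n, theta) for k in observed)


print(model_logp(0.5, observed=[1, 2]))
```

A sampler only needs a function of this shape, evaluated at proposed values of `theta`, plus gradients for methods such as NUTS.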
| 82 | + |
| 83 | +## /sampling.py |
| 84 | +Interface to posterior, prior predictive, and posterior predictive sampling, as well as various methods to identify and initialize
| 85 | +step methods. Also contains logic to check for "all continuous" variables and initialize NUTS.
| 86 | + |
| 87 | +## step_methods/ |
| 88 | +Contains step methods for various sampling algorithms, such as MCMC and SMC. `step_methods.hmc` includes
| 89 | +the Hamiltonian Monte Carlo sampling methods as well as helper functions such as the integrators used for those methods.
| 90 | + |
| 91 | +## tests/ |
| 92 | +Contains all tests for the codebase. All modules prefixed with `test_` are tests themselves, whereas all
| 93 | +other modules contain various supporting code such as fixtures, configurations, etc.