Rewrite core of PyMC3 (upcoming v4) to rely on aesara (among many other improvements) #4500
Conversation
In case this helps:

```python
import pymc3 as pm

with pm.Model() as m:
    x = pm.Normal('x', 0, 1)
    prior = pm.sample_prior_predictive(10)

print(prior)
# {'x': array([0.019066, 0.019066, 0.019066, 0.019066, 0.019066, 0.019066,
#        0.019066, 0.019066, 0.019066, 0.019066])}
```
Is there an example of this, or a test that's failing?
The random tests are still disabled, so this wouldn't show up, right? I wanted to check with the prior predictive sampling, but right now that's also unusable. There was an issue in the logp of the HalfCauchy that led to the addition of an unused parameter (not sure if related):
Managed to do it with:

```python
import pymc3 as pm
import scipy.stats as st

pymc_samples = pm.HalfCauchy.dist(beta=.01, size=10_000).eval()
print(pymc_samples.mean(), pymc_samples.std())
# (6.647935041418325, 81.64505597965584)

# Should match this
scipy_samples = st.halfcauchy(loc=0, scale=.01).rvs(10_000)
print(scipy_samples.mean(), scipy_samples.std())
# (0.049328030173446065, 0.3650001369075893)

# But matches this
scipy_samples = st.halfcauchy(loc=0.01, scale=1).rvs(10_000)
print(scipy_samples.mean(), scipy_samples.std())
# (5.537612320455185, 73.96359661595363)

# V3
pymc_samples = pm.HalfCauchy.dist(beta=.01, shape=10_000).random()
print(pymc_samples.mean(), pymc_samples.std())
# (0.05829908302484347, 1.122931711761449)
```

The HalfCauchy is notoriously hard to evaluate, but with these parameters it seems reliable.
The tests in
It's not unusable; you only need to set/change the seed between calls to
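A minimal sketch of that workaround. It assumes, as in aesara's `RandomVariable` at the time, that the node's first input is the shared RNG object, so swapping in a freshly seeded `RandomState` between evaluations yields new draws:

```python
import numpy as np
import pymc3 as pm

x = pm.Normal.dist(0, 1)
print(x.eval())  # draws a value
print(x.eval())  # repeats the same value: the shared RNG state is unchanged

# Reseed the RandomVariable's shared RNG (assumed to be the node's first input)
x.owner.inputs[0].set_value(np.random.RandomState(123))
print(x.eval())  # now a fresh draw
```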
Thanks Brandon. I didn't mean it's unusable. I just didn't know how to make use of it :b

I confirmed my suspicion about the HalfCauchy, though. There is a mismatch between the old logp and the new random Op: the latter expects two arguments, while the former expects only one. That's why the logp test would fail if the unused loc parameter were removed.

It's not too difficult to add this loc parameter to the logp, but I just wanted to make sure we want to do that. Do we want to adapt to the number of arguments expected in aesara, or just customize our own random ops in terms of the arguments we want to use?
If an adaptation is something as simple as inverting a parameter, then we'll always want to do that, but this case sounds like we'll need to do the latter (i.e. customize a `RandomVariable`).
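For reference, a hedged sketch of the first option: adapting the logp to aesara's two-parameter (loc, beta) convention just shifts the value by loc. The function name here is illustrative, not the actual PyMC3 code:

```python
import numpy as np
import aesara.tensor as at

def halfcauchy_logp(value, loc, beta):
    # log of the HalfCauchy density 2 / (pi * beta * (1 + ((x - loc)/beta)**2))
    # for value >= loc, and -inf below the support
    return at.switch(
        at.ge(value, loc),
        at.log(2) - at.log(np.pi) - at.log(beta)
        - at.log1p(((value - loc) / beta) ** 2),
        -np.inf,
    )
```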
On another front:

```python
import pymc3 as pm

with pm.Model() as m:
    x = pm.Uniform('x', 0, 1)
    y = pm.Uniform('y', 0, x)

m.logp({'x_interval': 0, 'y_interval': 0})
# array(-2.83990034)
m.logp({'x_interval': 0, 'y_interval': 0})
# array(-2.09721045)
m.logp({'x_interval': 0, 'y_interval': 0})
# array(-2.23865389)
```

Also, it ignores `transform=None` when I set it, and creates the interval variables anyway.
@ricardoV94 were the trailing `__` removed? On …:

```
>>> m.logp({'x_interval': 0, 'y_interval': 0})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\osthege\Repos\pymc3-dev\pymc3\model.py", line 1565, in __call__
    return self.f(**point)
  File "C:\Users\osthege\AppData\Local\Continuum\miniconda3\envs\pm3-dev\lib\site-packages\aesara\compile\function\types.py", line 956, in __call__
    raise TypeError(
TypeError: Missing required input: x_interval__ ~ TransformedDistribution
>>> m.logp({'x_interval__': 0, 'y_interval__': 0})
array(-2.77258872)
>>> m.logp({'x_interval__': 0, 'y_interval__': 0})
array(-2.77258872)
>>> m.logp({'x_interval__': 0, 'y_interval__': 0})
array(-2.77258872)
```
Yes, they were removed for the time being.
I'll add fixes for those in a minute.
Made these comments a while back, but forgot to hit submit...
This value was not representative of its name.
…d basic dists

These changes can be summarized as follows:

- `Model` objects now track fully functional Theano graphs that represent all relationships between random and "deterministic" variables. We call these "sample-space" graphs. `Model.unobserved_RVs`, `Model.basic_RVs`, `Model.free_RVs`, and `Model.observed_RVs` contain these graphs (i.e. `TensorVariable`s), which are generated by `RandomVariable` `Op`s.
- For each random variable, there is now a corresponding "measure-space" variable (i.e. a `TensorVariable` that corresponds to said variable in a log-likelihood graph). These variables are available as `rv_var.tag.value_var`, for each random variable `rv_var`, or via `Model.vars`.
- Log-likelihood (i.e. measure-space) graphs are now created for individual random variables by way of the generic functions `logpt`, `logcdf`, `logp_nojac`, and `logpt_sum` in `pymc3.distributions`.
- Numerous uses of concrete shape information stemming from `Model` objects (e.g. `Model.size`) have been removed/refactored.
- Use of `FreeRV`, `ObservedRV`, `MultiObservedRV`, and `TransformedRV` has been deprecated. The information previously stored in these classes is now tracked using `TensorVariable.tag`, and log-likelihoods are generated using the aforementioned `log*` generic functions.
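A small sketch of the workflow those bullets describe, using only names mentioned above (`free_RVs`, `tag.value_var`, `logpt`); exact call signatures were still in flux at this point:

```python
import pymc3 as pm
from pymc3.distributions import logpt

with pm.Model() as m:
    x = pm.Normal("x", 0, 1)

rv_var = m.free_RVs[0]             # sample-space graph for x
value_var = rv_var.tag.value_var   # matching measure-space variable
logp_graph = logpt(rv_var, value_var)      # log-likelihood graph for x
print(logp_graph.eval({value_var: 0.0}))   # approx. log(1/sqrt(2*pi))
```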
This commit changes `DictToArrayBijection` so that it returns a `RaveledVars` datatype that contains the original raveled and concatenated vector, along with the information needed to revert it back to dictionary/variables form.

Simply put, the variables-to-single-vector mapping steps have been pushed away from the model object and its symbolic terms, and closer to the (sampling) processes that produce and work with `ndarray` values for said terms. In doing so, we can operate under fewer unnecessarily strong assumptions (e.g. that the shapes of each term are static and equal to the initial test points), and let the sampling processes that require vector-only steps deal with any changes in the mappings.
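A hedged sketch of the round trip this describes, assuming `DictToArrayBijection` lives in `pymc3.blocking` and exposes `map`/`rmap` as in later v4 code:

```python
import numpy as np
from pymc3.blocking import DictToArrayBijection

point = {"x": np.array(0.5), "y": np.array([1.0, 2.0])}
raveled = DictToArrayBijection.map(point)       # -> RaveledVars
print(raveled.data)                             # one concatenated 1-D vector
roundtrip = DictToArrayBijection.rmap(raveled)  # back to the dict-of-arrays form
print(roundtrip)
```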
The approach currently being used is rather inefficient. Instead, we should change the `size` parameters for `RandomVariable` terms in the sample-space graph(s) so that they match arrays of the inputs in the trace and the desired number of output samples. This would allow the compiled graph to vectorize operations (when it can) and sample variables more efficiently in large batches.
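To illustrate the batching idea, a minimal sketch: setting `size` lets one compiled evaluation draw many samples at once instead of looping:

```python
import pymc3 as pm

# one vectorized evaluation instead of 1000 per-sample evaluations
draws = pm.Normal.dist(mu=0, sigma=1, size=(1000,)).eval()
print(draws.shape)  # (1000,)
```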
Classes and functions removed:

- PyMC3Variable
- ObservedRV
- FreeRV
- MultiObservedRV
- TransformedRV
- ArrayOrdering
- VarMap
- DataMap
- _DrawValuesContext
- _DrawValuesContextBlocker
- is_fast_drawable
- _compile_theano_function
- vectorize_theano_function
- get_vectorize_signature
- _draw_value
- draw_values
- generate_samples
- fast_sample_posterior_predictive

Modules removed:

- pymc3.distributions.posterior_predictive
- pymc3.tests.test_random
* Refactor Flat and HalfFlat distributions
* Re-enable Gumbel logp test
* Remove redundant test
Co-authored-by: Thomas Wiecki <[email protected]>
Co-authored-by: Ricardo <[email protected]>
Co-authored-by: Thomas Wiecki <[email protected]>
Co-authored-by: Thomas Wiecki <[email protected]>
Co-authored-by: Ricardo <[email protected]>
Co-authored-by: Michael Osthege <[email protected]>
Co-authored-by: Oriol Abril-Pla <[email protected]>