Parallel (ensemble) sampling methods in PyMC

I’m curious about the state of parallel or ensemble sampling algorithms in PyMC, such as the one in https://proceedings.mlr.press/v151/hoffman22a/hoffman22a.pdf, whose tuning procedure uses a whole batch of chains.

Part of the reason I ask is that my research group is working on similar methods, and we’re curious how much PyMC users want (or already have access to) methods that run many chains in parallel, since for many problems this can lead to very large wall-clock speedups (e.g. if burn-in time is small compared to the length of the chain).
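
To make the burn-in point concrete, here is a rough back-of-the-envelope sketch. It is not tied to any PyMC API, and the function name `ideal_speedup` and the numbers are purely illustrative. It assumes every chain pays its own full burn-in, all steps cost the same, the chains run perfectly in parallel, and post-burn-in draws are pooled across chains.

```python
def ideal_speedup(burn_in, n_samples, n_chains):
    """Idealized wall-clock speedup of n_chains parallel chains over one chain.

    Assumes (hypothetically) equal cost per step, perfect parallelism,
    and that each chain needs its own burn_in steps before its draws count.
    """
    serial_steps = burn_in + n_samples               # one long chain
    parallel_steps = burn_in + n_samples / n_chains  # each chain, run side by side
    return serial_steps / parallel_steps


# Same sampling budget and chain count, short vs. long burn-in.
for burn_in in (100, 10_000):
    print(burn_in, round(ideal_speedup(burn_in, 10_000, 100), 2))
```

Under these assumptions the speedup approaches the number of chains when burn-in is negligible, and collapses towards 1 as burn-in comes to dominate the run, which is why tuning procedures that shorten burn-in matter so much for many-chain methods.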