NUTS Sampler Fails with pygpu.gpuarray.GpuArrayException Error #3087
Comments
We usually don't see a big advantage from using a GPU in our use cases, so my suggestion is to set Theano to CPU only and try again. |
I tried that by changing the setting in my .theanorc.txt file from 'device = cuda' to 'device = cpu'. This resulted in a new error: Auto-assigning NUTS sampler... Needless to say, I did not initiate any control-C event, so I am just as puzzled with this new error as with the last one, unless there is another way to set theano to use the CPU only. |
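On the question of another way to force CPU-only mode: Theano also reads the `THEANO_FLAGS` environment variable, which takes precedence over the `.theanorc` file as long as it is set before Theano is first imported. The following is a minimal sketch of that approach, not a confirmed fix for the error above; `force_theano_cpu` is a hypothetical helper name introduced only for illustration.

```python
import os


def force_theano_cpu(env=os.environ):
    """Hypothetical helper: make THEANO_FLAGS request the CPU device."""
    # Keep any flags that are already set, but drop an existing device=... entry.
    flags = [
        flag
        for flag in env.get("THEANO_FLAGS", "").split(",")
        if flag and not flag.startswith("device=")
    ]
    flags.append("device=cpu")
    env["THEANO_FLAGS"] = ",".join(flags)
    return env["THEANO_FLAGS"]


# Call this before the first `import theano` or `import pymc3` in the process;
# Theano reads the flags once, at import time.
force_theano_cpu()
# import pymc3 as pm
```

Once Theano is imported, `theano.config.device` should report `cpu` if the flag was picked up.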
Did you install mkl-services in anaconda? |
I don't believe so, unless it was part of the baseline anaconda installation. Unfortunately, I am traveling over the next week and don't have access to my desktop to test this out. Frankly, I have been having enough problems getting a stable PyMC3 installation that I am considering starting over by reinstalling anaconda, then reinstalling PyMC3 from conda while taking careful notes as I go along to document my steps. However, please see my related Issue 3093 for problems encountered for PyMC3 installation on a different Windows 10 laptop. |
Now that I am back at my home office, I have returned to the problem of installing PyMC3 in Windows 10 in conjunction with Python 3.7/Anaconda 5.2 on my dual-boot GPU-enabled desktop. I have tried a couple of ways of doing this, one consistent with the procedure at http://datahans.blogspot.com/2016/04/installing-pymc3.html (but using Python 3.7, and not 2.7 as recommended in the link), and the other by creating a special conda package in line with one of the suggestions at #2988. In the second case, the yml file was:
In both cases, I got through the original sampling and MAP estimate portions of the regression case code. And in both cases, I then received this traceback:
For what it is worth, I have run the sample code at http://deeplearning.net/software/theano/tutorial/using_gpu.html confirming that the GPU can be run successfully on my PC. So I am trying other ideas, but as before, any ideas would be appreciated. |
I have continued to try installing PyMC3 on my desktop, and have successfully achieved it by modifying my yml file as follows:
This installation using Python 2.7 appears to run successfully using both my CPU and GPU, although I do get an odd "Could not pickle model, sampling singlethreaded." message when I run on the GPU. So in a sense, this installation issue appears to be resolved, since I can now use the PyMC3 app for my work. Now that I've said that, it's worth pointing out that I have never, on any machine, been able to get PyMC3 to successfully install to a Windows 10 platform using the current version of Python 3. |
Glad you got something working. It is worrying that you can't get Py3 to work, however. To clarify, are you talking about working with a GPU, or working at all? Does it work in a CPU environment? |
The traceback I posted a couple of days ago was with device = cuda. Changing this to device = cpu in my .theanorc.txt file results in:
|
I think I know why. Py3 properly supports parallel sampling, which would only work if you had 4 GPUs. So try setting |
Awesome! That did it. The sample case runs in the Python 3 PyMC3 environment with both device = cuda and device = cpu in .theanorc.txt. I note that the CPU run takes about a quarter the time that the GPU run takes, so it would be hard to justify doing this using my GPU. My only other question: Why did the Python 2 installation work properly? Beyond that, it would be useful to include more bulletproof instructions for installing PyMC3 under Windows. Other than that, I think my PyMC3 installation looks good. Thanks so much for the assist. |
Glad it's working. Yes, GPU only speeds up very few models. There's probably more optimization that could be done, however, but it's not a priority currently. The support for parallel sampling is a bit broken in python 2 so I assume that you just didn't get parallelization there and thus theano didn't try to run 4 GPU jobs in parallel. Happy sampling! |
Hi, I ran into the same problem. I still do not understand how to solve it, as this is the first time I have run PyMC3. Could you write it down in more detail? Thanks. |
@JIXING123 Did you try sampling with |
Hi, thank you for your reply. I have not set jobs=1 because I do not know where to put this code. For example, should it be trace=pm.sample(2000, jobs=1), or pm.sample(jobs=1) as an independent line? Attached is the code from http://people.duke.edu/~ccc14/sta-663-2016/16C_PyMC3.html, which I am using just to learn PyMC3. I also tried the example that MisterRedacts ran. Both give the same error. import pymc3 as pm n=100 xs= np.linspace(0,1,100) #introduction PyMC3 |
Try the attached procedure, environment config and test PyMC3 Python files, which worked well on my Windows PC: PyMC3 Windows Installation Instructions.docx Make sure you change pymc3_env_3_7.txt to pymc3_env_3_7.yml and Python3_Regression_Case.txt to Python3_Regression_Case.py before you start. Good luck. |
Wow thank you so much for writing down your experience! |
@JIXING123 In case you are using a single GPU: did you try setting Looks like below commit replaced keyword |
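Because the keyword was renamed between releases, one way to avoid guessing is to inspect the installed `pm.sample` signature and pick whichever name it accepts. This is only a sketch: `single_process_kwarg` is a hypothetical helper, and the older keyword name is assumed here to be `njobs`, with `cores` as the newer one.

```python
import inspect


def single_process_kwarg(sample_fn):
    """Hypothetical helper: return the keyword that limits sampling to one process."""
    params = inspect.signature(sample_fn).parameters
    # Prefer the newer name; fall back to the assumed older one.
    for name in ("cores", "njobs"):
        if name in params:
            return {name: 1}
    # Neither keyword is accepted: pass nothing rather than raise a TypeError.
    return {}


# Intended use (requires PyMC3 and an existing model):
# with basic_model:
#     trace = pm.sample(500, **single_process_kwarg(pm.sample))
```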
Hey, I'm also having trouble with this, and none of the suggested solutions work. I've already changed the device to 'cpu' as well. `model=pymc3.Model()' 'with model:'
I had been trying to draw the covariance matrix for the parameters from a distribution as well, but was having a lot of trouble getting that to work and wanted to just make sure I could get something simpler to work first. X and Y are time series data matrices. X includes the lags of Y (and ones). |
pretty sure you will run out of memory with |
I'm using identity matrices because I was being bombarded with errors when I was drawing from another distribution. Also is the cov for the observations not meant to be the covariance matrix for the error terms? |
If you also get an error on the CPU, this is likely a different issue, and you should open a new issue or a discussion on https://discourse.pymc.io. Did you search our Discourse? I remember that using sparse matrices is not trivial, and there are a few discussions about it there. |
I was able to solve this issue in my environment (a single GPU laptop) by adding the parameter "cores=1" to my trace call. So for example: |
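A minimal sketch of what such a call might look like, assuming a PyMC3 release whose `pm.sample` accepts the `cores` keyword. The argument goes inside the `pm.sample(...)` call within the model context, not on a separate line. `run_single_process` is a hypothetical wrapper name, and `basic_model` refers to the model from the original post below.

```python
def run_single_process(model, draws=500):
    """Hypothetical wrapper: sample a PyMC3 model in a single process."""
    # Imported lazily so this sketch can be loaded without PyMC3 installed.
    import pymc3 as pm

    with model:
        # cores=1 keeps all chains in the main process, so no worker
        # processes try to pickle GPU arrays or share the single GPU.
        return pm.sample(draws, cores=1)


# Intended use with the regression model from the original post:
# trace = run_single_process(basic_model, draws=500)
```

The chains then run one after another, which is slower than parallel sampling on the CPU but avoids the multiprocess path shown in the traceback.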
I am unable to run the NUTS sampler in PyMC3. The case I am running is a straightforward transcription of the basic Normal distribution case in the regression example at http://docs.pymc.io/notebooks/getting_started#Installation:
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
#print('Running on PyMC3 v{}'.format(pm.__version__))
plt.style.use('seaborn-darkgrid')
# Initialize random number generator
np.random.seed(123)
# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]
# Size of dataset
size = 100
# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2
# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma
fig, axes = plt.subplots(1, 2, sharex=True, figsize=(10,4))
axes[0].scatter(X1, Y)
axes[1].scatter(X2, Y)
axes[0].set_ylabel('Y'); axes[0].set_xlabel('X1'); axes[1].set_xlabel('X2');
plt.show()
basic_model = pm.Model()
with basic_model:
    map_estimate = pm.find_MAP(model=basic_model, method='powell')
print(map_estimate)
with basic_model:
    # draw 500 posterior samples
    trace = pm.sample(500)
This code appears to work properly up to the point where the pm.sample line is encountered. At that point I receive a pygpu.gpuarray.GpuArrayException invalid value error. The following is the complete response from the run including traceback:
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using cuDNN version 5105 on context None
Mapped name None to device cuda: GeForce GTX 950 (0000:03:00.0)
0%| | 0/5000 [00:00<?, ?it/s]
D:\Programs\Anaconda3\Lib\site-packages\scipy\optimize\_minimize.py:502: RuntimeWarning: Method powell does not use gradient information (jac).
RuntimeWarning)
logp = -148.98, ||grad|| = 0.73744: 100%|███████████████████████████████████████████| 183/183 [00:00<00:00, 185.38it/s]
{'alpha': array(0.9090931, dtype=float32), 'beta': array([0.9514547, 2.6145666], dtype=float32), 'sigma_log__': array(-0.03494539, dtype=float32), 'sigma': array(0.9656581, dtype=float32)}
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma_log__, beta, alpha]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\queues.py", line 151, in _feed
obj, reducers=reducers)
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\reduction.py", line 145, in dumps
p.dump(obj)
File "D:\Programs\Anaconda3\Lib\site-packages\theano\gpuarray\type.py", line 909, in GpuArray_pickler
return (GpuArray_unpickler, (np.asarray(cnda), ctx_name))
File "D:\Programs\Anaconda3\Lib\site-packages\numpy\core\numeric.py", line 492, in asarray
return array(a, dtype, copy=False, order=order)
File "pygpu\gpuarray.pyx", line 1735, in pygpu.gpuarray.GpuArray.array
File "pygpu\gpuarray.pyx", line 1405, in pygpu.gpuarray._pygpu_as_ndarray
File "pygpu\gpuarray.pyx", line 394, in pygpu.gpuarray.array_read
pygpu.gpuarray.GpuArrayException: b'cuMemcpyDtoHAsync(dst, src->ptr + srcoff, sz, ctx->mem_s): CUDA_ERROR_INVALID_VALUE: invalid argument'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".\Regression_Case.py", line 50, in <module>
trace = pm.sample(500)
File "D:\Programs\Anaconda3\Lib\site-packages\pymc3\sampling.py", line 442, in sample
trace = _mp_sample(**sample_args)
File "D:\Programs\Anaconda3\Lib\site-packages\pymc3\sampling.py", line 982, in _mp_sample
traces = Parallel(n_jobs=cores, mmap_mode=None)(jobs)
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\parallel.py", line 962, in __call__
self.retrieve()
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\parallel.py", line 865, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\_parallel_backends.py", line 515, in wrap_future_result
return future.result(timeout=timeout)
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\_base.py", line 431, in result
return self.__get_result()
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\_base.py", line 382, in __get_result
raise self._exception
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\queues.py", line 151, in _feed
obj, reducers=reducers)
File "D:\Programs\Anaconda3\Lib\site-packages\joblib\externals\loky\backend\reduction.py", line 145, in dumps
p.dump(obj)
File "D:\Programs\Anaconda3\Lib\site-packages\theano\gpuarray\type.py", line 909, in GpuArray_pickler
return (GpuArray_unpickler, (np.asarray(cnda), ctx_name))
File "D:\Programs\Anaconda3\Lib\site-packages\numpy\core\numeric.py", line 492, in asarray
return array(a, dtype, copy=False, order=order)
File "pygpu\gpuarray.pyx", line 1735, in pygpu.gpuarray.GpuArray.array
File "pygpu\gpuarray.pyx", line 1405, in pygpu.gpuarray._pygpu_as_ndarray
File "pygpu\gpuarray.pyx", line 394, in pygpu.gpuarray.array_read
pygpu.gpuarray.GpuArrayException: b'cuMemcpyDtoHAsync(dst, src->ptr + srcoff, sz, ctx->mem_s): CUDA_ERROR_INVALID_VALUE: invalid argument'
The following are my versions and main components
I was hoping to use PyMC3 in an upcoming project, so any assistance you might provide would be much appreciated.