According to the docs, the driver and runtime APIs are mutually exclusive and only one of the two should be used. On the other hand, the 0.8.1 SDK includes bandwidthTest.cu, in which calls from both APIs are happily mixed, for instance cuMemAllocSystem (driver) and cudaMalloc (runtime). So what is the deal here? Why should we avoid mixing the two?
cuMemAllocSystem() and cuMemFreeSystem() are the only CUDA driver calls that can be used with the runtime API (as specified at the bottom of Section 4.5.3.6).
This is because they allocate and free host (system) memory and don’t deal with compute device matters per se, so they can’t interfere with the runtime.
Is the mutual exclusion still in effect in 1.1? If the runtime API is built on top of the driver API, how can the two be mutually exclusive? It should be possible to create an array in one and access it from the other (and vice versa), no? How does one do this?