for (i = 0; i < N; i++)
{
    checkCudaErrors(cudaMemcpyAsync(host[i], device[i], size_bytes, cudaMemcpyDeviceToHost, stream[i]));
}
// using host data after this point
Do I need to synchronize (using CUDA events, device synchronize, stream synchronize, etc.) to make sure the host data is available before using it?
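For reference, here is a minimal sketch (not part of the original question) of what synchronization before the host access could look like, assuming the host buffers are pinned and that host, device, stream, size_bytes, and N are set up as in the snippet above:

for (int i = 0; i < N; i++)
{
    // enqueue one device-to-host copy per stream
    checkCudaErrors(cudaMemcpyAsync(host[i], device[i], size_bytes, cudaMemcpyDeviceToHost, stream[i]));
}
for (int i = 0; i < N; i++)
{
    // wait for each stream's copy to finish before reading its host buffer
    checkCudaErrors(cudaStreamSynchronize(stream[i]));
}
// host[0..N-1] are now safe to read on the CPU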
So for the code in the question above, the copy must have been synchronous and no stream sync should be needed.
Either the document I have attached is outdated, or the answer to this question is “synchronization is redundant; the copy call is already synchronous.” Which one is correct?
So the correct answer indeed depends on the “host” pointer, right? (e.g., if “host” is allocated as pageable memory then synchronization is not needed (the copy is already synchronous), but if it is allocated as pinned memory then synchronization is needed.)
Yes, a better answer would have been to condition my answer with “assuming the host allocations are pinned.”
This is 5 years old, so I don’t have a memory of it. However, if I had to guess, the reason I answered the way I did is probably:
My answer is “safe” from a correctness point of view, whether the host allocations are pinned or pageable (your suggestion is not safe if the allocations are actually pinned).
Usually, people who are using cudaMemcpyAsync know, or should know, that the async character is lost when the host allocation is not pinned. This code appears to be carefully crafted that way (using streams, for example), so I probably assumed something I shouldn’t have.
Providing all this description makes it a better answer. My answer was sloppy.
I will say that I find the usage of cudaMemcpyAsync in a “non-typical” way problematic for a few reasons. When I am teaching CUDA, I encourage people to follow the recommended patterns.
Many folks who are learning CUDA forget that the allocations must be pinned for async behavior.
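As a hedged illustration (not from the original thread), the difference comes down to how the host buffer is allocated; the buffer names below are hypothetical and size_bytes is assumed to be defined as in the question:

// Pinned (page-locked) host allocation: cudaMemcpyAsync can be truly asynchronous,
// so a later cudaStreamSynchronize (or equivalent) is required before reading the data.
float *pinned_buf;
checkCudaErrors(cudaMallocHost(&pinned_buf, size_bytes));

// Pageable host allocation: the same cudaMemcpyAsync call falls back to an
// effectively blocking device-to-host transfer, so the data is ready on return.
float *pageable_buf = (float *)malloc(size_bytes);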
It makes for confusing code if you intentionally use cudaMemcpyAsync and intentionally use pageable allocations and intentionally depend on the conversion to sync behavior. I like code that is easily understandable. Code that requires me to check a bunch of conditions to determine its behavior is troublesome, in my view. Therefore I would encourage folks who intend to use blocking behavior here to use cudaMemcpy, not cudaMemcpyAsync, because the intent is then clear.
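A sketch of that clearer, intentionally blocking alternative, reusing the names from the question’s snippet:

for (i = 0; i < N; i++)
{
    // blocking copy: the intent is explicit, and the data is valid once the call returns
    checkCudaErrors(cudaMemcpy(host[i], device[i], size_bytes, cudaMemcpyDeviceToHost));
}
// no extra synchronization is needed before using the host data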
But to each his own. It’s not wrong, either way, as long as the underlying behavioral rules are understood.