Result of device-to-device cudaMemcpyAsync with stream synchronized

Hi,

Suppose I have

cudaMemcpyAsync(dev2, dev1, N, cudaMemcpyDeviceToDevice, stream1);
cudaStreamSynchronize(stream1);

where dev2 is a pointer on device 2, dev1 is a pointer on device 1 and stream1 is a stream on device 1.

After cudaStreamSynchronize() returns, is it guaranteed that

  1. the data has been copied to dev2, i.e., the whole copy has finished,

or only that

  2. the data has been read out of dev1, so dev1 can be reused, while the data is not necessarily in dev2 yet?

Thanks.

It guarantees that all previous operations issued to stream1 are complete, i.e. the whole copy has finished (your case 1).

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html#group__CUDART__STREAM_1g82b5784f674c17c6df64affe618bf45e

Waits for stream tasks to complete.

...

Blocks until stream has completed all operations.
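
If you want to check this on your own machine, here is a minimal host-side sketch, not a definitive test. It makes several assumptions that are not in the original post: devices 0 and 1 stand in for "device 1" and "device 2" (it falls back to a single GPU if only one is present), unified virtual addressing is available so that cudaMemcpyDeviceToDevice works across devices, and the CHECK macro, the buffer size, and the fill pattern are made up for illustration. It copies dev1 to dev2 on stream1, synchronizes the stream, overwrites dev1, and then reads dev2 back to compare it with the original data.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));

    // Assumption: devices 0 and 1 play the roles of "device 1" and "device 2".
    // With a single GPU both buffers live on device 0.
    const int srcDev = 0;
    const int dstDev = (count > 1) ? 1 : 0;

    const size_t N = 1 << 20;  // bytes to copy (arbitrary size)
    std::vector<unsigned char> hostIn(N), hostOut(N, 0);
    for (size_t i = 0; i < N; ++i) hostIn[i] = static_cast<unsigned char>(i % 251);

    unsigned char *dev1 = nullptr, *dev2 = nullptr;
    cudaStream_t stream1;

    // Destination buffer on the second device, cleared to zero.
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMalloc(&dev2, N));
    CHECK(cudaMemset(dev2, 0, N));
    CHECK(cudaDeviceSynchronize());

    // Source buffer and stream on the first device.
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaMalloc(&dev1, N));
    CHECK(cudaStreamCreate(&stream1));
    CHECK(cudaMemcpy(dev1, hostIn.data(), N, cudaMemcpyHostToDevice));

    // The two calls from the question.
    CHECK(cudaMemcpyAsync(dev2, dev1, N, cudaMemcpyDeviceToDevice, stream1));
    CHECK(cudaStreamSynchronize(stream1));

    // The copy is complete, so the source can be overwritten safely.
    CHECK(cudaMemset(dev1, 0xFF, N));

    // Read the destination back and compare with the original data.
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMemcpy(hostOut.data(), dev2, N, cudaMemcpyDeviceToHost));

    size_t mismatches = 0;
    for (size_t i = 0; i < N; ++i) {
        if (hostOut[i] != hostIn[i]) ++mismatches;
    }
    printf("mismatched bytes: %zu\n", mismatches);

    CHECK(cudaFree(dev2));
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(dev1));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A passing run is only consistent with the documented behavior; it does not prove it. If you would rather name the two devices explicitly instead of relying on unified addressing, cudaMemcpyPeerAsync takes the source and destination device IDs as arguments.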