CUDA memcpy memory consistency across streams

Hi all,

I’d like to ask about memory consistency across multiple streams.

Suppose I’m using two streams.
In stream1, I launch several memory copy operations (cuMemcpyHtoD: host to device).
Because they are all in a single stream, they execute in the order in which they were launched.
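To make the setup concrete, here is a minimal sketch of it. It assumes the CUDA runtime API instead of the driver API: cudaMemcpyAsync with pinned host memory stands in for the stream-ordered host-to-device copies (the driver API's stream-taking form is cuMemcpyHtoDAsync). The kernel peekBuffers and the function runTwoStreamScenario are hypothetical names used only for illustration. The kernel in stream2 reads the buffers that stream1 is copying without any explicit synchronization between the two streams, which is exactly the situation in question, so the values it observes are not checked.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel launched into stream2: reads the first element of each
// buffer that stream1 is copying. Nothing orders it against stream1, so the
// values it observes are not guaranteed and are deliberately not inspected.
__global__ void peekBuffers(const int* a, const int* b, int* observed) {
    observed[0] = a[0];
    observed[1] = b[0];
}

// Issues two host-to-device copies into stream1 and one kernel into stream2.
// Returns the number of device elements that differ from their host sources
// after a full device synchronization, or -1 if any CUDA call fails.
int runTwoStreamScenario(size_t n) {
    const size_t bytes = n * sizeof(int);
    int *hA = nullptr, *hB = nullptr, *hCheck = nullptr;
    int *dA = nullptr, *dB = nullptr, *dObserved = nullptr;
    cudaStream_t stream1 = nullptr, stream2 = nullptr;
    bool ok = true;

    // Pinned host memory, so cudaMemcpyAsync is asynchronous w.r.t. the host.
    ok = ok && cudaMallocHost(&hA, bytes) == cudaSuccess;
    ok = ok && cudaMallocHost(&hB, bytes) == cudaSuccess;
    ok = ok && cudaMallocHost(&hCheck, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dA, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dB, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dObserved, 2 * sizeof(int)) == cudaSuccess;
    ok = ok && cudaStreamCreate(&stream1) == cudaSuccess;
    ok = ok && cudaStreamCreate(&stream2) == cudaSuccess;

    int mismatches = -1;
    if (ok) {
        for (size_t i = 0; i < n; ++i) { hA[i] = 1; hB[i] = 2; }

        // Both copies are issued into stream1, so they execute in issue order:
        // the copy into dA, then the copy into dB.
        ok = ok && cudaMemcpyAsync(dA, hA, bytes, cudaMemcpyHostToDevice, stream1) == cudaSuccess;
        ok = ok && cudaMemcpyAsync(dB, hB, bytes, cudaMemcpyHostToDevice, stream1) == cudaSuccess;

        // stream2 has no ordering relationship with stream1 here, so this kernel
        // may run before, during, or after either copy.
        peekBuffers<<<1, 1, 0, stream2>>>(dA, dB, dObserved);
        ok = ok && cudaGetLastError() == cudaSuccess;

        // Wait for all streams; after this, every copy in stream1 has completed.
        ok = ok && cudaDeviceSynchronize() == cudaSuccess;

        if (ok) {
            mismatches = 0;
            ok = cudaMemcpy(hCheck, dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
            for (size_t i = 0; ok && i < n; ++i) mismatches += (hCheck[i] != hA[i]) ? 1 : 0;
            ok = ok && cudaMemcpy(hCheck, dB, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
            for (size_t i = 0; ok && i < n; ++i) mismatches += (hCheck[i] != hB[i]) ? 1 : 0;
            if (!ok) mismatches = -1;
        }
    }

    // Release resources; guard each call so only allocated objects are freed.
    if (stream1) cudaStreamDestroy(stream1);
    if (stream2) cudaStreamDestroy(stream2);
    if (dA) cudaFree(dA);
    if (dB) cudaFree(dB);
    if (dObserved) cudaFree(dObserved);
    if (hA) cudaFreeHost(hA);
    if (hB) cudaFreeHost(hB);
    if (hCheck) cudaFreeHost(hCheck);
    return mismatches;
}
```

Calling runTwoStreamScenario(1 << 20) should return 0: once the whole device is synchronized, both copies are complete. What the sketch cannot tell you is what peekBuffers saw while the copies were still in flight.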

In this situation, does stream2 always observe the results of these copies in the same order in which stream1 launched them?
Or is it possible for stream2 to observe the results of the memcpy operations in a different order from the one in which stream1 launched them?
In other words, can I assume sequential consistency across streams?
Any comments would be appreciated.
Thanks.

HyoukJoong.