Hi,
I want to convert the following code, which currently runs everything on the default stream, into a two-stream version so that independent work can run concurrently:
- (1) H2D MemcpyAsync
- (2) kernel
- (3) D2H MemcpyAsync
- (4) kernel
- (5) kernel
The data dependencies among these five statements are as follows:
1
|
2
|\
| \
3  4
|
5
In short, 3 and 4 may execute concurrently with each other, but both must be ordered after 2, and 5 must be ordered after 3.
To get that concurrency, 3 and 4 have to be issued to different streams.
So at least one of 3 and 4 must run in a stream other than the one in which 1 and 2 run.
However, moving 3 or 4 into another stream breaks the ordering and can give a different (wrong) result, because that operation may then run concurrently with 1 or 2.
Is there a way to order 3 (or 4) after 2 even though it is issued to a different stream from the one in which 1 and 2 execute?
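One common way to express a cross-stream dependency in CUDA is with an event: call `cudaEventRecord` on the first stream right after 2, then call `cudaStreamWaitEvent` on the second stream before issuing 4. Work queued on the second stream after that call does not start until everything recorded before the event (1 and 2) has finished. Below is a minimal sketch under these assumptions: the kernels `kernel2`, `kernel4`, `kernel5` and the helper `run_pipeline` are hypothetical placeholders, not the original code; 1, 2, 3 and 5 stay in stream `sA` and only 4 moves to `sB`. Host buffers use `cudaMallocHost`, because asynchronous copies generally only overlap with kernel execution when the host memory is pinned.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// On a CUDA error, print it and return -1 (this sketch does not free
// resources on that path).
#define CHECK(call)                                                      \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
      return -1;                                                         \
    }                                                                    \
  } while (0)

// Hypothetical placeholder kernels for statements 2, 4 and 5.
__global__ void kernel2(float* a, int n) {  // (2) a = 2a
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) a[i] *= 2.0f;
}
__global__ void kernel4(const float* a, float* b, int n) {  // (4) b = a + 1
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) b[i] = a[i] + 1.0f;
}
__global__ void kernel5(const float* a, float* c, int n) {  // (5) c = a - 1
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) c[i] = a[i] - 1.0f;
}

// Hypothetical helper: runs 1 -> 2 -> {3, 4} and 3 -> 5 on two streams.
// Returns the number of wrong elements (0 = correct) or -1 on a CUDA error.
int run_pipeline(int n) {
  size_t bytes = n * sizeof(float);
  float *h_in, *h_out, *d_a, *d_b, *d_c;
  CHECK(cudaMallocHost(&h_in, bytes));  // pinned, so async copies can overlap
  CHECK(cudaMallocHost(&h_out, bytes));
  CHECK(cudaMalloc(&d_a, bytes));
  CHECK(cudaMalloc(&d_b, bytes));
  CHECK(cudaMalloc(&d_c, bytes));
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 100);

  cudaStream_t sA, sB;
  cudaEvent_t after2;
  CHECK(cudaStreamCreateWithFlags(&sA, cudaStreamNonBlocking));
  CHECK(cudaStreamCreateWithFlags(&sB, cudaStreamNonBlocking));
  CHECK(cudaEventCreateWithFlags(&after2, cudaEventDisableTiming));
  const int threads = 256, blocks = (n + threads - 1) / threads;

  // Stream A keeps the original order: 1, 2, 3, 5.
  CHECK(cudaMemcpyAsync(d_a, h_in, bytes, cudaMemcpyHostToDevice, sA));   // (1)
  kernel2<<<blocks, threads, 0, sA>>>(d_a, n);                             // (2)
  CHECK(cudaGetLastError());
  CHECK(cudaEventRecord(after2, sA));  // completes once 1 and 2 are done
  CHECK(cudaMemcpyAsync(h_out, d_a, bytes, cudaMemcpyDeviceToHost, sA));  // (3)
  kernel5<<<blocks, threads, 0, sA>>>(d_a, d_c, n);                        // (5)
  CHECK(cudaGetLastError());

  // Stream B: work issued after this call waits until `after2` completes,
  // so 4 is ordered after 2 but may overlap with 3 (and 5).
  CHECK(cudaStreamWaitEvent(sB, after2, 0));
  kernel4<<<blocks, threads, 0, sB>>>(d_a, d_b, n);                        // (4)
  CHECK(cudaGetLastError());

  CHECK(cudaStreamSynchronize(sA));
  CHECK(cudaStreamSynchronize(sB));

  // Verify: 3 copied 2x back, 4 produced 2x + 1, 5 produced 2x - 1.
  std::vector<float> b(n), c(n);
  CHECK(cudaMemcpy(b.data(), d_b, bytes, cudaMemcpyDeviceToHost));
  CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));
  int bad = 0;
  for (int i = 0; i < n; ++i) {
    float x = 2.0f * h_in[i];
    if (h_out[i] != x || b[i] != x + 1.0f || c[i] != x - 1.0f) ++bad;
  }

  cudaEventDestroy(after2);
  cudaStreamDestroy(sA);
  cudaStreamDestroy(sB);
  cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
  cudaFreeHost(h_in); cudaFreeHost(h_out);
  return bad;
}
```

Calling `run_pipeline(1 << 20)` should return 0 if the ordering holds. Since 5 only has to follow 3, it can stay in `sA` behind 3 and needs no extra event; if 5 also depended on 4, a second event recorded in `sB` after 4 and waited on in `sA` would be needed. In this sketch 4 and 5 both only read `d_a` and write separate buffers, so letting them overlap is safe.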