Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.0.5, as shipped with the NVIDIA HPC SDK 21.2
Please describe the system on which you are running
- Operating system/version: CentOS Linux 7
- Computer hardware: 2 x Intel Xeon Gold 6142 v4, 4 x NVIDIA Volta GV100GL, 768 GB
- Network type: InfiniBand
Details of the problem
I am developing a library that uses MPI derived datatypes to send and receive aligned data. The derived datatypes are built as a combination of vector, hvector, contiguous, and resized types (see the simplified sketch below).
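To illustrate the kind of construction I mean, here is a minimal sketch; the counts, block lengths, and strides are placeholders, not the real values from my library:

```c
/* Simplified sketch of the datatype construction.
 * Counts, block lengths, and strides are placeholders. */
#include <mpi.h>

MPI_Datatype build_block_type(void)
{
    MPI_Datatype vec, hvec, contig, resized;

    /* Strided pattern: 8 blocks of 4 doubles, stride of 16 doubles */
    MPI_Type_vector(8, 4, 16, MPI_DOUBLE, &vec);

    /* Repeat the vector with a byte stride (e.g. one 2D plane apart) */
    MPI_Type_create_hvector(4, 1, 16 * 16 * sizeof(double), vec, &hvec);

    /* Pack several such planes contiguously */
    MPI_Type_contiguous(2, hvec, &contig);

    /* Adjust the extent so consecutive elements are laid out as needed */
    MPI_Type_create_resized(contig, 0, 16 * 16 * 4 * sizeof(double), &resized);
    MPI_Type_commit(&resized);

    MPI_Type_free(&vec);
    MPI_Type_free(&hvec);
    MPI_Type_free(&contig);
    return resized;
}
```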
It runs fine on the CPU. I then tried to run the code on the GPU using the CUDA-aware MPI shipped with NVIDIA's HPC SDK. I noticed that when I call MPI_Alltoall with GPU buffers, MPI starts copying data from host to device in many small pieces: a single MPI_Alltoall call issues more than 1 million such copies. It is not a surprise that the code runs very slowly.
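The call pattern looks roughly like this (simplified; buffer sizes and counts are placeholders, `build_block_type` is the sketch above):

```c
/* Sketch of the failing call pattern: send/recv buffers live on the GPU
 * (allocated with cudaMalloc), the datatype is the derived type above. */
#include <mpi.h>
#include <cuda_runtime.h>

void do_exchange(MPI_Comm comm, MPI_Datatype dtype, size_t bytes_per_rank)
{
    int nranks;
    MPI_Comm_size(comm, &nranks);

    void *d_send = NULL, *d_recv = NULL;
    cudaMalloc(&d_send, bytes_per_rank * nranks);
    cudaMalloc(&d_recv, bytes_per_rank * nranks);

    /* With CUDA-aware Open MPI the device pointers are passed directly.
     * This is the call that triggers the huge number of small copies. */
    MPI_Alltoall(d_send, 1, dtype, d_recv, 1, dtype, comm);

    cudaFree(d_send);
    cudaFree(d_recv);
}
```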
Can you please explain how this works? Are you aware of such behaviour?
Best regards,
Oleg