Linking MPI and CUDA objects

I’m writing a program that uses MPI and CUDA and compiling the objects separately, but when I try to link all the objects into an executable I get an undefined reference to __cudaRegisterLinkedBinary.

I have 4 files: file_1.cpp, file_2.cpp and file_3.cpp are compiled with mpicxx, and deviceCode.cu is compiled with nvcc.

The build commands and their output are as follows:

mpicxx -c file_1.cpp -o file_1.o -I/usr/local/include -MMD -MP -MF -lgecodedriver -lgecodesearch -lgecodeminimodel -lgecodeint -lgecodekernel -lgecodesupport 
mpicxx -c file_2.cpp -o file_2.o -I/usr/local/include -MMD -MP -MF -lgecodedriver -lgecodesearch -lgecodeminimodel -lgecodeint -lgecodekernel -lgecodesupport 
mpicxx -c file_3.cpp -o file_3.o -I/usr/local/include -MMD -MP -MF -lgecodedriver -lgecodesearch -lgecodeminimodel -lgecodeint -lgecodekernel -lgecodesupport 
nvcc -ccbin g++ -m64 -rdc=true -arch=sm_35 -g -G -c deviceCode.cu -o deviceCode.o -lcudart -lcudadevrt -I/home/user/opt/openmpi/include -L/home/user/opt/openmpi/lib -lmpi
mpicxx -c main.cpp -o main.o
mpicxx -o main file_1.o file_2.o file_3.o deviceCode.o main.o -L/usr/local/lib -lgecodedriver -lgecodesearch -lgecodeminimodel -lgecodeint -lgecodekernel -lgecodesupport  -L/usr/local/cuda/lib64 -lcudart -lcudadevrt
deviceCode.o: In function `__sti____cudaRegisterAll_45_tmpxft_000026c9_00000000_7_deviceCode_cpp1_ii_texTSD':
/tmp/tmpxft_000026c9_00000000-4_deviceCode.cudafe1.stub.c:60: undefined reference to `__cudaRegisterLinkedBinary_45_tmpxft_000026c9_00000000_7_deviceCode_cpp1_ii_texTSD'
collect2: error: ld returned 1 exit status
Makefile:23: recipe for target 'all' failed
make: *** [all] Error 1

texTSD is a texture declared globally in deviceCode.cu and used only there. What am I doing wrong when linking the objects?

This:

nvcc -ccbin g++ -m64 -rdc=true -arch=sm_35 -g -G -c ...

is requesting relocatable device code (-rdc=true), that is, separate compilation and linking of device code. The -c indicates that this command is only the device compile step.

Normally you would follow this up with a final link step done by nvcc.

If you want the final link step to be done with your host compiler (e.g. mpicxx) then you must perform a separate device link step with an additional nvcc command.
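As a rough sketch of that extra step, using the object names from the question: the device link is done with nvcc -dlink, which writes one more object file that is then passed to mpicxx together with all the others. The output name deviceCode_dlink.o is made up for this example, and the run helper is only an illustrative dry-run wrapper (it prints each command and executes it only when DRY_RUN=0). -lcudadevrt is kept because the question links it; it matters only if the device code uses dynamic parallelism.

```shell
#!/bin/sh
# Illustrative dry-run wrapper: print each command, execute it only if DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Library list copied from the question's link line.
GECODE_LIBS="-lgecodedriver -lgecodesearch -lgecodeminimodel -lgecodeint -lgecodekernel -lgecodesupport"

build() {
    # 1. Device compile with relocatable device code, as in the question.
    run nvcc -ccbin g++ -m64 -rdc=true -arch=sm_35 -g -G \
        -I/home/user/opt/openmpi/include -c deviceCode.cu -o deviceCode.o || return 1

    # 2. Device link: resolves the device code and emits the object that
    #    defines the __cudaRegisterLinkedBinary_* symbol.
    #    (deviceCode_dlink.o is a hypothetical name.)
    run nvcc -ccbin g++ -m64 -arch=sm_35 -dlink deviceCode.o \
        -o deviceCode_dlink.o -lcudadevrt || return 1

    # 3. Host link with mpicxx: pass BOTH deviceCode.o and deviceCode_dlink.o.
    run mpicxx -o main file_1.o file_2.o file_3.o deviceCode.o deviceCode_dlink.o main.o \
        -L/usr/local/lib $GECODE_LIBS \
        -L/usr/local/cuda/lib64 -lcudadevrt -lcudart
}

build
```

The only changes relative to the original build are step 2 and the extra object on the mpicxx line; the compile steps for file_1.cpp, file_2.cpp, file_3.cpp and main.cpp stay as they are.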

This is covered in the nvcc manual. There are also a number of CUDA sample codes/projects that demonstrate separate device compile/link (rdc), and numerous questions discussing it on the web and on these forums.

I’m following the simpleMPI example from the NVIDIA Samples 7.5, and it also uses -c. What is the difference?

/home/user/opt/openmpi/bin/mpicxx     -o simpleMPI_mpi.o -c simpleMPI.cpp
/usr/local/cuda-7.5/bin/nvcc -ccbin g++   -m64    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_20,code=compute_20 -o simpleMPI.o -c simpleMPI.cu
/home/user/opt/openmpi/bin/mpicxx    -o simpleMPI simpleMPI_mpi.o simpleMPI.o  -L/usr/local/cuda-7.5/lib64 -lcudart

The difference would appear to be that the simpleMPI build does not use -rdc=true, so no device-code linking step is required.
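If deviceCode.cu does not actually need relocatable device code, the build from the question could probably be reduced to the simpleMPI pattern by dropping -rdc=true. This is an assumption about the code: it holds only if no device symbols are shared with other .cu files and no dynamic parallelism is used. A sketch with the question's file names follows; the run helper is again just an illustrative dry-run wrapper that prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Illustrative dry-run wrapper: print each command, execute it only if DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Library list copied from the question's link line.
GECODE_LIBS="-lgecodedriver -lgecodesearch -lgecodeminimodel -lgecodeint -lgecodekernel -lgecodesupport"

build_whole_program() {
    # Whole-program device compile: no relocatable device code, so the device
    # code in deviceCode.o is already complete and needs no device link.
    run nvcc -ccbin g++ -m64 -arch=sm_35 -g -G \
        -I/home/user/opt/openmpi/include -c deviceCode.cu -o deviceCode.o || return 1

    # mpicxx can link directly; only the CUDA runtime library is needed.
    run mpicxx -o main file_1.o file_2.o file_3.o deviceCode.o main.o \
        -L/usr/local/lib $GECODE_LIBS \
        -L/usr/local/cuda/lib64 -lcudart
}

build_whole_program
```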