Hi,
I’m running into an issue trying to apply CUDA separable compilation to my project, which uses shared libraries. Note: I am aware that device linking cannot be done across shared library boundaries. That’s not what I’m doing. My shared libraries have a regular C/C++ interface.
After some experimentation, I believe the issue is: I’m performing device-link twice (one for each library), and the same symbol exists in both device linked objects. Why is this a problem?
I have asked the question in StackOverflow, I’ll reproduce it here.
I’m trying to use CUDA separable compilation in my project. The project is composed of a binary that depends on a few shared libraries (all built in the same build system). These shared libraries in turn use common CUDA code. When running the binary, I get a segfault similar to here. When I create a minimal example, I get “invalid device function” error instead. If I turn the shared libraries into static libraries, the error goes away . Unfortunately I don’t have control over this and need to make it work with shared libraries.
I have seen a couple similar posts here in SO, but they use CMake and the solutions usually involve changing libraries from shared to static, which I can’t do in my project. I have double-checked that I’m running the code on the right GPU (and indeed it works if I do some changes, see below), so that’s not the issue.
I believe I’m missing something when doing CUDA separable compilation, device linking or creating shared libraries.
Below is a fully reproducible minimal example of the problem:
// common.h
#ifndef COMMON_H
#define COMMON_H
__device__ int common();
#endif
// common.cu
#include "common.h"
__device__ int common()
{
return 123;
}
// a.h
#ifndef A_H
#define A_H
__attribute__((__visibility__("default")))
void runA();
#endif
// a.cu
#include "a.h"
#include <cstdio>
#include <iostream>
#include "common.h"
__global__ void kernelA()
{
printf("Running A: %d\n", 456 + common());
}
void runA()
{
kernelA<<<1,1>>>();
std::cout << cudaGetErrorString(cudaPeekAtLastError()) << std::endl;
cudaDeviceSynchronize();
}
// b.h
#ifndef B_H
#define B_H
__attribute__((__visibility__("default")))
void runB();
#endif
// b.cu
#include "b.h"
#include <cstdio>
#include <iostream>
#include "common.h"
__global__ void kernelB()
{
printf("Running B: %d\n", 321 + common());
}
void runB()
{
kernelB<<<1,1>>>();
std::cout << cudaGetErrorString(cudaPeekAtLastError()) << std::endl;
cudaDeviceSynchronize();
}
// main.cpp
#include "a.h"
#include "b.h"
int main()
{
runA();
runB();
}
So basically a binary depending on 2 shared libraries A and B, both of which utilize the common()
device function.
This is my build/test script:
#!/usr/bin/env bash
set -euxo pipefail
CUDA_ROOT=/usr/local/cuda-10.2
NVCC=$CUDA_ROOT/bin/nvcc
CC=/usr/bin/g++
GENCODE="arch=compute_75,code=sm_75"
# Clean previous build
rm -f *.o *.so main
# Compile relocatable CUDA code
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden common.cu -o common.cu.o
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden a.cu -o a.cu.o
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden b.cu -o b.cu.o
# Build shared library A
$NVCC -gencode=$GENCODE -dlink common.cu.o a.cu.o -o a.dlink.o
$CC -shared common.cu.o a.cu.o a.dlink.o -L$CUDA_ROOT/lib64 -lcudart -o liba.so
# Build shared library B
$NVCC -gencode=$GENCODE -dlink common.cu.o b.cu.o -o b.dlink.o
$CC -shared common.cu.o b.cu.o b.dlink.o -L$CUDA_ROOT/lib64 -lcudart -o libb.so
# Build final executable
$CC main.cpp -L. -la -lb -o main
# Run it
LD_LIBRARY_PATH=. ./main
Running it I get:
invalid device function
invalid device function
After some trial and error, I notice the problem is solved by:
- Not linking
common.cu.o
in either library when device linking (obviously I need to make either library no longer use thecommon()
function. - Making A and B static libraries.
- Combining A and B into one single shared library.
Unfortunately I cannot apply these solutions in my project. Why is it a problem to have 2 shared libraries? I’ve read about the “device linker ignoring shared libraries”, but in this case it’s the host linker creating the shared library, not the device linker, so I’m hoping that’s OK?
Thanks!