CUDA separable compilation + shared libraries -> "Invalid function" error

carlosgalvezp · September 5, 2021, 2:57pm

Hi,

I’m running into an issue trying to apply CUDA separable compilation to my project, which uses shared libraries. Note: I am aware that device linking cannot be done across shared library boundaries. That’s not what I’m doing. My shared libraries have a regular C/C++ interface.

After some experimentation, I believe the issue is this: I’m performing the device link twice (once for each library), and the same symbol exists in both device-linked objects. Why is this a problem?

I have also asked this question on Stack Overflow; I’ll reproduce it here.

I’m trying to use CUDA separable compilation in my project. The project consists of a binary that depends on a few shared libraries (all built in the same build system). These shared libraries in turn use common CUDA code. When running the binary, I get a segfault similar to here. When I create a minimal example, I get an “invalid device function” error instead. If I turn the shared libraries into static libraries, the error goes away. Unfortunately I don’t have control over this and need to make it work with shared libraries.

I have seen a couple of similar posts here on SO, but they use CMake and the solutions usually involve changing libraries from shared to static, which I can’t do in my project. I have double-checked that I’m running the code on the right GPU (and indeed it works if I make some changes, see below), so that’s not the issue.

I believe I’m missing something when doing CUDA separable compilation, device linking or creating shared libraries.

Below is a fully reproducible minimal example of the problem:

// common.h
#ifndef COMMON_H
#define COMMON_H

__device__ int common();

#endif

// common.cu
#include "common.h"

__device__ int common()
{
    return 123;
}

// a.h
#ifndef A_H
#define A_H

__attribute__((__visibility__("default")))
void runA();

#endif

// a.cu
#include "a.h"

#include <cstdio>
#include <iostream>

#include "common.h"

__global__ void kernelA()
{
    printf("Running A: %d\n", 456 + common());
}

void runA()
{
    kernelA<<<1,1>>>();
    std::cout << cudaGetErrorString(cudaPeekAtLastError()) << std::endl;
    cudaDeviceSynchronize();
}

// b.h
#ifndef B_H
#define B_H

__attribute__((__visibility__("default")))
void runB();

#endif

// b.cu
#include "b.h"

#include <cstdio>
#include <iostream>

#include "common.h"

__global__ void kernelB()
{
    printf("Running B: %d\n", 321 + common());
}

void runB()
{
    kernelB<<<1,1>>>();
    std::cout << cudaGetErrorString(cudaPeekAtLastError()) << std::endl;
    cudaDeviceSynchronize();
}

// main.cpp
#include "a.h"
#include "b.h"

int main()
{
    runA();
    runB();
}

So basically a binary depending on 2 shared libraries A and B, both of which utilize the common() device function.

This is my build/test script:

#!/usr/bin/env bash
set -euxo pipefail

CUDA_ROOT=/usr/local/cuda-10.2
NVCC=$CUDA_ROOT/bin/nvcc
CC=/usr/bin/g++
GENCODE="arch=compute_75,code=sm_75"

# Clean previous build
rm -f *.o *.so main

# Compile relocatable CUDA code
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden common.cu -o common.cu.o
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden      a.cu -o      a.cu.o
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden      b.cu -o      b.cu.o

# Build shared library A
$NVCC -gencode=$GENCODE -dlink common.cu.o a.cu.o -o a.dlink.o
$CC -shared common.cu.o a.cu.o a.dlink.o -L$CUDA_ROOT/lib64 -lcudart -o liba.so

# Build shared library B
$NVCC -gencode=$GENCODE -dlink common.cu.o b.cu.o -o b.dlink.o
$CC -shared common.cu.o b.cu.o b.dlink.o -L$CUDA_ROOT/lib64 -lcudart -o libb.so

# Build final executable
$CC main.cpp -L. -la -lb -o main

# Run it
LD_LIBRARY_PATH=. ./main

Running it I get:

invalid device function
invalid device function

After some trial and error, I noticed that the problem is solved by any of the following:

- Not linking common.cu.o into either library when device linking (obviously this means the libraries can no longer use the common() function).
- Making A and B static libraries.
- Combining A and B into one single shared library.

Unfortunately I cannot apply these solutions in my project. Why is it a problem to have 2 shared libraries? I’ve read about the “device linker ignoring shared libraries”, but in this case it’s the host linker creating the shared library, not the device linker, so I’m hoping that’s OK?

Thanks!

Yuki_Ni · September 6, 2021, 7:12am

Could you please file a ticket with us, following the instructions here: Getting Help with CUDA NVCC Compiler. You can add this topic link in the description. We will take a look. Thanks.

carlosgalvezp · September 6, 2021, 8:15am

Absolutely, thanks! Just filed the bug, let me know if I can help further.

carlosgalvezp · September 6, 2021, 12:25pm

Browsing the Forums I believe my issue is closest to this one:

If I create only one .so file, which links together both a.dlink.o and b.dlink.o, I get a “multiple definitions” error. The linker cannot catch this if I build two independent .so files.

Another thing I have observed is that, when I run my program and the runtime linker starts to load my libraries, they call this function twice:

__cudaRegisterLinkedBinary_14_common_cpp1_ii_e5ca4d49

This doesn’t happen with the other device functions - they get called only once. Is this a problem? Registering the same binary twice?
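One way to check whether both libraries really export the same registration symbol is to intersect their dynamic symbol tables. The following is only a sketch: it assumes GNU nm, awk, sort and comm are available, and the helper name shared_exports is made up for illustration (it is not part of the CUDA toolkit).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of dynamic symbols that are
# defined in BOTH shared libraries given as arguments.
shared_exports() {
    se_a=$(mktemp)
    se_b=$(mktemp)
    # nm -D lists the dynamic symbol table; the symbol name is the last field.
    nm -D --defined-only "$1" | awk '{print $NF}' | sort -u > "$se_a"
    nm -D --defined-only "$2" | awk '{print $NF}' | sort -u > "$se_b"
    # comm -12 keeps only the lines common to both sorted files.
    comm -12 "$se_a" "$se_b"
    rm -f "$se_a" "$se_b"
}

# Example with the libraries from the build script in the first post:
# shared_exports liba.so libb.so | grep __cudaRegisterLinkedBinary
```

The unfiltered output may also list a few toolchain-provided symbols that every shared library defines, which is why the example filters for the registration prefix.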

Thanks!

Yuki_Ni · November 11, 2021, 3:27pm

Your original issue was discussed internally, and here is the outcome:
The problem is the duplicate common.o in both libraries. If you copy common.cu to common2.cu, and then link common2.o in libb, common.o in liba, then it works.
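Applied to the build script from the first post, the common2.cu workaround could look roughly like the sketch below. It has not been run against a CUDA toolchain. The run helper and the DRY_RUN variable are made up here so that the commands are only printed unless DRY_RUN=0 is set; the file name common2.cu comes from the reply above, and the paths and flags are copied from the original script.

```shell
#!/usr/bin/env bash
set -eu

CUDA_ROOT=/usr/local/cuda-10.2
NVCC=$CUDA_ROOT/bin/nvcc
CC=/usr/bin/g++
GENCODE="arch=compute_75,code=sm_75"

# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Give library B its own copy of the common code under a different file name.
run cp common.cu common2.cu
run "$NVCC" -gencode="$GENCODE" -dc -Xcompiler -fPIC,-fvisibility=hidden common2.cu -o common2.cu.o

# Library B now device-links and host-links common2.cu.o instead of common.cu.o.
# b.cu.o is assumed to have been compiled already, as in the original script.
run "$NVCC" -gencode="$GENCODE" -dlink common2.cu.o b.cu.o -o b.dlink.o
run "$CC" -shared common2.cu.o b.cu.o b.dlink.o -L"$CUDA_ROOT/lib64" -lcudart -o libb.so
```

Library A and the final executable would be built exactly as before. Presumably this helps because the registration symbol quoted earlier in the thread contains the source file name, so the two libraries no longer define a symbol with the same name; the reply does not state this explicitly.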

Here is another workaround I tried on your case:

yni@node1:~/yni/CustomerBug/nvbug3373815$ nvcc -arch=sm_75 -dlink common.cu.o a.cu.o b.cu.o -o ab.dlink.o
yni@node1:~/yni/CustomerBug/nvbug3373815$ g++ -shared common.cu.o a.cu.o b.cu.o ab.dlink.o -L /usr/local/cuda-11.5/lib64 -lcuda -lcudart -o libmy.so
yni@node1:~/yni/CustomerBug/nvbug3373815$ g++ main.cpp -L . -lmy -o myMain
yni@node1:~/yni/CustomerBug/nvbug3373815$ LD_LIBRARY_PATH=. ./myMain
no error
Running A: 579
no error
Running B: 444

tomilovanatoliy · July 13, 2022, 5:08pm

Will this be fixed? It seems that the initialization code should be arranged as a singleton, which is not currently the case.

carlosgalvezp · July 14, 2022, 1:04pm

Found a solution: add -fvisibility=hidden to the dlink step:

-$NVCC -gencode=$GENCODE -dlink common.cu.o a.cu.o -o a.dlink.o
+$NVCC -gencode=$GENCODE -dlink -Xcompiler -fvisibility=hidden common.cu.o a.cu.o -o a.dlink.o

-$NVCC -gencode=$GENCODE -dlink common.cu.o b.cu.o -o b.dlink.o
+$NVCC -gencode=$GENCODE -dlink -Xcompiler -fvisibility=hidden common.cu.o b.cu.o -o b.dlink.o

system · July 28, 2022, 1:04pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
