Hi! I have a func.func that launches a gpu.func. It looks like this:
```mlir
func.func @test(%arg0: memref<1800x1800xf16>, %arg1: memref<16x8xf16>, %arg2: memref<1800x1800xf16>, %arg3: memref<8x16xf16>) {
  %c2 = arith.constant 2 : index
  %c8 = arith.constant 8 : index
  %cst = arith.constant 0.000000e+00 : f16
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c57 = arith.constant 57 : index
  %c32 = arith.constant 32 : index
  %c4 = arith.constant 4 : index
  %0 = gpu.wait async
  %1 = gpu.launch_func async [%0] @test_kernel::@test_kernel blocks in (%c57, %c57, %c1) threads in (%c32, %c4, %c1) args(%arg1 : memref<16x8xf16>, %arg3 : memref<8x16xf16>, %arg0 : memref<1800x1800xf16>, %arg2 : memref<1800x1800xf16>)
  gpu.wait [%1]
  return
}
```
I want to call this func.func from C++ using the llvm-request-c-wrappers pass. My understanding is that I need to allocate device memory with cudaMalloc and store the resulting GPU pointer in a memref descriptor structure. Is this the right approach? I am asking because I ran into some errors in my test. Thank you for your help!
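
For reference, here is a minimal sketch of the descriptor setup described above. It assumes the standard MLIR-to-LLVM lowering for ranked memrefs (allocated pointer, aligned pointer, offset, sizes, strides) and that the generated wrapper is named `_mlir_ciface_test` and takes one descriptor pointer per memref argument. The names `MemRefDescriptor`, `f16_storage`, and `makeRowMajor2D` are hypothetical helpers introduced only for illustration, and `uint16_t` is used as 2-byte storage for f16.

```cpp
#include <cstdint>

// Ranked memref descriptor as laid out by the MLIR-to-LLVM lowering:
// allocated pointer, aligned pointer, offset, sizes[N], strides[N].
template <typename T, int N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  int64_t offset;
  int64_t sizes[N];
  int64_t strides[N];
};

// Assumption: f16 elements are stored as raw 16-bit values on the host side.
using f16_storage = uint16_t;
using MemRef2D = MemRefDescriptor<f16_storage, 2>;

// Hypothetical helper: wrap a contiguous row-major buffer in a descriptor.
// Strides are counted in elements, not bytes.
inline MemRef2D makeRowMajor2D(f16_storage *data, int64_t rows, int64_t cols) {
  MemRef2D m;
  m.allocated = data;
  m.aligned = data;
  m.offset = 0;
  m.sizes[0] = rows;
  m.sizes[1] = cols;
  m.strides[0] = cols;
  m.strides[1] = 1;
  return m;
}

// Intended use (not compiled here; requires the CUDA runtime and the
// object file produced from the MLIR module):
//
//   extern "C" void _mlir_ciface_test(MemRef2D *, MemRef2D *,
//                                     MemRef2D *, MemRef2D *);
//
//   f16_storage *dA = nullptr;
//   cudaMalloc(reinterpret_cast<void **>(&dA),
//              1800 * 1800 * sizeof(f16_storage));
//   MemRef2D a = makeRowMajor2D(dA, 1800, 1800);
//   // ... build the descriptors for %arg1, %arg2, %arg3 the same way ...
//   _mlir_ciface_test(&a, &b, &c, &d);
```

Because gpu.launch_func forwards the memref arguments to the kernel, all four buffers would presumably need to be device-accessible; a host pointer in any descriptor, or a stride or size that does not match the memref type, is a likely source of errors.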