Description
Description of bug
I am attempting to use the CUDA extension for NDTensors. I am finding that tensor contraction sometimes fails, along with other functions like permute, as well as the NDTensors.generic_randn call used in the example of the NDTensors CUDA extension (NDTensors/ext/examples/NDTensorCUDA.jl). They all fail with ERROR: LoadError: Scalar indexing is disallowed. (or emit a similar warning in the REPL, where scalar indexing is allowed).
Minimal code demonstrating the bug or unexpected behavior
Minimal runnable code
import CUDA
import ITensors
import NDTensors
i, j, k = ITensors.Index(3), ITensors.Index(5), ITensors.Index(2)
A, B = ITensors.randomITensor(i, j), ITensors.randomITensor(j, k)
# works when running script directly, but warns of scalar indexing in REPL
cuA, cuB = NDTensors.cu(A), NDTensors.cu(B)
contracted = cuA * cuB
@show NDTensors.cpu(contracted) ≈ A * B
# without @CUDA.allowscalar, fails with `ERROR: LoadError: Scalar indexing is disallowed.`
permuted = ITensors.permute(cuA, (j, i))
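For completeness, this is a sketch of the slow scalar-fallback check I mean in the comment above (assuming CUDA.@allowscalar, the usual GPUArrays escape hatch); it only confirms the permutation produces the right values, it does not make permute GPU-friendly:
# Not a real workaround: @allowscalar falls back to very slow element-wise access on the GPU array.
permuted = CUDA.@allowscalar ITensors.permute(cuA, (j, i))
@show NDTensors.cpu(permuted) ≈ ITensors.permute(A, (j, i))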
Expected output or behavior
NDTensors.cu() should work without a warning in the REPL on an ITensors.ITensor, and ITensors.permute should work without the scalar indexing error.
Actual output or behavior
Output of minimal runnable code
NDTensors.cpu(contracted) ≈ A * B = true
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
[3] getindex
@ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:9 [inlined]
[4] getindex
@ ~/.julia/packages/StridedViews/MjjHH/src/stridedview.jl:97 [inlined]
[5] macro expansion
@ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:330 [inlined]
[6] macro expansion
@ ./simdloop.jl:77 [inlined]
[7] macro expansion
@ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:329 [inlined]
[8] _mapreduce_kernel!(f::Strided.CaptureArgs{NDTensors.var"#69#70", Tuple{Strided.Arg, Strided.Arg}}, op::Nothing, initop::Nothing, dims::Tuple{Int64, Int64}, blocks::Tuple{Int64, Int64}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, offsets::Tuple{Int64, Int64, Int64})
@ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:229
[9] _mapreduce_block!(f::Any, op::Any, initop::Any, dims::Tuple{Int64, Int64}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, offsets::Tuple{Int64, Int64, Int64}, costs::Tuple{Int64, Int64}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
@ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:152
[10] _mapreduce_order!(f::Any, op::Any, initop::Any, dims::Tuple{Int64, Int64}, strides::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, arrays::Tuple{StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}, StridedViews.StridedView{Float64, 2, CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
@ Strided ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:138
[11] _mapreduce_fuse!
@ ~/.julia/packages/Strided/O8zfl/src/mapreduce.jl:116 [inlined]
[12] copyto!
@ ~/.julia/packages/Strided/O8zfl/src/broadcast.jl:35 [inlined]
[13] materialize!
@ ./broadcast.jl:884 [inlined]
[14] materialize!
@ ./broadcast.jl:881 [inlined]
[15] permutedims!(R::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, perm::Tuple{Int64, Int64}, f::NDTensors.var"#69#70")
@ NDTensors ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:246
[16] permutedims!!(R::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, perm::Tuple{Int64, Int64}, f::Function)
@ NDTensors ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:188
[17] permutedims!!
@ ~/.julia/packages/NDTensors/olKso/src/dense/densetensor.jl:186 [inlined]
[18] permutedims
@ ~/.julia/packages/NDTensors/olKso/src/tensoralgebra/generic_tensor_operations.jl:5 [inlined]
[19] permutedims
@ ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:64 [inlined]
[20] _permute(as::NDTensors.NeverAlias, T::NDTensors.DenseTensor{Float64, 2, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{Float64, CUDA.CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
@ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:69
[21] permute(as::NDTensors.NeverAlias, T::ITensors.ITensor, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
@ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:73
[22] #permute#271
@ ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:54 [inlined]
[23] permute(T::ITensors.ITensor, new_inds::Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}})
@ ITensors ~/.julia/packages/ITensors/Qe99p/src/tensor_operations/permutations.jl:38
[24] top-level scope
@ ~/it_test2/test.jl:17
in expression starting at /home/william/it_test2/test.jl:17
Output of `cuA, cuB = NDTensors.cu(A), NDTensors.cu(B)` in REPL
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f5e6b5fc970.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:
Also, running `NDTensors/ext/examples/NDTensorCUDA.jl` results in a similar error
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of setindex! resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
[3] setindex!
@ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:17 [inlined]
[4] generic_randn(arraytype::Type{CuArray{T, 1} where T}, dim::Int64)
@ NDTensors ~/.julia/packages/NDTensors/olKso/src/abstractarray/fill.jl:8
[5] main()
@ Main ~/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:21
[6] top-level scope
@ ~/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:116
in expression starting at /home/william/ITensors.jl/NDTensors/ext/examples/NDTensorCUDA.jl:116
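For what it's worth, the setindex! failure above looks like the general GPUArrays restriction on element-wise writes into a CuArray outside the REPL, rather than anything specific to the example script. A minimal sketch of that pattern in plain CUDA.jl (no NDTensors involved), plus a device-side alternative, assuming scalar indexing is disallowed as it is when running a script:
import CUDA

# Filling a CuArray element by element hits the same scalar-indexing error in a script:
data = CUDA.CuArray{Float64}(undef, 4)
for n in 1:length(data)
    data[n] = randn()  # ERROR: Scalar indexing is disallowed.
end

# Generating the random numbers on the device avoids scalar indexing entirely:
data_ok = CUDA.randn(Float64, 4)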
Version information
- Output from versioninfo():
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores
- Other relevant version info:
(it_test2) pkg> status ITensors
Status `~/it_test2/Project.toml`
[9136182c] ITensors v0.3.42
(it_test2) pkg> status NDTensors
Status `~/it_test2/Project.toml`
[23ae76d9] NDTensors v0.2.10
(it_test2) pkg> status CUDA
Status `~/it_test2/Project.toml`
[052768ef] CUDA v4.4.1
julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.5
CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+535.104.5
Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 3090 (sm_86, 22.728 GiB / 24.000 GiB available)
I am very new to Julia (and ITensors.jl), so apologies if there is something obvious I'm missing!