Closed
Description
Dear CUDA developers,
I just noticed something strange. What can cause these timings? Why does the time go down for the larger matrices instead of growing with the matrix size?
Timings for this basic code:
using BenchmarkTools
using CUDA
N = 500;   A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 1000;  A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 2000;  A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 4000;  A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 6000;  A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 8000;  A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 10000; A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
N = 12000; A = CUDA.randn(Float32, N, N); @btime sum($A, dims=2)
Results:
N = 500:   7.228 μs (64 allocations: 2.66 KiB)
N = 1000:  8.189 μs (82 allocations: 3.20 KiB)
N = 2000:  8.270 μs (83 allocations: 3.23 KiB)
N = 4000:  8.307 μs (83 allocations: 3.23 KiB)
N = 6000:  6.517 μs (43 allocations: 1.89 KiB)
N = 8000:  6.505 μs (43 allocations: 1.89 KiB)
N = 10000: 6.379 μs (43 allocations: 1.89 KiB)
N = 12000: 6.635 μs (43 allocations: 1.89 KiB)
Sounds like an interesting anomaly.
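One thing worth ruling out first, assuming CUDA.jl's documented behaviour that GPU operations are launched asynchronously: `sum(A, dims=2)` returns a `CuArray` without waiting for the kernel to finish, so a bare `@btime` may mostly measure the launch overhead rather than the reduction itself. A sketch that repeats the same sizes but wraps each call in `CUDA.@sync`, so every sample waits for the GPU, could look like this (the loop and the `timings` vector are illustrative additions, not part of the original report):

```julia
using BenchmarkTools
using CUDA

sizes = (500, 1000, 2000, 4000, 6000, 8000, 10000, 12000)
timings = Float64[]  # minimum time per size, in seconds

for N in sizes
    A = CUDA.randn(Float32, N, N)
    # CUDA.@sync blocks until the GPU work has completed, so the
    # measured time includes kernel execution, not just the launch.
    t = @belapsed CUDA.@sync sum($A, dims=2)
    push!(timings, t)
    println("N = $N: ", round(t * 1e6; digits=1), " μs")
end
```

If the synchronized timings grow with `N`, the original numbers were most likely dominated by launch cost; the changing allocation counts would then only reflect different host-side code paths, not slower or faster GPU work.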
versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
CUDA.versioninfo()
CUDA toolkit 11.4.1, artifact installation
CUDA driver 11.5.0
NVIDIA driver 495.29.5
Libraries:
- CUBLAS: 11.5.4
- CURAND: 10.2.5
- CUFFT: 10.5.1
- CUSOLVER: 11.2.0
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+495.29.5
- CUDNN: 8.20.2 (for CUDA 11.4.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)
Toolchain:
- Julia: 1.7.0
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
0: NVIDIA GeForce GTX 1050 (sm_61, 310.438 MiB / 3.945 GiB available)