Assertion failure in ggml_mul_mat_q4_0_q8_1_cuda (g_compute_capabilities[id] >= MIN_CC_DP4A) #4229

Closed
@cebtenzzre

Description

Current Behavior

I got this crash on https://github.com/cebtenzzre/llama.cpp/tree/18fe116e9a5aa45a83bd1d6f043f98dc395f218e:

2023-11-26 20:06:04 INFO:Loaded the model in 9.14 seconds.

GGML_ASSERT: /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:5484: false

Failure Information (for bugs)

Backtrace:

#3  0x00007f5999fd54b8 in __GI_abort () at abort.c:79
#4  0x00007f585ac6b357 in ggml_mul_mat_q4_0_q8_1_cuda (stream=<optimized out>, nrows_dst=<optimized out>, nrows_y=<optimized out>, ncols_y=<optimized out>, 
    nrows_x=<optimized out>, ncols_x=<optimized out>, dst=<optimized out>, vy=<optimized out>, vx=<optimized out>)
    at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:5076
#5  ggml_cuda_op_mul_mat_q (src0=src0@entry=0x204c00320, src1=src1@entry=0x269123d80, dst=dst@entry=0x269123f00, src0_dd_i=src0_dd_i@entry=0x90be00000 "", 
    src1_ddf_i=src1_ddf_i@entry=0x9b0400000, src1_ddq_i=src1_ddq_i@entry=0x9afe00000 "", dst_dd_i=0x90b420400, row_low=32000, row_high=32032, src1_ncols=512, 
    src1_padded_row_size=5120, stream=@0x7f5878be7fa8: 0x7f5861b127a0) at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:6098
#6  0x00007f585ac641f2 in ggml_cuda_op_mul_mat (src0=0x204c00320, src1=<optimized out>, dst=<optimized out>, 
    op=0x7f585ac6b270 <ggml_cuda_op_mul_mat_q(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st* const&)>, convert_src1_to_q8_1=true) at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:6959
#7  0x00007f585ac66023 in ggml_cuda_compute_forward (params=params@entry=0x7f5878be8560, tensor=tensor@entry=0x269123f00)
    at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml-cuda.cu:7844
#8  0x00007f585ac4606e in ggml_compute_forward (tensor=0x269123f00, params=0x7f5878be8560) at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml.c:14503
#9  ggml_graph_compute_thread (data=data@entry=0x7f5878be85e0) at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml.c:16245
#10 0x00007f585ac4862e in ggml_graph_compute (cgraph=0x269000020, cplan=<optimized out>) at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/ggml.c:16831
#11 0x00007f585ac794b3 in ggml_graph_compute_helper (buf=std::vector of length 0, capacity 0, graph=graph@entry=0x269000020, n_threads=n_threads@entry=1)
    at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/llama.cpp:592
#12 0x00007f585ac7c365 in llama_decode_internal (lctx=..., batch=...) at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/llama.cpp:5194
#13 0x00007f585ac7cac8 in llama_eval (ctx=0x7f586234bff0, tokens=0x7f5862346200, n_tokens=512, n_past=0)
    at /home/jared/src/forks/llama-cpp-python/vendor/llama.cpp/llama.cpp:8842
#14 0x00007f5998def4f6 in ffi_call_unix64 () at ../src/x86/unix64.S:104

Relevant code: https://github.com/cebtenzzre/llama.cpp/blob/18fe116e9a5aa45a83bd1d6f043f98dc395f218e/ggml-cuda.cu#L5054-L5077

The code asserts that g_compute_capabilities[id] >= MIN_CC_DP4A (610), where id is the current device. But for device 1 the value is 520, which matches my GTX 970:

>>> print id
$10 = 1
>>> print g_compute_capabilities[0]
$11 = 610
>>> print g_compute_capabilities[1]
$12 = 520
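
Judging from the backtrace (row_low=32000, row_high=32032 in ggml_cuda_op_mul_mat_q), the mat-mul was split across both GPUs and that slice landed on device 1, so the MMQ path ran on the GTX 970 even though device 0 does support DP4A. For context, here is a minimal standalone sketch (not ggml code, just the CUDA runtime API) that mirrors the gate in the linked ggml-cuda.cu: the same compute-capability encoding (100*major + 10*minor) and the same MIN_CC_DP4A threshold. On this machine it should report 610 for device 0 and 520 for device 1, matching the gdb output above.

#include <cstdio>
#include <cuda_runtime.h>

// Same threshold and encoding as ggml-cuda.cu: compute capability 6.1 -> 610.
static const int MIN_CC_DP4A = 610;

int main() {
    int n_devices = 0;
    cudaGetDeviceCount(&n_devices);

    for (int id = 0; id < n_devices; ++id) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, id);

        // ggml stores compute capability as 100*major + 10*minor (6.1 -> 610, 5.2 -> 520).
        const int cc = 100 * prop.major + 10 * prop.minor;
        printf("device %d: %s, compute capability %d\n", id, prop.name, cc);

        if (cc < MIN_CC_DP4A) {
            // This is the case ggml_mul_mat_q4_0_q8_1_cuda rejects with GGML_ASSERT(false):
            // the MMQ kernels need the dp4a instruction, which requires CC >= 6.1.
            printf("  -> below MIN_CC_DP4A, MMQ would assert on this device\n");
        }
    }
    return 0;
}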

Steps to Reproduce

I'm not exactly sure how I ran into this issue, because I've been using the same build for weeks without seeing it. It could be an issue with my fork - I should investigate whether the latest llama.cpp is still significantly slower on my GPUs. I still have the coredump handy if any further information would help.

cc @slaren
