Bug: rwkv and mamba models cannot be used with -ngl 0 after CPU backend refactor #10351

Closed
Description

@MollySophia

What happened?

$ ./build/bin/llama-bench -m ~/Downloads/mamba-2.8b-q4_0.gguf -ngl 0
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
/Users/molly/llama.cpp/ggml/src/ggml-backend.cpp:745: pre-allocated tensor in a backend that cannot run the operation
[1]    13345 abort      ./build/bin/llama-bench -m ~/Downloads/mamba-2.8b-q4_0.gguf -ngl 0
$ ./build/bin/llama-bench -m /Volumes/grouped/Models/rwkv/v6-Finch-7B-HF/v6-Finch-7B-HF-Q4_0.gguf -ngl 0
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
/Users/molly/llama.cpp/ggml/src/ggml-backend.cpp:745: pre-allocated tensor in a backend that cannot run the operation
[1]    16003 abort      ./build/bin/llama-bench -m  -ngl 0

Tracing the error with lldb shows that it aborts in ggml_backend_sched_backend_id_from_cur:

    if (tensor->buffer || (tensor->view_src && tensor->view_src->buffer)) {
        // since the tensor is pre-allocated, it cannot be moved to another backend
        GGML_ABORT("pre-allocated tensor in a backend that cannot run the operation");
    }

The tensor triggering this abort is a view of cache_k_l0. This makes sense, as both mamba and rwkv create GGML_VIEW/GGML_RESHAPE ops on the k cache when building the graph.
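
For reference, a minimal sketch of the pattern that trips this check (illustrative only; the tensor sizes and names below are placeholders, not the actual llama.cpp graph-build code): the recurrent-state cache tensor lives in a pre-allocated backend buffer, and the graph builder takes a view/reshape of it, so the resulting tensor inherits that buffer and the scheduler cannot move it to a backend that can run the op.

    // Illustrative sketch only: dimensions and names are placeholders,
    // not the actual llama.cpp graph-build code.
    #include "ggml.h"

    int main(void) {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16*1024*1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // stand-in for the per-layer recurrent-state cache tensor (cache_k_l0),
        // which llama.cpp allocates in a backend buffer up front
        struct ggml_tensor * cache_k = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4096);

        // the graph builder views/reshapes the cache; the resulting tensor shares
        // cache_k's buffer, so ggml_backend_sched treats it as pre-allocated and
        // aborts if the chosen backend cannot run the op
        struct ggml_tensor * state = ggml_reshape_2d(ctx,
            ggml_view_1d(ctx, cache_k, 4096, 0),
            64, 64);

        (void) state;
        ggml_free(ctx);
        return 0;
    }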

CC @compilade

Name and Version

Non-working version:
$ ./build/bin/llama-cli -v
build: 4098 (772703c) with Apple clang version 16.0.0 (clang-1600.0.26.4) for arm64-apple-darwin24.1.0

Known working version:
$ ./build/bin/llama-cli -v
build: 4079 (4a8ccb3) with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Linux, Mac

Relevant log output

No response

Labels

bug (Something isn't working), medium severity (Used to report medium severity bugs in llama.cpp, e.g. Malfunctioning Features but still useable)
