
Assertion ggml_nelements(a) == ne0*ne1*ne2 when loading TheBloke/Llama-2-70B-GGML/llama-2-70b.ggmlv3.q2_K.bin #2445

Closed
@xvolks

Description


Loading the Llama 2 70B model from TheBloke with rustformers/llm appears to succeed, but inference fails.

llama.cpp raises the following assertion regardless of the use_gpu setting:

Loading of model complete
Model size = 27262.60 MB / num tensors = 723
[2023-07-29T14:24:19Z INFO  actix_server::builder] starting 10 workers
[2023-07-29T14:24:19Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime
GGML_ASSERT: llama-cpp/ggml.c:6192: ggml_nelements(a) == ne0*ne1*ne2

This might be related to the model files, but the models from TheBloke are usually reliable.
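For context, the GGML_ASSERT in the log is ggml's element-count check on tensor reshapes: a tensor may only be reshaped to (ne0, ne1, ne2) if it actually holds ne0*ne1*ne2 elements. Since loading completed, the mismatch is between the tensor sizes read from the file and the shapes the inference graph asks for, rather than a failure of the load step itself. Below is a minimal stand-alone sketch of the check; the check_reshape helper and the numbers are illustrative, not ggml's API, and in ggml.c the assertion lives in the reshape helpers such as ggml_reshape_3d (the exact line number varies by version):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the check in ggml.c's reshape helpers: a tensor may only be
 * reshaped to (ne0, ne1, ne2) if it actually holds ne0*ne1*ne2 elements.
 * Names here are illustrative, not ggml's API. */
static void check_reshape(int64_t nelements, int64_t ne0, int64_t ne1, int64_t ne2) {
    /* mirrors GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2) from the log */
    assert(nelements == ne0 * ne1 * ne2);
}

int main(void) {
    check_reshape(128 * 64 * 8192, 128, 64, 8192);   /* element counts match: passes */
    /* check_reshape(128 * 64 * 4096, 128, 64, 8192);   would abort, as in the log above */
    printf("shape check passed\n");
    return 0;
}

Compiled with, for example, cc -o reshape_check reshape_check.c, the first call passes, while uncommenting the second call reproduces the abort.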

Running on a MacBook Pro M1 Max with 32 GB RAM, macOS 14.0.0 (23A5301g).
