Closed
Description
Loading the Llama 2 70B model from TheBloke with rustformers/llm appears to succeed, but inference fails.
llama.cpp raises an assertion regardless of the use_gpu option:
Loading of model complete
Model size = 27262.60 MB / num tensors = 723
[2023-07-29T14:24:19Z INFO actix_server::builder] starting 10 workers
[2023-07-29T14:24:19Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
GGML_ASSERT: llama-cpp/ggml.c:6192: ggml_nelements(a) == ne0*ne1*ne2
This might be related to the model files, but the models from TheBloke are usually reliable.
Running on a MacBook Pro M1 Max with 32 GB RAM, macOS 14.0.0 (23A5301g).