
Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU #10165

Closed
@FanShupei

Description

What happened?

Running any Q4_0_4_4 model now fails with an assertion error. Any clue what might be causing this?

The last known-good version is b3971 (2024 Oct 24). I'll do some bisection later.
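For reference, a rough sketch of that bisection, assuming the upstream release tag b3971 and the failing commit 05697f67 (see the version output below) are both reachable locally, and using the model path from the log:

$ git bisect start
$ git bisect bad 05697f67
$ git bisect good b3971
$ # at each bisect step, rebuild and re-run the failing case:
$ cmake --build build -j
$ build/bin/llama-bench -m llama32-1b-instruct-q4_0_4_4.gguf && git bisect good || git bisect bad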

Name and Version

$ build/bin/llama-cli --version
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
version: 4026 (05697f67)
built with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin23.5.0

The Metal backend is explicitly disabled by building with -DGGML_METAL=OFF.
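For completeness, a minimal sketch of the configure/build invocation; only -DGGML_METAL=OFF is confirmed above, the rest is the default CMake flow:

$ cmake -B build -DGGML_METAL=OFF
$ cmake --build build -j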

What operating system are you seeing the problem on?

Mac

Relevant log output

$ llama-bench -m llama32-1b-instruct-q4_0_4_4.gguf

register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
warning: asserts enabled, performance may be affected
| model                          |       size |     params | backend    | threads | fa |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
Assertion failed: (!isnan(x)), function ggml_compute_forward_silu_f32, file ggml-cpu.c, line 6649.

Labels

bug (Something isn't working), high severity (Used to report high severity bugs in llama.cpp: malfunctioning hinders important workflow)
