Description
What happened?
Running any Q4_0_4_4 model now fails with an assertion error. Any clue what's causing this?
The last good version I know of is b3971 (2024 Oct 24). I'll do some bisection later.
Name and Version
$ build/bin/llama-cli --version
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
version: 4026 (05697f67)
built with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin23.5.0
The Metal backend is explicitly disabled with -DGGML_METAL=OFF.
What operating system are you seeing the problem on?
Mac
Relevant log output
$ llama-bench -m llama32-1b-instruct-q4_0_4_4.gguf
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
warning: asserts enabled, performance may be affected
| model | size | params | backend | threads | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
Assertion failed: (!isnan(x)), function ggml_compute_forward_silu_f32, file ggml-cpu.c, line 6649.