Bug: Severe Performance Degradation on Q4_0 CPU-only with MacOS / Apple Silicon M2, after PR#9921 / Version 4081 #10435

Open
@AndreasKunar

What happened?

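Since PR #9921 (build 4081), CPU-only (-ngl 0) performance with Q4_0 models has dropped sharply: prompt processing (pp512) is about 15x slower and token generation (tg128) about 8x slower than on the preceding build 4080. Q4_0_4_4 and -ngl 99 are not affected.
(hardware: Apple MacBook Air M2, 10-core GPU, 24 GB RAM)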

before PR:
make clean
git checkout ae8de6d
make -j llama-bench
./llama-bench -p 512 -n 128 -t 4 -ngl 0 -m ...model...

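| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | pp512 | 60.48 ± 0.49 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | tg128 | 14.89 ± 0.20 |
| llama 7B Q4_0_4_4 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | pp512 | 63.50 ± 2.47 |
| llama 7B Q4_0_4_4 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | tg128 | 11.93 ± 3.30 |

with -ngl 99:

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | pp512 | 194.94 ± 0.07 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | tg128 | 11.81 ± 6.53 |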

build: ae8de6d (4080)

versions after PR (including current):
make clean
git checkout 1607a5e
make -j llama-bench
./llama-bench -p 512 -n 128 -t 4 -ngl 0 -m ...model...

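| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | pp512 | 4.11 ± 0.24 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | tg128 | 1.86 ± 0.01 |
| llama 7B Q4_0_4_4 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | pp512 | 62.81 ± 2.55 |
| llama 7B Q4_0_4_4 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | tg128 | 14.70 ± 1.97 |

with -ngl 99:

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | pp512 | 186.02 ± 13.18 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS | 4 | tg128 | 11.25 ± 3.42 |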

build: 1607a5e (4081)

The smaller run-to-run variations in the other configurations (everything except -ngl 0 with Q4_0) might be due to the MacBook Air's thermals.
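To put numbers on the regression, the slowdown can be computed directly from the mean t/s figures reported above for the -ngl 0 runs. This is only a small sketch for reading the results: the dictionary and function names are illustrative, and the ± spreads are ignored.

```python
# Mean t/s copied from the two llama-bench runs above (-ngl 0, -t 4).
# The names `before`, `after`, and `slowdown` are illustrative only.
before = {  # build ae8de6d (4080)
    ("Q4_0", "pp512"): 60.48,
    ("Q4_0", "tg128"): 14.89,
    ("Q4_0_4_4", "pp512"): 63.50,
    ("Q4_0_4_4", "tg128"): 11.93,
}
after = {  # build 1607a5e (4081)
    ("Q4_0", "pp512"): 4.11,
    ("Q4_0", "tg128"): 1.86,
    ("Q4_0_4_4", "pp512"): 62.81,
    ("Q4_0_4_4", "tg128"): 14.70,
}


def slowdown(old, new):
    """Return old/new throughput per (quantization, test); >1 means slower."""
    return {key: old[key] / new[key] for key in old}


ratios = slowdown(before, after)
for (quant, test), ratio in ratios.items():
    print(f"{quant:9s} {test}: {ratio:.2f}x")
```

Only the two Q4_0 rows come out far above 1; the Q4_0_4_4 rows stay close to 1, which is consistent with the regression being specific to the Q4_0 CPU path.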

Name and Version

Apple clang version 16.0.0 (clang-1600.0.26.4)
Target: arm64-apple-darwin24.1.0
Thread model: posix
macOS Sequoia 15.1.1

What operating system are you seeing the problem on?

Mac

Relevant log output

No response

Labels

bug
