
ggml-cpu: cmake add arm64 cpu feature check for macos #10487


Merged: 2 commits into ggml-org:master on Nov 26, 2024

Conversation


@chaxu01 chaxu01 commented Nov 25, 2024

Add fix for #10435

add_compile_definitions(__ARM_FEATURE_DOTPROD)
endif ()

check_cxx_source_compiles("#include <arm_neon.h>\nint main() { int8x16_t _a, _b; int32x4_t _s = vmlaq_f32(_s, _a, _b); return 0; }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
Member

How come the presence of vmlaq_f32 confirms __ARM_FEATURE_MATMUL_INT8?

Collaborator Author

Thanks for your question. The current check using vmlaq_f32 to confirm __ARM_FEATURE_MATMUL_INT8 is incorrect because vmlaq_f32 operates on floating-point values (float32x4_t), while __ARM_FEATURE_MATMUL_INT8 is specifically related to integer matrix multiplication using INT8 (8-bit signed integers).

I'll update the patch to use an operation that deals with INT8 data types, such as vmmlaq_s32.
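As a sketch of what the corrected check might look like (the variable and definition names are taken from the snippet above; the exact code that landed is in the PR commits), probing vmmlaq_s32 ties the check to an intrinsic that only exists when __ARM_FEATURE_MATMUL_INT8 is in effect:

```cmake
# Sketch: probe i8mm support by compiling a call to vmmlaq_s32,
# the matrix-multiply intrinsic gated by __ARM_FEATURE_MATMUL_INT8.
include(CheckCXXSourceCompiles)
check_cxx_source_compiles("
    #include <arm_neon.h>
    int main() {
        int8x16_t a = vdupq_n_s8(0);
        int32x4_t s = vdupq_n_s32(0);
        s = vmmlaq_s32(s, a, a);
        return 0;
    }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
if (GGML_COMPILER_SUPPORT_MATMUL_INT8)
    add_compile_definitions(__ARM_FEATURE_MATMUL_INT8)
endif()
```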

@ggerganov ggerganov merged commit 25669aa into ggml-org:master Nov 26, 2024
53 checks passed
@ggerganov (Member)
This change breaks Q4_0. Repro:

make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU

On M1 Pro it crashes like this:

...
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): Illegal instruction: 4

On M2 Ultra, the tests fail:

  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): sentinel mismatch: sent_3 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.059779312 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK

@ggerganov (Member)

The problem is even though the sources compile successfully, they do not run:

g++ -march=armv8.2-a+i8mm test.cpp && ./a.out
Illegal instruction: 4

@chaxu01 (Collaborator Author)

chaxu01 commented Nov 27, 2024

I was able to reproduce the failing cases on an M3. The issue appears to be related to test-backend-ops, which is a special case that doesn't require model loading. As a result, the optimized Q4_0 kernel was not activated even though the hardware supports i8mm. Meanwhile, the macro GGML_USE_LLAMAFILE was undefined as a result of the __ARM_FEATURE_MATMUL_INT8 definition. This mismatch caused the kernel selection logic to skip both the optimized Q4_0 path and the GGML_USE_LLAMAFILE path.

I applied a temporary patch to prevent GGML_USE_LLAMAFILE from being undefined, and make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU then ran as expected without errors.

diff --git a/ggml/src/ggml-cpu/ggml-cpu.c b/ggml/src/ggml-cpu/ggml-cpu.c
index 4b58254e..e1224161 100644
--- a/ggml/src/ggml-cpu/ggml-cpu.c
+++ b/ggml/src/ggml-cpu/ggml-cpu.c
@@ -40,7 +40,7 @@
 #endif

 #if defined(__ARM_FEATURE_SVE) || defined(__ARM_FEATURE_MATMUL_INT8)
-#undef GGML_USE_LLAMAFILE
+//#undef GGML_USE_LLAMAFILE
 #endif

@ggerganov (Member)

The test-backend-ops should work without GGML_USE_LLAMAFILE. I haven't had time yet to look into the details, but from what @slaren discovered earlier today in #10543, it seems that the I8MM kernels have out-of-bounds writes for certain sizes?

@chaxu01 (Collaborator Author)

chaxu01 commented Nov 27, 2024

The test-backend-ops didn't seem to use the I8MM-optimized kernel in my test on an M3, even though __ARM_FEATURE_MATMUL_INT8 was enabled.

@slaren (Member)

slaren commented Nov 27, 2024

test-backend-ops does not use the repack buffers (that's something we need to add to be able to test the repacking functionality), but the standard Q4_0 implementation in ggml_vec_dot_q4_0_q8_0 also supports __ARM_FEATURE_MATMUL_INT8, and that is what is failing here.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* ggml-cpu: cmake add arm64 cpu feature check for macos

* use vmmlaq_s32 for compile option i8mm check