
ggml-cpu: cmake add arm64 cpu feature check for macos #10487


Merged: 2 commits into ggml-org:master on Nov 26, 2024

Conversation


@chaxu01 chaxu01 commented Nov 25, 2024

Add fix for #10435

add_compile_definitions(__ARM_FEATURE_DOTPROD)
endif ()

check_cxx_source_compiles("#include <arm_neon.h>\nint main() { int8x16_t _a, _b; int32x4_t _s = vmlaq_f32(_s, _a, _b); return 0; }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
Member

How come the presence of vmlaq_f32 confirms __ARM_FEATURE_MATMUL_INT8?

Collaborator Author

Thanks for your question. The current check using vmlaq_f32 to confirm __ARM_FEATURE_MATMUL_INT8 is incorrect because vmlaq_f32 operates on floating-point values (float32x4_t), while __ARM_FEATURE_MATMUL_INT8 is specifically related to integer matrix multiplication using INT8 (8-bit signed integers).

I'll update the patch to use an operation that deals with INT8 data types, such as vmmlaq_s32.
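As a sketch of what the corrected check might look like (the variable and definition names are taken from the snippet above; the exact code that landed is in the PR commits), probing vmmlaq_s32 ties the check to an intrinsic that only exists when __ARM_FEATURE_MATMUL_INT8 is in effect:

```cmake
# Sketch: probe i8mm support by compiling a call to vmmlaq_s32,
# the matrix-multiply intrinsic gated by __ARM_FEATURE_MATMUL_INT8.
include(CheckCXXSourceCompiles)
check_cxx_source_compiles("
    #include <arm_neon.h>
    int main() {
        int8x16_t a = vdupq_n_s8(0);
        int32x4_t s = vdupq_n_s32(0);
        s = vmmlaq_s32(s, a, a);
        return 0;
    }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
if (GGML_COMPILER_SUPPORT_MATMUL_INT8)
    add_compile_definitions(__ARM_FEATURE_MATMUL_INT8)
endif()
```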

@ggerganov ggerganov merged commit 25669aa into ggml-org:master Nov 26, 2024
53 checks passed
@ggerganov (Member)
This change breaks Q4_0. Repro:

make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU

On M1 Pro it crashes like this:

...
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): Illegal instruction: 4

On M2 Ultra, the tests fail:

  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): sentinel mismatch: sent_3 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.059779312 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK

@ggerganov (Member)

The problem is even though the sources compile successfully, they do not run:

g++ -march=armv8.2-a+i8mm test.cpp && ./a.out
Illegal instruction: 4

@chaxu01 (Collaborator Author)

chaxu01 commented Nov 27, 2024

I was able to reproduce the failing cases on an M3. The issue appears to be related to test-backend-ops, which is a special case that doesn't require model loading. As a result, the optimized Q4_0 kernel was not activated even though the hardware supports i8mm. Meanwhile, the macro GGML_USE_LLAMAFILE was undefined as a result of the __ARM_FEATURE_MATMUL_INT8 definition. This mismatch caused the kernel selection logic to skip both the optimized Q4_0 path and the GGML_USE_LLAMAFILE path.

I applied a temporary patch to prevent GGML_USE_LLAMAFILE from being undefined, and make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU then ran as expected without errors.

diff --git a/ggml/src/ggml-cpu/ggml-cpu.c b/ggml/src/ggml-cpu/ggml-cpu.c
index 4b58254e..e1224161 100644
--- a/ggml/src/ggml-cpu/ggml-cpu.c
+++ b/ggml/src/ggml-cpu/ggml-cpu.c
@@ -40,7 +40,7 @@
 #endif

 #if defined(__ARM_FEATURE_SVE) || defined(__ARM_FEATURE_MATMUL_INT8)
-#undef GGML_USE_LLAMAFILE
+//#undef GGML_USE_LLAMAFILE
 #endif

@ggerganov (Member)

The test-backend-ops should work without GGML_USE_LLAMAFILE. I haven't had time yet to look into the details, but from what @slaren discovered earlier today in #10543, it seems that the I8MM kernels have out-of-bounds writes for certain sizes?

@chaxu01 (Collaborator Author)

chaxu01 commented Nov 27, 2024

The test-backend-ops didn't seem to use the I8MM-optimized kernel in my test on an M3, even though __ARM_FEATURE_MATMUL_INT8 was enabled.

@slaren (Member)

slaren commented Nov 27, 2024

test-backend-ops does not use the repack buffers (that's something we need to add to be able to test the repacking functionality), but the standard Q4_0 implementation in ggml_vec_dot_q4_0_q8_0 also supports __ARM_FEATURE_MATMUL_INT8, and that is what is failing here.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* ggml-cpu: cmake add arm64 cpu feature check for macos

* use vmmlaq_s32 for compile option i8mm check