ggml-cpu: cmake add arm64 cpu feature check for macos #10487
Conversation
ggml/src/ggml-cpu/CMakeLists.txt
```cmake
add_compile_definitions(__ARM_FEATURE_DOTPROD)
endif ()

check_cxx_source_compiles("#include <arm_neon.h>\nint main() { int8x16_t _a, _b; int32x4_t _s = vmlaq_f32(_s, _a, _b); return 0; }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
```
How come the presence of `vmlaq_f32` confirms `__ARM_FEATURE_MATMUL_INT8`?
Thanks for your question. The current check using `vmlaq_f32` to confirm `__ARM_FEATURE_MATMUL_INT8` is incorrect: `vmlaq_f32` operates on floating-point values (`float32x4_t`), while `__ARM_FEATURE_MATMUL_INT8` specifically indicates support for integer matrix multiplication on INT8 (8-bit signed integer) data.
I'll update the patch to use an intrinsic that operates on INT8 data, such as `vmmlaq_s32`.
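A corrected probe along these lines might look like the following sketch (assuming `CMAKE_REQUIRED_FLAGS` already carries the `-march=...+i8mm` flags being tested; this is illustrative, not the exact patch):

```cmake
# Probe for i8mm by compiling an actual INT8 matrix-multiply intrinsic
# (vmmlaq_s32) instead of the float intrinsic vmlaq_f32.
check_cxx_source_compiles("
    #include <arm_neon.h>
    int main() {
        int8x16_t a = {}, b = {};
        int32x4_t acc = {};
        acc = vmmlaq_s32(acc, a, b);
        return 0;
    }" GGML_COMPILER_SUPPORT_MATMUL_INT8)
if (GGML_COMPILER_SUPPORT_MATMUL_INT8)
    add_compile_definitions(__ARM_FEATURE_MATMUL_INT8)
endif()
```

`vmmlaq_s32` takes an `int32x4_t` accumulator and two `int8x16_t` operands, so the compiler only accepts it when i8mm is actually enabled for the target.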
This change breaks `make -j && ./bin/test-backend-ops -o MUL_MAT -b CPU`. On M1 Pro it crashes; on M2 Ultra, the tests fail.
The problem is that even though the sources compile successfully, they do not run:

```shell
g++ -march=armv8.2-a+i8mm test.cpp && ./a.out
Illegal instruction: 4
```
I was able to reproduce the failing cases on an M3. The issue appears to be related to test-backend-ops, which is a special case that doesn't require model loading. As a result, the Q4_0 optimized kernel was not activated despite the hardware supporting i8mm. Meanwhile, the macro GGML_USE_LLAMAFILE was undefined as a result of the __ARM_FEATURE_MATMUL_INT8 definition. This mismatch caused the kernel selection logic to skip both the optimized path for Q4_0 and the GGML_USE_LLAMAFILE path. I applied a temporary patch to prevent GGML_USE_LLAMAFILE from being undefined.
test-backend-ops didn't seem to use the I8MM-optimized kernel in my test on an M3, even though __ARM_FEATURE_MATMUL_INT8 was enabled.
* ggml-cpu: cmake add arm64 cpu feature check for macos
* use vmmlaq_s32 for compile option i8mm check

Adds a fix for #10435.