Closed
Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Running the benchmark without a BLAS library should work:
make -j benchmark-matmult
Current Behavior
Since commit 2d5db48, it aborts with:
ABORT - ERROR in Matrix Multiplication result - expected 11611394048.00, got 11474052096.00 (delta 137341952.00 > allowed_delta 11611.39)
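The numbers in the abort message can be checked by hand. The sketch below assumes the tolerance is the reference sum scaled by one millionth, which is what the printed values imply (11611394048.00 / 1e6 is about 11611.39). The function name `check_matmul_sum` and the `rel_tolerance` parameter are illustrative and not taken from the benchmark source.

```python
def check_matmul_sum(expected, got, rel_tolerance=1e-6):
    """Compare a result sum against a reference sum.

    rel_tolerance=1e-6 is an assumption inferred from the log
    (allowed_delta 11611.39 for expected 11611394048.00).
    """
    delta = abs(expected - got)
    allowed_delta = abs(expected) * rel_tolerance
    return delta, allowed_delta, delta <= allowed_delta


expected = 11611394048.00  # "expected" value from the abort message
got = 11474052096.00       # "got" value from the abort message

delta, allowed_delta, ok = check_matmul_sum(expected, got)
relative_error = delta / expected

print(f"delta={delta:.2f} allowed_delta={allowed_delta:.2f} ok={ok}")
print(f"relative error: {relative_error:.2%}")
```

Under these assumptions the delta is roughly 1.2% of the expected sum, which is about 11,800 times the allowed delta, so this is not a borderline rounding failure.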
Environment and Context
I used git bisect and make -j clean benchmark-matmult, which pointed to commit 2d5db48.
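For anyone who wants to reproduce the bisection, the build-and-run step can be automated with `git bisect run`. The following is a minimal sketch, not the script used for this report: the file name `bisect_step.py`, the `--run` flag, and the `classify` helper are hypothetical. It detects failure by looking for the abort message in the output rather than relying on the benchmark's exit status, since the report does not say what status the benchmark returns on abort.

```python
import subprocess
import sys

# Text printed by the benchmark when the result check fails (from the report).
ABORT_MARKER = "ABORT - ERROR in Matrix Multiplication result"

# Exit codes understood by `git bisect run`: 0 = good, 1 = bad, 125 = skip.
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_returncode, benchmark_output):
    """Map one build-and-run attempt to a `git bisect run` exit code."""
    if build_returncode != 0:
        return SKIP  # commit does not build, so it cannot be tested
    if ABORT_MARKER in benchmark_output:
        return BAD
    return GOOD


def main():
    # Same build command as in the report.
    build = subprocess.run(["make", "-j", "clean", "benchmark-matmult"])
    if build.returncode != 0:
        return classify(build.returncode, "")
    run = subprocess.run(["./benchmark-matmult"], capture_output=True, text=True)
    return classify(0, run.stdout + run.stderr)


# Only build and run when asked explicitly, so that importing or pasting
# this file has no side effects.
if __name__ == "__main__" and "--run" in sys.argv:
    sys.exit(main())
```

After marking one known-good and one known-bad commit with git bisect good and git bisect bad, this could be invoked as git bisect run python3 bisect_step.py --run from the repository root.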
Full run:
make -j benchmark-matmult
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (GCC) 13.1.1 20230429
I CXX: g++ (GCC) 13.1.1 20230429
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/benchmark/benchmark-matmult.cpp ggml.o -o benchmark-matmult
./benchmark-matmult
main: build = 567 (2d5db48)
Starting Test
Allocating Memory of size 794558464 bytes, 757 MB
Creating new tensors
------ Test 1 - Matrix Mult via F32 code ------------------------------------------------------------------------------
cgraph->n_threads=1
m11: type = 0 ( f32) ne = 11008 x 4096 x 1, nb = ( 4, 44032, 180355072) - Sum of tensor m11 is 16777216.00
m2: type = 0 ( f32) ne = 11008 x 128 x 1, nb = ( 4, 44032, 5636096) - Sum of tensor m2 is 2818048.00
gf.nodes[0]: type = 0 ( f32) ne = 4096 x 128 x 1, nb = ( 4, 16384, 2097152) - Sum of tensor gf.nodes[0] is 11611394048.00
------ Test 2 - Matrix Mult via Q4_0 code ------------------------------------------------------------------------------
cgraph->n_threads=1
Matrix Multiplication of (11008,4096,1) x (11008,128,1) - about 11.54 gFLOPS
Iteration;NThreads; SizeX; SizeY; SizeZ; Required_FLOPS; Elapsed_u_Seconds; gigaFLOPS
=====================================================================================
0; 1; 11008; 4096; 128; 11542724608; 273886; 42.14
ABORT - ERROR in Matrix Multiplication result - expected 11611394048.00, got 11474052096.00 (delta 137341952.00 > allowed_delta 11611.39)
- System: Arch Linux on a ThinkPad L14 (AMD)