To preface: this isn't urgent. When using the recently added M1 GPU support, I see odd behavior in system resource use. When using all threads (-t 20), the first initialization uses all 20 threads as instructed. However, after a pause in GPU use, only about 4 threads are used regardless of the -t flag.
Video showing the response (Guanaco 65B) and system resource use: https://youtu.be/ysA7xg6nevY
LLAMA_METAL=1 make -j && ./main -m ./models/guanaco-65B.ggmlv3.q4_0.bin -b 8000 -n 25600 -ngl 1 -t 20 --repeat-penalty 1.1764705882352942 --top-p 0 --top-k 40 --temp 0.7 --repeat-last-n 256 -p "How did the computer company Apple get its start and become successful?"
Apologies for the cringe prompt, but I wanted to test accuracy (points for remembering Wayne was a founder, but Apple Watch was released in 2014, not 2015). Some parameters (e.g. batch size) are unusual, but the behavior is the same regardless of the batch size value.