Apple M1 metal lag #1730

Closed
@leedrake5

Prefacing that this isn't urgent. When using the recently added M1 GPU (Metal) support, I see odd behavior in system resource use. With all threads requested (-t 20), the initial run honors the setting. However, once there is a pause in GPU use, only about 4 threads are used regardless of the -t value.

Video showing the response (Guanaco 65B) and system resource use: https://youtu.be/ysA7xg6nevY

LLAMA_METAL=1 make -j && ./main -m ./models/guanaco-65B.ggmlv3.q4_0.bin -b 8000  -n 25600 -ngl 1 -t 20 --repeat-penalty 1.1764705882352942 --top-p 0 --top-k 40 --temp 0.7 --repeat-last-n 256 -p "How did the computer company Apple get its start and become successful?"

Apologies for the cringe prompt, but I wanted to test accuracy (points for remembering Wayne was a founder, but Apple Watch was released in 2014, not 2015). Some parameters (e.g. the batch size) are unusual, but the behavior is the same regardless of their values.
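
One way to narrow this down would be to rerun the same command with several -t values and watch whether thread use still settles at about 4 after a pause. The following is a minimal sketch, not part of the original report: it only builds and prints the command lines, reusing the model path, -ngl 1, and prompt from the command above. The helper name `build_command`, the shortened `-n 256`, and the thread counts 4, 8, and 20 are assumptions chosen for illustration, and the sampling flags are omitted for brevity.

```python
import shlex

# Model path and prompt are copied from the command in the report.
MODEL = "./models/guanaco-65B.ggmlv3.q4_0.bin"
PROMPT = "How did the computer company Apple get its start and become successful?"


def build_command(threads, n_predict=256):
    """Return the ./main argument list for one run with the given -t value.

    n_predict defaults to 256, a hypothetical shorter value than the
    25600 used in the report, so that each comparison run ends sooner.
    """
    return [
        "./main",
        "-m", MODEL,
        "-ngl", "1",            # same GPU offload setting as the report
        "-t", str(threads),     # the thread count under test
        "-n", str(n_predict),
        "-p", PROMPT,
    ]


if __name__ == "__main__":
    # Thread counts to compare; an arbitrary illustrative choice.
    for threads in (4, 8, 20):
        # Print a shell-quoted command line instead of executing it.
        print(shlex.join(build_command(threads)))
```

Each printed line can be pasted into a terminal (or the list passed to `subprocess.run`) while a system monitor is open, which keeps everything but -t fixed between runs.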
