Misc. bug: Unsupported op "CPY" / Segmentation fault on Metal #10976

Closed
@firelex

Description

Name and Version

version: 4391 (9ba399d)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.1.0

Operating systems

Mac (M4 Max / 128 GB)

Which llama.cpp modules do you know to be affected?

llama-server

Problem description & steps to reproduce

./build/bin/llama-server -m /Users/mattsinalco/.cache/huggingface/hub/models--unsloth--Llama-3.3-70B-Instruct-GGUF/snapshots/0c14ebbedd129fb190c8241facca9a360e81c650/Llama-3.3-70B-Instruct-Q4_K_M.gguf -md /Users/mattsinalco/.cache/huggingface/hub/models--unsloth--Llama-3.2-1B-Instruct-GGUF/snapshots/a5594fb18df5dfc6b43281423fcce6750cd92de5/Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 99 -ngld 99 -fa --port 8034 --ctx-size 8192 --ctx-size-draft 8192 --draft-min 0 --draft-max 16 -np 7 --host 0.0.0.0 --slots --slot-save-path /Users/mattsinalco/mathias/caching -ctk q4_1 -ctv q4_1

This sometimes (but reproducibly) gives me:

/Users/mattsinalco/mathias/llama.cpp/ggml/src/ggml-metal/ggml-metal.m:1263: unsupported op
ggml_metal_encode_node: error: unsupported op 'CPY'
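Since the failure is an unsupported `CPY` on Metal and the command quantizes the KV cache with `-ctk q4_1 -ctv q4_1`, a natural isolation step (a hypothetical check, with the model path shortened for readability) is to rerun without those flags so the cache stays f16:

```shell
# Hypothetical isolation step: same command minus -ctk/-ctv, so the
# KV cache stays f16 and Metal never has to copy into a q4_1 tensor.
./build/bin/llama-server -m Llama-3.3-70B-Instruct-Q4_K_M.gguf \
    -ngl 99 -fa --ctx-size 8192 -np 7 --port 8034
```

If the crash disappears, that points at the quantized-KV-cache copy path rather than the model or draft setup.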

Other quantizations give me this:

zsh: segmentation fault ./build/bin/llama-server -m -md -ngl 99 -ngld 99 -fa --port 8034 --ctx-size

Related question: when the KV cache is not quantized everything works reliably, but can I resize the KV cache? I can't seem to load slots of 200 MB (100 MB is possible).
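For scale, here is a rough sketch of how slot size tracks the number of cached tokens. The layer count, KV-head count, and head dimension below are assumed values for a 70B Llama, used only to illustrate the arithmetic; the real numbers are in the GGUF metadata.

```python
# Back-of-the-envelope KV-cache size estimate (assumed 70B-class shape:
# 80 layers, 8 KV heads with GQA, head dim 128 -- check the GGUF metadata).
n_layer, n_kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 2  # f16; quantized KV types use fewer bytes per element

per_token = n_layer * 2 * n_kv_heads * head_dim * bytes_per_elem  # K + V
full_ctx = per_token * 8192  # whole 8192-token context

print(per_token)          # bytes of KV cache per token at f16
print(full_ctx // 2**20)  # MiB for a full 8192-token context
```

Under these assumptions each cached token costs about 320 KiB at f16, so a 100 MB slot file corresponds to only a few hundred tokens; the limit you hit would then depend on tokens cached, not a fixed cache size.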

First Bad Commit

No response

Relevant log output

No response
