Re-quantization of a split gguf file produces "invalid split file" #6548

Closed
@he29-net

Description

Hi, while testing the #6491 branch, I downloaded a Q8_0 quant (split into 3 files) from dranger003 and re-quantized it to Q2_K_S to make it more digestible for my museum hardware:

./quantize --allow-requantize --imatrix ../models/ggml-c4ai-command-r-plus-104b-f16-imatrix.dat ../models/ggml-c4ai-command-r-plus-104b-q8_0-00001-of-00003.gguf ../models/command-r-plus-104b-Q2_K_S.gguf Q2_K_S 2

I only passed the first piece, but ./quantize processed it correctly and produced a single file of the expected size. However, it apparently did not update the split metadata, so ./main still thinks the result is a split file:

./main -m ../models/command-r-plus-104b-Q2_K_S.gguf -t 15 --color -p "this is a test" -c 2048 -ngl 25 -ctk q8_0
...
llama_model_load: error loading model: invalid split file: ../models/command-r-plus-104b-Q2_K_S.gguf
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../models/command-r-plus-104b-Q2_K_S.gguf'
main: error: unable to load model
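
For anyone picking this up: as far as I can tell, gguf-split records the split state in per-file metadata keys (I believe split.no, split.count and split.tensors.count), and the quantized output keeps the values it inherited from the first shard. Below is a minimal sketch for dumping those keys with the gguf C API from ggml, so the stale metadata can be confirmed; the key names and integer types are my assumption and may differ in your tree:

// Minimal sketch: dump split-related metadata from a GGUF file using the
// gguf C API from ggml. The key names ("split.no", "split.count",
// "split.tensors.count") are assumed, based on what gguf-split appears to write.
#include "ggml.h"   // gguf_* API (declared in gguf.h in newer trees)

#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    // Read only the header/metadata, do not load tensor data.
    struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }

    const char * keys[] = { "split.no", "split.count", "split.tensors.count" };
    for (const char * key : keys) {
        const int id = gguf_find_key(ctx, key);
        if (id < 0) {
            printf("%s: not present\n", key);
            continue;
        }
        // The split counters are small integers; cover the likely types.
        switch (gguf_get_kv_type(ctx, id)) {
            case GGUF_TYPE_UINT16: printf("%s = %u\n", key, (unsigned) gguf_get_val_u16(ctx, id)); break;
            case GGUF_TYPE_UINT32: printf("%s = %u\n", key, (unsigned) gguf_get_val_u32(ctx, id)); break;
            case GGUF_TYPE_INT32:  printf("%s = %d\n", key, (int)      gguf_get_val_i32(ctx, id)); break;
            default:               printf("%s: present (unhandled type %d)\n", key, (int) gguf_get_kv_type(ctx, id)); break;
        }
    }

    gguf_free(ctx);
    return 0;
}

It can be built against ggml like the other examples; on a healthy single-file model the split keys should either be absent or read split.count = 1.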

As a workaround, it is possible to "reset" the metadata by doing a "dummy pass" of gguf-split:

./gguf-split --split-max-tensors 999 --split ../models/command-r-plus-104b-Q2_K_S.gguf ../models/command-r-plus-104b-Q2_K_S.gguf.split

The resulting file then seems to be working fine.

It's probably an easy fix, but after a quick grep through the source and a look at quantize.cpp, I realized I don't even know where to start, so it would probably be much easier and faster for someone who knows the codebase to do it.
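
That said, here is my (untested) guess at roughly what the fix could look like, in case it saves someone a few minutes: if quantize simply copies every input KV pair into the output header, the split.* values from shard 1-of-3 survive into the single output file, and the loader then rejects the file name as an invalid split. A small helper along these lines, called after the input metadata is copied into the output context and before the header is written, might be enough; the key names, value types and placement are assumptions on my part, not the actual code:

// Hedged sketch, not a tested patch: reset inherited split metadata so a
// single-file quantize output describes itself correctly. Key names and
// value types are assumptions based on what gguf-split appears to write.
static void reset_split_metadata(struct gguf_context * ctx_out, int n_tensors) {
    if (gguf_find_key(ctx_out, "split.count") < 0) {
        return; // input was not produced by gguf-split, nothing to do
    }
    gguf_set_val_u16(ctx_out, "split.no",    0);
    gguf_set_val_u16(ctx_out, "split.count", 1);
    gguf_set_val_i32(ctx_out, "split.tensors.count", n_tensors);
}

Alternatively, removing the keys outright would also work if the gguf API allows deleting a KV pair; I did not check whether it does.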

Metadata

    Labels

    bug (Something isn't working), good first issue (Good for newcomers), split (GGUF split model sharding)
