Closed
Related to: ggml-org/llama.cpp#6343 (comment)
Explanation: We recently introduced the gguf-split tool to llama.cpp, which lets users split a model into smaller shards. Each shard carries three metadata keys that describe it:
split.count: Total number of splits
split.no: The number of the current split
split.tensors.count: Total number of tensors in the original model (= sum of the tensors in all splits)
However, split.no is missing when the file is viewed in the GGUF viewer on Hugging Face. It is still visible when the file is inspected with gguf-py.
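For anyone who wants to cross-check the three keys without either viewer, here is a minimal sketch that reads them straight from the GGUF header using only the Python standard library. It is not gguf-py and not the Hugging Face parser. It assumes a little-endian GGUF v2 or v3 file (64-bit counts and string lengths), and the function name read_split_metadata is made up for illustration.

```python
import io
import struct

# GGUF scalar value types (type id -> struct format). Assumes GGUF v2/v3, little-endian.
SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
STRING, ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f, vtype):
    """Read one metadata value of the given GGUF type id."""
    if vtype in SCALARS:
        return _read(f, SCALARS[vtype])
    if vtype == STRING:
        return _read_string(f)
    if vtype == ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_split_metadata(f):
    """Return every 'split.*' key found in the header of a binary file object."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _read(f, "<I")             # format version (not checked in this sketch)
    _read(f, "<Q")             # tensor count of this shard (unused here)
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    found = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key.startswith("split."):
            found[key] = value
    return found
```

Opening a shard in binary mode and passing the file object to read_split_metadata should return a dict with whichever of split.no, split.count and split.tensors.count are present in that file.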
This can be reproduced using a smaller model: https://huggingface.co/ngxson/tinyllama_split_test/tree/main?show_tensors=stories15M-q8_0-00001-of-00003.gguf
Here is the command that I used to split the model:
./gguf-split --split-max-size 10M ~/Downloads/stories15M-q8_0.gguf ~/Downloads/stories15M-q8_0
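To check that all shards were produced, the expected file names can be listed with a small sketch like the one below. It assumes the naming pattern seen in the repro link (prefix, 5-digit shard number starting at 1, 5-digit total, .gguf extension). The helper name shard_paths is hypothetical, and the count of 3 is taken from the "-of-00003" suffix in that link.

```python
def shard_paths(prefix: str, split_count: int) -> list[str]:
    """List expected shard file names, assuming '<prefix>-NNNNN-of-MMMMM.gguf'."""
    return [
        f"{prefix}-{i:05d}-of-{split_count:05d}.gguf"
        for i in range(1, split_count + 1)  # file names are assumed to be 1-based
    ]


# Expected outputs for the command above, assuming it yields 3 shards.
for path in shard_paths("stories15M-q8_0", 3):
    print(path)
```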
I'd be happy to help you guys with this. Feel free to let me know if you need more info.