Feature Request: Ability to pack multiple GGUFs into a single one #13028

### Feature Description

From an idea brought up by @ggerganov in this discussion: https://github.com/ggml-org/llama.cpp/discussions/11139#discussioncomment-11783418

While it is **NOT** a good idea to pack both mmproj + text models (because vision support is still messy atm), we still have some interesting use cases:

- For TTS models, this can be useful because some models may require more than 2 GGUFs to run (e.g. Sesame CSM requires backbone, decoder and Mimi models)
- For the phi-4-mm model, while the mmproj can't be packed, it is still interesting to pack the LoRA adapters and the text model together
- There are some techniques which use LoRA to recover quality lost to quantization, so it can be useful to pack the LoRA with the model (though I don't know how effective this can be, cc @compilade )
- Some models have more than one modality (e.g. Phi-4-mm with both audio and vision input), so it could be useful to pack the audio encoder and vision encoder into a single GGUF

### Motivation

I am creating this issue to discuss possible implementations.

### Possible Implementation

One implementation could be to introduce a "namespace" prefix for KV metadata keys and tensor names, plus a "super" key holding the list of namespaces.

For example, in the case of Sesame CSM, given 2 GGUFs (backbone and decoder), the routine to pack them is as follows:
- We create a blank GGUF
- Add metadata `general.namespaces = ["backbone", "decoder"]`
- Copy all metadata + tensors from backbone while adding `backbone.` prefix to the key name
- Copy all metadata + tensors from decoder while adding `decoder.` prefix to the key name
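
The routine above can be sketched as follows. This is only a minimal model of the idea, not an implementation: each GGUF is represented as a plain Python dict with `metadata` and `tensors` entries instead of a real file, the function name `pack_ggufs` is hypothetical, and the metadata values and tensor bytes are placeholders.

```python
def pack_ggufs(models):
    """Pack several GGUF-like dicts into one namespaced dict.

    `models` maps a namespace name to {"metadata": {...}, "tensors": {...}}.
    This dict layout is an assumption for illustration, not the GGUF file format.
    """
    # Start from a blank container holding only the "super" key
    packed = {
        "metadata": {"general.namespaces": list(models)},
        "tensors": {},
    }
    for ns, model in models.items():
        # Copy metadata keys, adding the "<namespace>." prefix
        for key, value in model["metadata"].items():
            packed["metadata"][f"{ns}.{key}"] = value
        # Copy tensors, adding the same prefix to each tensor name
        for name, data in model["tensors"].items():
            packed["tensors"][f"{ns}.{name}"] = data
    return packed


# Placeholder inputs standing in for the two source files
backbone = {
    "metadata": {"general.name": "csm-backbone"},
    "tensors": {"token_embd.weight": b"\x00\x01"},
}
decoder = {
    "metadata": {"general.name": "csm-decoder"},
    "tensors": {"token_embd.weight": b"\x02\x03"},
}

packed = pack_ggufs({"backbone": backbone, "decoder": decoder})
for key in packed["metadata"]:
    print(key)
```

Because every key and tensor name carries its namespace prefix, two models can both contain a tensor called `token_embd.weight` without clashing.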

These APIs will need to be added into `libllama`:
- `int32_t llama_model_n_namespaces(llama_model * model)`: returns the number of namespaces, 0 meaning the model has no namespaces
- `const char ** llama_model_list_namespaces(llama_model * model)`: returns the list of namespaces as strings
- `llama_model * llama_model_get_namespace(llama_model * model, int32_t idx)`: returns the sub `llama_model *` object corresponding to a namespace index
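
To make the intended semantics of these three calls concrete, here is a rough Python model of them. It is not `libllama` code: the packed file is again a plain dict, the function names are hypothetical stand-ins for the proposed C functions, and it assumes the sub-model lookup receives the packed model together with the index.

```python
def n_namespaces(packed):
    # 0 means the model was not packed with namespaces
    return len(packed["metadata"].get("general.namespaces", []))


def list_namespaces(packed):
    return list(packed["metadata"].get("general.namespaces", []))


def get_namespace(packed, idx):
    """Return the sub-model for namespace `idx`, with the prefix stripped."""
    prefix = list_namespaces(packed)[idx] + "."
    return {
        "metadata": {
            k[len(prefix):]: v
            for k, v in packed["metadata"].items()
            if k.startswith(prefix)
        },
        "tensors": {
            k[len(prefix):]: v
            for k, v in packed["tensors"].items()
            if k.startswith(prefix)
        },
    }


# Placeholder packed model with two namespaces
packed = {
    "metadata": {
        "general.namespaces": ["backbone", "decoder"],
        "backbone.general.name": "csm-backbone",
        "decoder.general.name": "csm-decoder",
    },
    "tensors": {
        "backbone.token_embd.weight": b"\x00\x01",
        "decoder.token_embd.weight": b"\x02\x03",
    },
}

for i, ns in enumerate(list_namespaces(packed)):
    sub = get_namespace(packed, i)
    print(ns, sorted(sub["metadata"]), sorted(sub["tensors"]))
```

One thing this sketch makes visible: a namespace named `general` would collide with the `general.namespaces` key itself, so the packing tool would probably need to reject or reserve that name.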

### Problems

1. For existing models (like TTS), how do we make a smooth transition to the new packed format? Or should we just accept breaking changes, since not many people are using them anyway?
2. How can we design the API so that it requires the least change to user code?
