Description
Hi all
Hugging Face has a 50GB limit on individual file size, which is a bit annoying. It means it's not possible to upload a q8_0 GGML of a 65B model, or a float16 GGML of a 30B model.
I've had two people ask me to upload q8_0s of my 65B models. One of them asked if I could use another file-sharing site, such as Google Drive. But the other mentioned the possibility of multi-part GGMLs.
I believe llama.cpp used to support multi-part models? It still shows `n_parts = 1` in the header info, which implies it might still support two or more parts as well?
So I'd love to know:
- Does llama.cpp still support multi-part GGMLs?
- And if so, would it be fairly straightforward to modify convert.py to create one?
Here's the method convert.py uses to write the GGML file:
@staticmethod
def write_all(fname_out: Path, params: Params, model: LazyModel, vocab: Vocab) -> None:
    check_vocab_size(params, vocab)
    of = OutputFile(fname_out)
    of.write_file_header(params)
    print("Writing vocab...")
    of.write_vocab(vocab)

    def do_item(item: Tuple[str, LazyTensor]) -> NDArray:
        name, lazy_tensor = item
        return lazy_tensor.load().to_ggml().ndarray

    ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=8)
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
        size = ' x '.join(f"{dim:6d}" for dim in lazy_tensor.shape)
        padi = len(str(len(model)))
        print(f"[{i+1:{padi}d}/{len(model)}] Writing tensor {name:38s} | size {size:16} | type {lazy_tensor.data_type}")
        of.write_tensor_header(name, lazy_tensor.shape, lazy_tensor.data_type)
        ndarray.tofile(of.fout)
    of.fout.close()
Would it just be a case of writing the file header twice, putting the first X layers in the first file and the rest in the second?
What about the vocab - would that go in both files, or only in the first?
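To make the question concrete, here's the kind of change I'm imagining: a hypothetical write_split method that would sit next to write_all in convert.py and reuse its existing helpers. It's completely untested, and the things I'm unsure about (header repeated in each part, vocab only in the first part, the old .bin / .bin.1 naming) are flagged as assumptions in the comments:

    # Completely untested sketch -- write_split is a name I made up, and it
    # reuses the existing helpers from convert.py's OutputFile.
    @staticmethod
    def write_split(fname_out: Path, params: Params, model: LazyModel,
                    vocab: Vocab, n_parts: int = 2) -> None:
        check_vocab_size(params, vocab)
        items = list(model.items())
        # Naive split: first half of the tensors in part 0, the rest in part 1.
        chunk = (len(items) + n_parts - 1) // n_parts
        for part in range(n_parts):
            # ASSUMPTION: parts are named like the old multi-part files,
            # e.g. ggml-model-f16.bin, ggml-model-f16.bin.1, ...
            fname_part = fname_out if part == 0 else Path(f"{fname_out}.{part}")
            of = OutputFile(fname_part)
            of.write_file_header(params)   # ASSUMPTION: header repeated in every part
            if part == 0:
                of.write_vocab(vocab)      # ASSUMPTION: vocab in the first part only
            # Dropped bounded_parallel_map here for clarity.
            for name, lazy_tensor in items[part * chunk:(part + 1) * chunk]:
                ndarray = lazy_tensor.load().to_ggml().ndarray
                of.write_tensor_header(name, lazy_tensor.shape, lazy_tensor.data_type)
                ndarray.tofile(of.fout)
            of.fout.close()

Of course, if the old multi-part format actually split individual tensors across parts rather than putting whole tensors in one part or the other, this would need a different approach entirely, which is part of what I'm hoping someone can clarify.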
Thanks in advance for any info!