Multi-part GGML files: do they still work? And how hard would it be to modify convert.py to create them? #1503

Closed
@TheBloke

Hi all

Hugging Face has a max file size limit of 50GB, which is a bit annoying. This means it's not possible to upload a q8_0 GGML of a 65B model, or a float16 GGML for a 30B model.
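As a rough check on those two cases, here is a back-of-the-envelope estimate. The parameter counts are the nominal "65B" and "30B" figures, the roughly 8.5 bits per weight for q8_0 is an assumption about the block layout, and the limit is taken as decimal gigabytes; header and vocab overhead are ignored.

```python
# Rough size estimate only; every constant below is a nominal assumption.
HF_LIMIT_BYTES = 50 * 10**9  # the 50GB limit, taken as decimal GB

def estimated_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate tensor data size, ignoring header and vocab overhead."""
    return n_params * bits_per_weight / 8

cases = {
    "65B q8_0": estimated_size_bytes(65e9, 8.5),  # assumed ~8.5 bits/weight
    "30B f16": estimated_size_bytes(30e9, 16.0),  # 2 bytes per weight
}

for label, size in cases.items():
    print(f"{label}: ~{size / 1e9:.0f} GB, over limit: {size > HF_LIMIT_BYTES}")
```

Under these assumptions both files come out well above the limit, so each would need at least two parts.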

I've had two people ask me to upload q8_0 files for my 65B models. One asked if I could use another file-sharing site such as Google Drive, but the other mentioned the possibility of multi-part GGMLs.

I know that llama.cpp used to support multi-part models. It still shows n_parts 1 in the header, which suggests it might still support 2 or more parts?

So I'd love to know:

  1. Does llama.cpp still support multi-part GGMLs?
  2. And if so, should it be fairly straightforward to modify convert.py to create one?

Here's the method convert.py uses to write the GGML file:

    @staticmethod
    def write_all(fname_out: Path, params: Params, model: LazyModel, vocab: Vocab) -> None:
        check_vocab_size(params, vocab)
        of = OutputFile(fname_out)
        of.write_file_header(params)
        print("Writing vocab...")
        of.write_vocab(vocab)

        def do_item(item: Tuple[str, LazyTensor]) -> NDArray:
            name, lazy_tensor = item
            return lazy_tensor.load().to_ggml().ndarray

        ndarrays = bounded_parallel_map(do_item, model.items(), concurrency=8)
        for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
            size = ' x '.join(f"{dim:6d}" for dim in lazy_tensor.shape)
            padi = len(str(len(model)))
            print(f"[{i+1:{padi}d}/{len(model)}] Writing tensor {name:38s} | size {size:16} | type {lazy_tensor.data_type}")
            of.write_tensor_header(name, lazy_tensor.shape, lazy_tensor.data_type)
            ndarray.tofile(of.fout)
        of.fout.close()

Would it just be a case of writing the file header once per file, then putting the first X layers in the first file and the rest in the other?

What about the vocab - would that go in both files, or only in the first?
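To make the layer-split idea concrete, here is a minimal sketch of how tensors might be grouped into parts by cumulative size. The helper `plan_parts` and the toy tensor names and sizes are hypothetical and not part of convert.py. It only plans the grouping; it does not write headers or vocab, and it says nothing about whether llama.cpp's loader would accept files split this way.

```python
from typing import Iterable, List, Tuple

def plan_parts(tensors: Iterable[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs, in order, into parts.

    Each part holds at most max_bytes of tensor data, except that a single
    tensor larger than max_bytes still gets a part of its own.
    """
    parts: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in tensors:
        # Start a new part if adding this tensor would exceed the limit.
        if current and current_size + size > max_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        parts.append(current)
    return parts

# Toy example with made-up tensor names and sizes (in bytes).
toy = [("layers.0.w", 40), ("layers.1.w", 40), ("layers.2.w", 40), ("output.w", 30)]
print(plan_parts(toy, max_bytes=100))
```

In write_all, a plan like this could decide when to close one output file and open the next, with the per-file header and vocab handling depending on the answers to the questions above.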

Thanks in advance for any info!
