
Converting alpaca-native-GPTQ models into ggml models #442

Closed
@BadisG

Description


Expected Behavior

Hello,

I wanted to convert the alpaca-native 7B GPTQ file (a .pt file) into a ggml file with the convert-gptq-to-ggml.py script: https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py

Current Behavior

The problem is that I get this error:

D:\Large Language Models\CONVERTISSEURS\gptq to ggml>python convert-gptq-to-ggml.py alpaca-native-4bit.pt tokenizer.model out.bin
32000
32001
Traceback (most recent call last):
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\convert-gptq-to-ggml.py", line 35, in <module>
    assert tokenizer.vocab_size() == n_vocab
AssertionError

32000 is tokenizer.vocab_size() (the number of tokens in tokenizer.model)
32001 is n_vocab (the number of tokens in the model)

The model fine-tuned with Alpaca has one extra token, namely:
"[PAD]": 32000

It looks like, if we want to convert the Alpaca-native GPTQ models, we need to create a new tokenizer.model that includes this "[PAD]" token.

The problem is that I have no idea how to do that... if someone can help me with this, I'd appreciate it!

Metadata


Labels: enhancement (New feature or request), model (Model specific)
