Expected Behavior
Hello,
I wanted to convert the alpaca-native 7B GPTQ file (a .pt file) into a ggml file with the convert-gptq-to-ggml.py script: https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py
Current Behavior
The problem is that I get this error:
D:\Large Language Models\CONVERTISSEURS\gptq to ggml>python convert-gptq-to-ggml.py alpaca-native-4bit.pt tokenizer.model out.bin
32000
32001
Traceback (most recent call last):
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\convert-gptq-to-ggml.py", line 35, in <module>
    assert tokenizer.vocab_size() == n_vocab
AssertionError
32000 is tokenizer.vocab_size() (the number of tokens in tokenizer.model).
32001 is n_vocab (the number of tokens in the model).
The model trained with Alpaca has one extra token, namely:
"[PAD]": 32000
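If I understand the script right, line 35 simply asserts that the two counts match. A rough sketch of what a more tolerant converter might do instead (the function name and details are my guess, not the script's actual code): pad the missing ids at the end of the tokenizer's vocabulary with placeholder entries.

```python
def vocab_entries(tokenizer_vocab, n_vocab):
    """Return a vocabulary list of exactly n_vocab entries.

    Illustrative sketch only: ids past the tokenizer's range
    (here, id 32000 -> "[PAD]") get placeholder strings instead
    of triggering an AssertionError.
    """
    assert len(tokenizer_vocab) <= n_vocab, "model vocab smaller than tokenizer"
    missing = n_vocab - len(tokenizer_vocab)
    return tokenizer_vocab + ["[PAD]"] * missing
```

For the Alpaca case this would fill exactly one slot (32001 - 32000 = 1) with "[PAD]".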
It looks like if we want to convert the alpaca-native GPTQ models, we need to create a new tokenizer.model that includes this "[PAD]" token.
The problem is that I have no idea how to do that... if someone can help me with this, I'd appreciate it!
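From what I've read about the protobuf wire format, something like the following might work, but I haven't verified it against a real model. The idea relies on a protobuf property: concatenating two serialized messages merges them, and repeated fields concatenate, so appending one hand-encoded ModelProto.pieces entry (field 1 of the SentencePiece model proto) to the end of tokenizer.model should bump vocab_size() from 32000 to 32001. All file names here are illustrative.

```python
import struct

def encode_pad_piece(token="[PAD]", score=0.0):
    """Serialize one ModelProto.pieces entry in protobuf wire format.

    Sketch under the assumption that the SentencePiece model proto
    defines pieces as: piece (string, field 1), score (float, field 2),
    type (enum, field 3, USER_DEFINED = 4).
    """
    raw = token.encode("utf-8")
    inner = (
        b"\x0a" + bytes([len(raw)]) + raw     # field 1 (piece), length-delimited
        + b"\x15" + struct.pack("<f", score)  # field 2 (score), 32-bit float
        + b"\x18\x04"                         # field 3 (type), varint USER_DEFINED
    )
    # Single-byte length prefixes are only valid while both lengths
    # stay under 128, which holds for a short token like "[PAD]".
    return b"\x0a" + bytes([len(inner)]) + inner  # ModelProto field 1 (pieces)

def append_pad(src="tokenizer.model", dst="tokenizer-pad.model"):
    """Write a copy of the tokenizer with one extra "[PAD]" piece appended."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:  # keep the original file intact
        f.write(data + encode_pad_piece())
```

Alternatively, if the sentencepiece package ships its model protobuf (I believe newer versions expose a sentencepiece_model_pb2 module, but I'm not sure), one could parse the ModelProto, call pieces.add(), and re-serialize instead of hand-encoding bytes.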