Expected Behavior
Hello,
I wanted to convert the alpaca-native 7B GPTQ file (a .pt file) into a ggml file with the convert-gptq-to-ggml.py script: https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py
Current Behavior
The problem is that I get this error:
D:\Large Language Models\CONVERTISSEURS\gptq to ggml>python convert-gptq-to-ggml.py alpaca-native-4bit.pt tokenizer.model out.bin
32000
32001
Traceback (most recent call last):
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\convert-gptq-to-ggml.py", line 35, in <module>
    assert tokenizer.vocab_size() == n_vocab
AssertionError
32000 is tokenizer.vocab_size() (the number of tokens in tokenizer.model).
32001 is n_vocab (the number of tokens in the model).
The model trained with Alpaca has one extra token, namely:
"[PAD]": 32000
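If I understand the script right, line 35 simply asserts that the two counts match. A rough sketch of what a more tolerant converter might do instead (the function name and details are my guess, not the script's actual code): pad the missing ids at the end of the tokenizer's vocabulary with placeholder entries.

```python
def vocab_entries(tokenizer_vocab, n_vocab):
    """Return a vocabulary list of exactly n_vocab entries.

    Illustrative sketch only: ids past the tokenizer's range
    (here, id 32000 -> "[PAD]") get placeholder strings instead
    of triggering an AssertionError.
    """
    assert len(tokenizer_vocab) <= n_vocab, "model vocab smaller than tokenizer"
    missing = n_vocab - len(tokenizer_vocab)
    return tokenizer_vocab + ["[PAD]"] * missing
```

For the Alpaca case this would fill exactly one slot (32001 - 32000 = 1) with "[PAD]".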
It looks like if we want to convert the alpaca-native GPTQ models, we need to create a new tokenizer.model that includes this "[PAD]" token.
The problem is that I have no idea how to do that... if someone can help me with this, I'd appreciate it!
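From what I've read about the protobuf wire format, something like the following might work, but I haven't verified it against a real model. The idea relies on a protobuf property: concatenating two serialized messages merges them, and repeated fields concatenate, so appending one hand-encoded ModelProto.pieces entry (field 1 of the SentencePiece model proto) to the end of tokenizer.model should bump vocab_size() from 32000 to 32001. All file names here are illustrative.

```python
import struct

def encode_pad_piece(token="[PAD]", score=0.0):
    """Serialize one ModelProto.pieces entry in protobuf wire format.

    Sketch under the assumption that the SentencePiece model proto
    defines pieces as: piece (string, field 1), score (float, field 2),
    type (enum, field 3, USER_DEFINED = 4).
    """
    raw = token.encode("utf-8")
    inner = (
        b"\x0a" + bytes([len(raw)]) + raw     # field 1 (piece), length-delimited
        + b"\x15" + struct.pack("<f", score)  # field 2 (score), 32-bit float
        + b"\x18\x04"                         # field 3 (type), varint USER_DEFINED
    )
    # Single-byte length prefixes are only valid while both lengths
    # stay under 128, which holds for a short token like "[PAD]".
    return b"\x0a" + bytes([len(inner)]) + inner  # ModelProto field 1 (pieces)

def append_pad(src="tokenizer.model", dst="tokenizer-pad.model"):
    """Write a copy of the tokenizer with one extra "[PAD]" piece appended."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:  # keep the original file intact
        f.write(data + encode_pad_piece())
```

Alternatively, if the sentencepiece package ships its model protobuf (I believe newer versions expose a sentencepiece_model_pb2 module, but I'm not sure), one could parse the ModelProto, call pieces.add(), and re-serialize instead of hand-encoding bytes.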