Closed
Description
In a recent patch (https://github.com/ggerganov/llama.cpp/blame/d2f650cb5b04ee2726663e79b47da5efe196ce00/convert.py#L337), the vocab list is read from `self.bpe_tokenizer['model']['vocab']`, where `self.bpe_tokenizer` is loaded from the `vocab.json` file. However, a BPE tokenizer's `vocab.json` file has no `model` > `vocab` entry. It contains no other metadata and consists only of the vocabulary list itself.
So, in my opinion, line 337 should be changed as follows:
self.vocab = self.bpe_tokenizer
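To illustrate the difference between the two layouts, here is a minimal sketch, not the actual `convert.py` code. The helper name `extract_vocab` and the toy data are hypothetical, and it assumes the nested form keeps its vocabulary under `model` > `vocab` while the flat form is a plain token-to-id mapping:

```python
import json


def extract_vocab(tokenizer_json: dict) -> dict:
    """Return the token -> id mapping from either JSON layout (hypothetical helper)."""
    model = tokenizer_json.get("model")
    if isinstance(model, dict) and isinstance(model.get("vocab"), dict):
        # Nested layout: the vocabulary sits under model > vocab.
        return model["vocab"]
    # Flat layout (vocab.json): the file itself is the vocabulary.
    return tokenizer_json


# Toy data standing in for the two file layouts.
flat = json.loads('{"hello": 0, "world": 1}')
nested = json.loads('{"model": {"vocab": {"hello": 0, "world": 1}}}')

flat_vocab = extract_vocab(flat)
nested_vocab = extract_vocab(nested)
```

With the flat layout, indexing `flat['model']['vocab']` directly would raise a `KeyError`, which is the failure the one-line change above avoids.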
I hope this helps. Thanks.