BPE Tokenizer doesn't have model > vocab. #5180

Closed
@likejazz

In a recent patch (https://github.com/ggerganov/llama.cpp/blame/d2f650cb5b04ee2726663e79b47da5efe196ce00/convert.py#L337), you read the vocab list from self.bpe_tokenizer['model']['vocab'], where self.bpe_tokenizer is loaded from the vocab.json file. However, a BPE tokenizer's vocab.json file has no model > vocab entry. It contains no other metadata and consists only of the vocabulary itself, a flat mapping from token strings to token IDs.
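To illustrate the mismatch, here is a minimal Python sketch using tiny hypothetical samples (not real model files). A flat vocab.json maps token strings directly to IDs, while the nested ['model']['vocab'] lookup matches the layout of a Hugging Face tokenizer.json file; applying that lookup to the flat file raises a KeyError. The helper name nested_vocab_lookup is hypothetical and only mirrors the expression quoted above.

```python
import json

# Tiny hypothetical samples, for illustration only (not real model files).
# A BPE vocab.json is a flat JSON object: token string -> token ID.
VOCAB_JSON = '{"<unk>": 0, "hello": 1, "world": 2}'

# A tokenizer.json nests the same mapping under "model" -> "vocab".
TOKENIZER_JSON = (
    '{"model": {"type": "BPE", '
    '"vocab": {"<unk>": 0, "hello": 1, "world": 2}, "merges": []}}'
)


def nested_vocab_lookup(data):
    # Hypothetical helper mirroring the lookup from the patch: data['model']['vocab'].
    return data["model"]["vocab"]


flat = json.loads(VOCAB_JSON)
nested = json.loads(TOKENIZER_JSON)

# Works on the nested tokenizer.json layout.
print(nested_vocab_lookup(nested))  # prints {'<unk>': 0, 'hello': 1, 'world': 2}

# Fails on the flat vocab.json layout: there is no top-level "model" key.
try:
    nested_vocab_lookup(flat)
except KeyError as err:
    print("KeyError:", err)  # prints KeyError: 'model'
```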

So, in my opinion, line 337 should be modified as follows:

self.vocab = self.bpe_tokenizer
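A minimal sketch of the proposed fix, assuming one might want the converter to accept either file layout. The helpers extract_vocab and load_bpe_vocab are hypothetical (not part of convert.py): they return the nested model > vocab mapping when it is present and otherwise treat the whole JSON object as the vocabulary, as the line above does.

```python
import json
import tempfile
from pathlib import Path


def extract_vocab(data):
    """Hypothetical helper: return the token -> ID mapping from either layout.

    - tokenizer.json style: the vocabulary is nested under data["model"]["vocab"].
    - vocab.json style: the top-level object itself is the vocabulary.
    """
    model = data.get("model")
    # Only treat "model" as the nested section if it is a dict with a dict "vocab";
    # in a flat vocab.json, "model" may simply be a token mapped to an int ID.
    if isinstance(model, dict) and isinstance(model.get("vocab"), dict):
        return model["vocab"]
    # Flat layout: equivalent to the proposed `self.vocab = self.bpe_tokenizer`.
    return data


def load_bpe_vocab(path):
    """Hypothetical loader: read a JSON file and extract its vocabulary."""
    with open(path, encoding="utf-8") as f:
        return extract_vocab(json.load(f))


if __name__ == "__main__":
    # Hypothetical sample vocabulary; note that "model" is itself a token here.
    vocab = {"<unk>": 0, "hello": 1, "model": 2}
    with tempfile.TemporaryDirectory() as tmp:
        flat_path = Path(tmp) / "vocab.json"
        flat_path.write_text(json.dumps(vocab), encoding="utf-8")
        nested_path = Path(tmp) / "tokenizer.json"
        nested_path.write_text(
            json.dumps({"model": {"type": "BPE", "vocab": vocab, "merges": []}}),
            encoding="utf-8",
        )
        # Both layouts yield the same mapping.
        assert load_bpe_vocab(flat_path) == load_bpe_vocab(nested_path) == vocab
        print("both layouts yield", len(vocab), "tokens")  # prints: both layouts yield 3 tokens
```

The isinstance checks matter because a flat vocab.json can legitimately contain a token spelled "model" (mapped to an integer ID), so merely testing whether the key "model" exists would misidentify the layout.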

I hope this helps. Thanks.