BPE Tokenizer doesn't have `model > vocab`.

A recent patch ([convert.py#L337](https://github.com/ggerganov/llama.cpp/blame/d2f650cb5b04ee2726663e79b47da5efe196ce00/convert.py#L337)) reads the vocab list from `self.bpe_tokenizer['model']['vocab']`, where `self.bpe_tokenizer` is the parsed content of the `vocab.json` file. However, a BPE tokenizer's `vocab.json` has no `model > vocab` key. The file carries no other metadata: its top-level object is the vocabulary itself.

So, in my opinion, line [337](https://github.com/ggerganov/llama.cpp/blame/d2f650cb5b04ee2726663e79b47da5efe196ce00/convert.py#L337) should be changed as follows:

```
self.vocab = self.bpe_tokenizer
```
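If both layouts need to keep working, a more defensive variant could detect which one it was given. This is only a minimal sketch, not the actual `convert.py` code: the helper names `extract_bpe_vocab` and `load_bpe_vocab` are hypothetical, and it assumes the nested form is an object under `model > vocab` (as the patched line expects) while a plain `vocab.json` is a flat token-to-id mapping.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def extract_bpe_vocab(tokenizer_json: dict[str, Any]) -> dict[str, int]:
    """Return the token -> id mapping from either JSON layout (hypothetical helper)."""
    model = tokenizer_json.get("model")
    # Nested layout: the vocabulary sits under model > vocab.
    # The isinstance checks avoid misreading a plain vocab.json that
    # happens to contain a token literally named "model" (mapped to an int).
    if isinstance(model, dict) and isinstance(model.get("vocab"), dict):
        return model["vocab"]
    # Plain vocab.json layout: the whole file is the mapping.
    return tokenizer_json


def load_bpe_vocab(path: Path) -> dict[str, int]:
    """Read a JSON file and return its vocabulary mapping (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return extract_bpe_vocab(json.load(f))
```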
I hope this helps. Thanks.

BPE Tokenizer doesn't have `model > vocab`. #5180

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

BPE Tokenizer doesn't have model > vocab. #5180

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

BPE Tokenizer doesn't have `model > vocab`. #5180