Some models are extremely sensitive to the prompt format; without the correct format they generate gibberish.
beam-search calls `llama_tokenize` with `parse_special = false`. Once I switched that to `true`, the special tokens in my prompts were parsed correctly and it generated reasonable output.
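For context, here is a minimal sketch of the difference, assuming the `llama_tokenize` helper overloads from `common.h`; the model path and ChatML prompt are placeholders:

```cpp
// Minimal sketch (model path and prompt are placeholders). With
// parse_special = false, a marker like "<|im_start|>" is split into
// ordinary text tokens; with parse_special = true it maps to its
// single special token id.
#include "common.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);

    const std::string prompt = "<|im_start|>user\nHello<|im_end|>\n";

    // add_special = true adds BOS/EOS as configured; only parse_special is toggled.
    std::vector<llama_token> plain   = llama_tokenize(model, prompt, true, /*parse_special=*/false);
    std::vector<llama_token> special = llama_tokenize(model, prompt, true, /*parse_special=*/true);

    printf("parse_special=false -> %zu tokens\n", plain.size());
    printf("parse_special=true  -> %zu tokens\n", special.size());

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```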
`parse_special` is also set to `false` in the imatrix generation. As a result, calibration samples built from common chat and instruction datasets in the model's prompt format are not tokenized the way the model sees them during regular inference. Shouldn't it matter that zero real prompt formats were evaluated for the imatrix generation?
To get a better idea of the impact, I tested this with the perplexity measurement, which also does not parse special tokens. In a quick ChatML test with CodeQwen-1.5, perplexity went up by 40% once special tokens were parsed. Maybe that is due to the raw chunking, which evaluates multiple prompts at once and breaks them in the middle (see the sketch below)?
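To illustrate the chunking concern, here is a simplified sketch; the function name and window handling are illustrative, not the actual perplexity code. The whole corpus is tokenized once and split into fixed-size windows with no regard for prompt boundaries, so a chat-formatted sample can be cut mid-conversation:

```cpp
// Simplified illustration of fixed-size chunking over a tokenized corpus
// (illustrative only, not the actual perplexity.cpp code). Prompt
// boundaries are ignored, so a chat-formatted sample can be split
// mid-conversation, which may distort PPL once special tokens are parsed.
#include <cstdint>
#include <vector>

using llama_token = int32_t;

std::vector<std::vector<llama_token>> chunk_tokens(
        const std::vector<llama_token> & tokens, size_t n_ctx) {
    std::vector<std::vector<llama_token>> chunks;
    for (size_t i = 0; i + n_ctx <= tokens.size(); i += n_ctx) {
        // Each window is exactly n_ctx tokens; any trailing remainder
        // and any prompt structure inside the stream are ignored.
        chunks.emplace_back(tokens.begin() + i, tokens.begin() + i + n_ctx);
    }
    return chunks;
}
```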
Side note: tokenization took 500x longer with `parse_special = true`.
It might be worth investigating why the PPL went up when special tokens were enabled, and whether special token parsing could improve imatrix results. A reason why it might be disabled is stated here.