llama : improve sep token handling #14272

CISC · 2025-06-19T00:43:02Z

Sequence classification models (classifiers/rerankers) require a different pattern of special tokens than other models. So far this has been hard-coded to the most common pattern, but as support for more models is added, this is starting to fail.

The way AutoTokenizer handles this is by using a special TemplateProcessing post-processor in tokenizer.json. This PR does not add full support for this, as that would require major changes to the llama.cpp tokenizer; instead it simply adds a new add_sep_token state that can be used together with the existing add_bos_token and add_eos_token to figure out which tokens to add.
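As a rough illustration of how three boolean flags could drive a fixed token layout for a sequence pair, here is a minimal Python sketch. The function name `build_pair_tokens`, its parameters, and the exact ordering (BOS, query, EOS, SEP, document, EOS) are assumptions made for illustration; they are not taken from the PR's diff.

```python
from typing import List


def build_pair_tokens(
    query: List[int],
    doc: List[int],
    *,
    bos_id: int,
    eos_id: int,
    sep_id: int,
    add_bos: bool,
    add_eos: bool,
    add_sep: bool,
) -> List[int]:
    """Hypothetical helper: join a query/document pair in one fixed pattern.

    Assumed layout (not confirmed by the PR text):
        [BOS] query [EOS] [SEP] doc [EOS]
    Each special token is emitted only if its flag is set.
    """
    out: List[int] = []
    if add_bos:
        out.append(bos_id)
    out.extend(query)
    if add_eos:
        out.append(eos_id)
    if add_sep:
        out.append(sep_id)
    out.extend(doc)
    if add_eos:
        out.append(eos_id)
    return out
```

With all three flags off, the helper just concatenates the two token lists, which shows why the flags alone cannot express an arbitrary template: only the presence of each token varies, not its position.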

SpecialVocab will now crudely parse the tokenizer.json post-processor to figure out which of these to add to the metadata; llama-server and llama-embedding will then add the appropriate tokens. However, this still uses the same fixed pattern as before, so if we ever add support for a model with an unusual pattern, we will most likely have to rework things.
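To make the idea of "crudely parsing" the post-processor concrete, the sketch below inspects a TemplateProcessing entry of the kind found under `post_processor` in tokenizer.json and guesses the three flags. The heuristic and the name `guess_special_token_flags` are assumptions of this example, not the logic in gguf-py/gguf/vocab.py.

```python
from typing import Any, Dict


def guess_special_token_flags(post_processor: Dict[str, Any]) -> Dict[str, bool]:
    """Hypothetical heuristic over a TemplateProcessing post-processor.

    - add_bos: the single-sequence template starts with a special token
    - add_eos: the single-sequence template ends with a special token
    - add_sep: the pair template has more special tokens between the two
      sequences than the EOS alone would account for
    """
    flags = {"add_bos": False, "add_eos": False, "add_sep": False}
    if post_processor.get("type") != "TemplateProcessing":
        return flags  # other processor types are out of scope for this sketch

    single = post_processor.get("single", [])
    pair = post_processor.get("pair", [])

    if single:
        flags["add_bos"] = "SpecialToken" in single[0]
        flags["add_eos"] = "SpecialToken" in single[-1]

    # Locate the two Sequence entries (A and B) in the pair template.
    seq_positions = [i for i, item in enumerate(pair) if "Sequence" in item]
    if len(seq_positions) == 2:
        between = pair[seq_positions[0] + 1 : seq_positions[1]]
        n_special = sum("SpecialToken" in item for item in between)
        flags["add_sep"] = n_special > (1 if flags["add_eos"] else 0)
    return flags


# Example template: <s> A </s> for one sequence, <s> A </s> </s> B </s> for a pair.
example = {
    "type": "TemplateProcessing",
    "single": [
        {"SpecialToken": {"id": "<s>", "type_id": 0}},
        {"Sequence": {"id": "A", "type_id": 0}},
        {"SpecialToken": {"id": "</s>", "type_id": 0}},
    ],
    "pair": [
        {"SpecialToken": {"id": "<s>", "type_id": 0}},
        {"Sequence": {"id": "A", "type_id": 0}},
        {"SpecialToken": {"id": "</s>", "type_id": 0}},
        {"SpecialToken": {"id": "</s>", "type_id": 0}},
        {"Sequence": {"id": "B", "type_id": 0}},
        {"SpecialToken": {"id": "</s>", "type_id": 0}},
    ],
}
flags = guess_special_token_flags(example)
```

A template that places special tokens in other positions would not be captured by these three booleans, which is the limitation the description points out.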

Added a new --cls-separator parameter for llama-embedding to automatically split sequence pairs and insert the correct tokens.
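The splitting step can be pictured with a small Python sketch. The helper name `split_sequence_pair` and the tab separator used in the example are assumptions for illustration; the PR text does not state the separator value or how llama-embedding handles inputs with more than two parts.

```python
from typing import Tuple


def split_sequence_pair(prompt: str, separator: str) -> Tuple[str, str]:
    """Hypothetical helper: split one prompt into a (query, document) pair.

    Raises ValueError unless the separator occurs exactly once, so that
    malformed input is reported instead of being silently mis-split.
    """
    if not separator:
        raise ValueError("separator must be a non-empty string")
    parts = prompt.split(separator)
    if len(parts) != 2:
        raise ValueError(f"expected exactly one separator, found {len(parts) - 1}")
    return parts[0], parts[1]


# Assumed separator for this example: a tab character.
query, document = split_sequence_pair("what is panda?\tThe giant panda is a bear.", "\t")
```

Each half would then be tokenized on its own, with the special tokens inserted between and around the halves according to the model's metadata.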

examples/embedding/embedding.cpp

ggml-ci

gguf-py/gguf/vocab.py

ggml-ci

* mamba2-sync: (24 commits)
  sync : ggml
  Add `ggml_roll` (ggml/1274)
  docs : fix the link to llama.h (ggml-org#14293)
  CUDA: add conv_2d_transpose (ggml-org#14287)
  lint : remove trailing whitepace (ggml-org#14304)
  vocab : prevent tokenizer overflow (ggml-org#14301)
  sycl: add usage of enqueue_functions extension (ggml-org#14244)
  Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
  llama : improve sep token handling (ggml-org#14272)
  cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
  ggml : fix repack work size for mul_mat_id (ggml-org#14292)
  ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
  model : more uniform output id handling (ggml-org#14275)
  ubatch : new splitting logic (ggml-org#14217)
  CUDA: add conv_2d_dw (ggml-org#14265)
  ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
  gguf-py : make sentencepiece optional (ggml-org#14200)
  server : add server parameters for draft model cache type (ggml-org#13782)
  build : suppress gcc15 compile warnings (ggml-org#14261)
  sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
  ...

improve sep token handling

30d4cf3

CISC requested a review from ngxson as a code owner June 19, 2025 00:43

CISC requested review from ggerganov and removed request for ngxson June 19, 2025 00:43

github-actions bot added examples python python script changes server labels Jun 19, 2025

CISC requested a review from compilade June 19, 2025 00:43

CISC linked an issue Jun 19, 2025 that may be closed by this pull request

Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token #14252

Closed

4 tasks

ggerganov reviewed Jun 19, 2025

examples/embedding/embedding.cpp (outdated)

CISC added 2 commits June 19, 2025 10:22

rename variable for clarity [no ci]

fdc309e

ggml-ci

update rerank test [no ci]

3350e4a

ggml-ci

github-actions bot added the devops improvements to build systems and github actions label Jun 19, 2025

ggerganov approved these changes Jun 19, 2025

compilade reviewed Jun 20, 2025

gguf-py/gguf/vocab.py

CISC added 3 commits June 20, 2025 10:33

add warnings

f172a27

ggml-ci

set eos to sep if missing

a854897

ggml-ci

set bos to cls if missing [no ci]

d7f340b

ggml-ci

CISC merged commit 88fc854 into master Jun 20, 2025
9 checks passed

CISC deleted the cisc/improved-sep-token-handling branch June 20, 2025 12:04

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jun 20, 2025

llama : improve sep token handling (ggml-org#14272)

3fedcaa
