
llama : improve sep token handling #14272

Merged 6 commits into master on Jun 20, 2025

Conversation

@CISC (Collaborator) commented Jun 19, 2025

Sequence classification models (classifiers/rerankers) require a different pattern of special tokens than other models. So far this has been hard-coded to the most common pattern, but as support for more models is added this is starting to fail.

The way AutoTokenizer handles this is with a special TemplateProcessing post-processor in tokenizer.json. This PR does not add full support for that, as it would require major changes to the llama.cpp tokenizer; instead it simply adds a new add_sep_token state that can be used together with the existing add_bos_token and add_eos_token to figure out which tokens to add.
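For illustration, here is a minimal Python sketch of how the three flags could combine to wrap a classifier/reranker sequence pair. The function name, signature, and exact pattern are hypothetical, not llama.cpp's actual API; it just shows the kind of decision the flags enable:

```python
# Hypothetical sketch, not llama.cpp's actual API: how add_bos_token,
# add_eos_token, and add_sep_token could pick the special tokens
# wrapped around a classifier/reranker sequence pair.
def build_pair(tokens_a: list[int], tokens_b: list[int],
               bos: int, eos: int, sep: int,
               add_bos: bool, add_eos: bool, add_sep: bool) -> list[int]:
    out: list[int] = []
    if add_bos:
        out.append(bos)      # BERT-style models use [CLS] here
    out.extend(tokens_a)
    if add_sep:
        out.append(sep)      # [SEP] between the two sequences
    elif add_eos:
        out.append(eos)      # some models reuse EOS as the separator
    out.extend(tokens_b)
    if add_sep:
        out.append(sep)
    if add_eos:
        out.append(eos)
    return out

# e.g. add_bos + add_sep  -> [CLS] a ... [SEP] b ... [SEP]
#      add_eos only       -> a ... <eos> b ... <eos>
```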

SpecialVocab will now crudely parse the tokenizer.json post-processor to figure out which of these to add to the metadata; llama-server and llama-embedding will then add the appropriate tokens (still in the same fixed pattern as before, so if we ever add support for a model with an unusual pattern, we will most likely have to rework things).
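As a rough sketch of what such a crude parse looks like (the field names follow the Hugging Face tokenizers JSON schema, but the helper itself is illustrative, not gguf-py's actual SpecialVocab code), one can check which special tokens the post-processor's pair template inserts:

```python
import json

# Illustrative helper, not gguf-py's actual SpecialVocab code: inspect
# the TemplateProcessing post-processor in tokenizer.json and infer
# which special tokens its "pair" template inserts.
def infer_add_flags(path: str, bos: str, eos: str, sep: str) -> dict[str, bool]:
    with open(path, encoding="utf-8") as f:
        post = json.load(f).get("post_processor") or {}
    if post.get("type") != "TemplateProcessing":
        return {"add_bos_token": False, "add_eos_token": False, "add_sep_token": False}
    specials = [part["SpecialToken"]["id"]
                for part in post.get("pair", [])
                if "SpecialToken" in part]
    return {
        "add_bos_token": bos in specials,
        "add_eos_token": eos in specials,
        "add_sep_token": sep in specials,
    }

# usage (token strings are model-specific):
# infer_add_flags("tokenizer.json", "[CLS]", "</s>", "[SEP]")
```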

Added a new --cls-separator parameter for llama-embedding to automatically split sequence pairs and insert the correct tokens.
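The input-side behaviour can be pictured with the following hypothetical Python equivalent (separator choice and helper name are illustrative): each prompt is split on the separator into a pair, which is then tokenized and joined with the model's pattern as in the first sketch above.

```python
# Hypothetical illustration of splitting a prompt into a sequence pair
# on a separator, as a --cls-separator style option would do.
def split_pair(prompt: str, cls_separator: str = "\t") -> tuple[str, str]:
    query, _, document = prompt.partition(cls_separator)
    return query, document

query, document = split_pair("what is a panda?\tThe giant panda is a bear species native to China.")
```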

@CISC requested a review from ngxson as a code owner June 19, 2025 00:43
@CISC requested a review from ggerganov and removed the request for ngxson June 19, 2025 00:43
@github-actions bot added the examples, python (python script changes), and server labels Jun 19, 2025
@CISC requested a review from compilade June 19, 2025 00:43
@CISC linked an issue Jun 19, 2025 that may be closed by this pull request
@github-actions bot added the devops (improvements to build systems and github actions) label Jun 19, 2025
@CISC merged commit 88fc854 into master Jun 20, 2025
9 checks passed
@CISC deleted the cisc/improved-sep-token-handling branch June 20, 2025 12:04
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jun 20, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 20, 2025
* mamba2-sync: (24 commits)
sync : ggml
Add `ggml_roll` (ggml/1274)
docs : fix the link to llama.h (ggml-org#14293)
CUDA: add conv_2d_transpose (ggml-org#14287)
lint : remove trailing whitepace (ggml-org#14304)
vocab : prevent tokenizer overflow (ggml-org#14301)
sycl: add usage of enqueue_functions extension (ggml-org#14244)
Implement GGML_CPU_ALL_VARIANTS for PowerPC (ggml-org#14286)
llama : improve sep token handling (ggml-org#14272)
cuda : synchronize graph capture and cublas handle destruction (ggml-org#14288)
ggml : fix repack work size for mul_mat_id (ggml-org#14292)
ggml: Update KleidiAI to v1.9.0 (ggml-org#14277)
model : more uniform output id handling (ggml-org#14275)
ubatch : new splitting logic (ggml-org#14217)
CUDA: add conv_2d_dw (ggml-org#14265)
ggml-cpu : remove unnecesary arm feature detection (ggml-org#14281)
gguf-py : make sentencepiece optional (ggml-org#14200)
server : add server parameters for draft model cache type (ggml-org#13782)
build : suppress gcc15 compile warnings (ggml-org#14261)
sycl: Cleanup codepaths in Get Rows in sycl backend (ggml-org#14215)
...
Successfully merging this pull request may close these issues.

Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token