Name and Version

I'm running my server like this, to test #12034:

```sh
llama-server --jinja -fa -c 0 -hf unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF
```

Using various LLM frameworks in different languages, I couldn't get a successful tool call to complete. The errors, which vary, are listed in the details below.
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line

Here's the version of llama.cpp:

```sh
$ llama-cli --version
version: 4856 (6fefc05a)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
```
Problem description & steps to reproduce
I ran each tool-calling example app in this directory, capturing where each one errored via `socat -v TCP-LISTEN:8080,fork TCP:localhost:8081`, then re-ran the corresponding curl request that reproduces the failure.
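In case it helps reproduce this, here's a rough Python stand-in for that socat proxy (a sketch, assuming llama-server is listening on port 8081 while the framework points at 8080):

```python
# Rough stand-in for the socat command above: listens on 8080, forwards to
# llama-server on 8081, and prints everything that passes through.
import socket
import threading

LISTEN_PORT, TARGET_PORT = 8080, 8081  # assumes llama-server runs on 8081

def pipe(src, dst, tag):
    # Copy bytes in one direction, echoing them to stdout for inspection.
    while chunk := src.recv(4096):
        print(f"--- {tag} ---\n{chunk.decode(errors='replace')}")
        dst.sendall(chunk)

def handle(client):
    upstream = socket.create_connection(("localhost", TARGET_PORT))
    threading.Thread(target=pipe, args=(upstream, client, "response"), daemon=True).start()
    pipe(client, upstream, "request")

with socket.create_server(("localhost", LISTEN_PORT)) as listener:
    while True:
        conn, _ = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```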
Semantic Kernel (dotnet): fails because `tool_call.id` is returned empty.

FYI, this was first noticed in microsoft/semantic-kernel#10842.

Here's the equivalent request in curl:
```sh
curl -sX POST localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"temperature": 0,
"tools": [
{
"function": {
"description": "Returns the latest GA version of Elasticsearch in \"X.Y.Z\" format.",
"name": "Elasticsearch-get_latest_version",
"strict": false,
"parameters": {
"type": "object",
"required": [],
"properties": {
"majorVersion": {
"description": "Major version to filter by (e.g. 7, 8). Defaults to latest",
"type": "integer"
}
}
}
},
"type": "function"
}
],
"messages": [
{
"role": "user",
"content": "What is the latest version of Elasticsearch 8?"
}
],
"model": "unused",
"tool_choice": "auto"
}'|jq .
```

```json
{
"choices": [
{
"finish_reason": "tool_calls",
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"type": "function",
"function": {
"name": "Elasticsearch-get_latest_version",
"arguments": "{\"majorVersion\":8}"
},
"id": ""
}
]
}
}
],
"created": 1741499613,
"model": "unused",
"system_fingerprint": "b4856-6fefc05a",
"object": "chat.completion",
"usage": {
"completion_tokens": 32,
"prompt_tokens": 206,
"total_tokens": 238
},
"id": "chatcmpl-d7mNPLF5fmLGgt7VQjyWuBxrScIKzAXY",
"timings": {
"prompt_n": 1,
"prompt_ms": 55.296,
"prompt_per_token_ms": 55.296,
"prompt_per_second": 18.08449074074074,
"predicted_n": 32,
"predicted_ms": 1107.194,
"predicted_per_token_ms": 34.5998125,
"predicted_per_second": 28.901890725563906
}
}
```
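As a client-side workaround (not a fix for llama.cpp itself), one sketch that unblocks strict clients like Semantic Kernel is to substitute a synthetic id whenever the server returns an empty one. The URL, model name, and payload shape here match the curl example above; the id format is an arbitrary choice:

```python
# Sketch of a client-side workaround: if the server returns tool calls with
# an empty "id", substitute a synthetic one so strict clients can correlate
# the tool result with the call.
import uuid
import requests

def chat(messages: list[dict], tools: list[dict]) -> dict:
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"model": "unused", "temperature": 0,
              "messages": messages, "tools": tools, "tool_choice": "auto"},
    )
    resp.raise_for_status()
    message = resp.json()["choices"][0]["message"]
    for call in message.get("tool_calls") or []:
        if not call.get("id"):  # empty string, as in the response above
            call["id"] = f"call_{uuid.uuid4().hex[:24]}"  # synthetic id
    return message
```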
Spring AI: llama-server returns a 500 "Failed to parse messages: Expected 'content'".

Notes:
- This also fails the same way with pydantic-ai.
- If you run via ramalama, so that you can run `ollama://qwen2.5:3b` with llama-server, it completes fine.

Here's the equivalent request in curl:
```sh
$ curl -sX POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"messages": [
{
"content": "What is the latest version of Elasticsearch 8?",
"role": "user"
},
{
"role": "assistant",
"tool_calls": [
{
"id": "",
"type": "function",
"function": {
"name": "getLatestElasticsearchVersion",
"arguments": "{\"majorVersion\":8}"
}
}
]
},
{
"content": "\"8.17.3\"",
"role": "tool",
"name": "getLatestElasticsearchVersion",
"tool_call_id": ""
}
],
"model": "unused",
"stream": false,
"temperature": 0.0,
"tools": [
{
"type": "function",
"function": {
"description": "Returns the latest GA version of Elasticsearch in \"X.Y.Z\" format.",
"name": "getLatestElasticsearchVersion",
"parameters": {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"additionalProperties": false,
"type": "object",
"properties": {
"majorVersion": {
"type": "integer",
"format": "int32",
"description": "Major version to filter by (e.g. 7, 8). Defaults to latest"
}
},
"required": ["majorVersion"]
}
}
}
]
}'|jq .
```

```json
{
"error": {
"code": 500,
"message": "Failed to parse messages: Expected 'content' (ref: https://github.com/ggml-org/llama.cpp/issues/8367); messages = [\n {\n \"content\": \"What is the latest version of Elasticsearch 8?\",\n \"role\": \"user\"\n },\n {\n \"role\": \"assistant\",\n \"tool_calls\": [\n {\n \"id\": \"\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"getLatestElasticsearchVersion\",\n \"arguments\": \"{\\\"majorVersion\\\":8}\"\n }\n }\n ]\n },\n {\n \"content\": \"\\\"8.17.3\\\"\",\n \"role\": \"tool\",\n \"name\": \"getLatestElasticsearchVersion\",\n \"tool_call_id\": \"\"\n }\n]",
"type": "server_error"
}
}
```
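A guess based on the error text: the template rendering seems to reject the assistant tool-call message because it has no `content` key at all. Notably, the Vercel AI request below sends `"content": ""` on that message and gets past this parse step. A minimal client-side sketch of that workaround:

```python
# Sketch of a client-side workaround, assuming the parse failure is caused
# by the assistant tool-call message lacking a "content" key (the Vercel AI
# request below, which sets "content": "", does not hit this error).
def patch_messages(messages: list[dict]) -> list[dict]:
    patched = []
    for msg in messages:
        if msg.get("role") == "assistant" and "content" not in msg:
            msg = {**msg, "content": ""}  # make "content" explicit
        patched.append(msg)
    return patched
```

Passing the request's messages array through this before POSTing matches what the Vercel AI example effectively does.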
Vercel AI (node.js): returns XML of the tool content in `choices[0].message.content` instead of completing.

The last message sent to the LLM is the result of the tool call; it should have completed the initial request, not reformatted that same message as XML.

Notes:
- If you run via ramalama, so that you can run `ollama://qwen2.5:3b` with llama-server, it completes fine.
```sh
$ curl -sX POST localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "unused",
"temperature": 0,
"messages": [
{
"role": "user",
"content": "What is the latest version of Elasticsearch 8?"
},
{
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "",
"type": "function",
"function": {
"name": "getLatestElasticsearchVersion",
"arguments": "{\"majorVersion\":8}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "",
"content": "\"8.17.3\""
}
],
"tools": [
{
"type": "function",
"function": {
"name": "getLatestElasticsearchVersion",
"description": "Get the latest version of Elasticsearch",
"parameters": {
"type": "object",
"properties": {
"majorVersion": {
"type": "number",
"description": "Major version to filter by (e.g. 7, 8). Defaults to latest"
}
},
"additionalProperties": false,
"$schema": "http://json-schema.org/draft-07/schema#"
}
}
}
],
"tool_choice": "auto"
}'|jq .
```

```json
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "<tool_response>\n\"8.17.3\"\n</tool_response>"
}
}
],
"created": 1741500130,
"model": "unused",
"system_fingerprint": "b4856-6fefc05a",
"object": "chat.completion",
"usage": {
"completion_tokens": 17,
"prompt_tokens": 267,
"total_tokens": 284
},
"id": "chatcmpl-stqLFsYGVG2NBoW8c5gwNSQtDKQMVbQE",
"timings": {
"prompt_n": 64,
"prompt_ms": 222.386,
"prompt_per_token_ms": 3.47478125,
"prompt_per_second": 287.78790031746604,
"predicted_n": 17,
"predicted_ms": 565.839,
"predicted_per_token_ms": 33.28464705882353,
"predicted_per_second": 30.04388174021232
}
}
```
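My suspicion (unconfirmed) is that the empty `tool_call_id` is what confuses the model here, tying this back to the first failure. A diagnostic sketch that replays the same conversation with an arbitrary non-empty id (`call_0` is a placeholder) to check whether the echo persists:

```python
# Diagnostic sketch (an assumption, not a confirmed cause): replay the
# conversation above with a non-empty, arbitrary tool-call id ("call_0")
# to check whether the empty id is what triggers the <tool_response> echo.
import requests

tools = [{"type": "function", "function": {
    "name": "getLatestElasticsearchVersion",
    "description": "Get the latest version of Elasticsearch",
    "parameters": {"type": "object", "properties": {
        "majorVersion": {"type": "number"}}}}}]

messages = [
    {"role": "user", "content": "What is the latest version of Elasticsearch 8?"},
    {"role": "assistant", "content": "", "tool_calls": [{
        "id": "call_0", "type": "function",
        "function": {"name": "getLatestElasticsearchVersion",
                     "arguments": "{\"majorVersion\":8}"}}]},
    {"role": "tool", "tool_call_id": "call_0", "content": "\"8.17.3\""},
]

resp = requests.post("http://localhost:8080/v1/chat/completions",
                     json={"model": "unused", "temperature": 0,
                           "messages": messages, "tools": tools,
                           "tool_choice": "auto"})
print(resp.json()["choices"][0]["message"]["content"])
```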
First Bad Commit
No response