Misc. bug: tool call issues with hf unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF #12279

Closed
@codefromthecrypt

Description

Name and Version

I'm running my server like this, to test #12034:

llama-server --jinja -fa -c 0 -hf unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF

Using various LLM frameworks in different languages, I couldn't get a successful tool call to complete. I've listed the errors, which vary by framework, in the details below.

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

Here's the version of llama.cpp:

$ llama-cli --version
version: 4856 (6fefc05a)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0

Problem description & steps to reproduce

I ran each tool-calling example app in this directory, capturing the failing request via socat -v TCP-LISTEN:8080,fork TCP:localhost:8081, then re-ran the corresponding request with curl to reproduce the failure.
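
For reference, here's a sketch of the capture setup. The server has to sit behind the proxy, so it's assumed here to be started with --port 8081 (the command at the top of this report doesn't show a port flag, so that part is an assumption):

# assumed: run the server on 8081 so the proxy can own 8080
llama-server --jinja -fa -c 0 --port 8081 -hf unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF

# in another terminal: clients point at 8080, traffic is logged and forwarded to the server
socat -v TCP-LISTEN:8080,fork TCP:localhost:8081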

Semantic Kernel dotnet: fails because tool_call.id is returned empty.

FYI, this was first noticed in microsoft/semantic-kernel#10842.

Here's the equivalent request in curl:

curl -sX POST localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "temperature": 0,
  "tools": [
    {
      "function": {
        "description": "Returns the latest GA version of Elasticsearch in \"X.Y.Z\" format.",
        "name": "Elasticsearch-get_latest_version",
        "strict": false,
        "parameters": {
          "type": "object",
          "required": [],
          "properties": {
            "majorVersion": {
              "description": "Major version to filter by (e.g. 7, 8). Defaults to latest",
              "type": "integer"
            }
          }
        }
      },
      "type": "function"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "What is the latest version of Elasticsearch 8?"
    }
  ],
  "model": "unused",
  "tool_choice": "auto"
}'|jq .
{
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "Elasticsearch-get_latest_version",
              "arguments": "{\"majorVersion\":8}"
            },
            "id": ""
          }
        ]
      }
    }
  ],
  "created": 1741499613,
  "model": "unused",
  "system_fingerprint": "b4856-6fefc05a",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 32,
    "prompt_tokens": 206,
    "total_tokens": 238
  },
  "id": "chatcmpl-d7mNPLF5fmLGgt7VQjyWuBxrScIKzAXY",
  "timings": {
    "prompt_n": 1,
    "prompt_ms": 55.296,
    "prompt_per_token_ms": 55.296,
    "prompt_per_second": 18.08449074074074,
    "predicted_n": 32,
    "predicted_ms": 1107.194,
    "predicted_per_token_ms": 34.5998125,
    "predicted_per_second": 28.901890725563906
  }
}
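
The empty id is visible directly in the response above. As a quick check (request.json here is just a hypothetical file holding the request body shown above):

curl -sX POST localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json | jq -r '.choices[0].message.tool_calls[0].id'
# prints an empty line: the id is "", which Semantic Kernel then fails on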

Spring AI: llama-server returns 500 "Failed to parse messages: Expected 'content'"

Notes:

  • This also fails the same way with pydantic-ai
  • If you use ramalama to run ollama://qwen2.5:3b with llama-server, it completes fine.

Here's the equivalent request in curl:

$ curl -sX POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "content": "What is the latest version of Elasticsearch 8?",
        "role": "user"
      },
      {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "",
            "type": "function",
            "function": {
              "name": "getLatestElasticsearchVersion",
              "arguments": "{\"majorVersion\":8}"
            }
          }
        ]
      },
      {
        "content": "\"8.17.3\"",
        "role": "tool",
        "name": "getLatestElasticsearchVersion",
        "tool_call_id": ""
      }
    ],
    "model": "unused",
    "stream": false,
    "temperature": 0.0,
    "tools": [
      {
        "type": "function",
        "function": {
          "description": "Returns the latest GA version of Elasticsearch in \"X.Y.Z\" format.",
          "name": "getLatestElasticsearchVersion",
          "parameters": {
            "$schema": "https://json-schema.org/draft/2020-12/schema",
            "additionalProperties": false,
            "type": "object",
            "properties": {
              "majorVersion": {
                "type": "integer",
                "format": "int32",
                "description": "Major version to filter by (e.g. 7, 8). Defaults to latest"
              }
            },
            "required": ["majorVersion"]
          }
        }
      }
    ]
  }'|jq .
{
  "error": {
    "code": 500,
    "message": "Failed to parse messages: Expected 'content' (ref: https://github.com/ggml-org/llama.cpp/issues/8367); messages = [\n  {\n    \"content\": \"What is the latest version of Elasticsearch 8?\",\n    \"role\": \"user\"\n  },\n  {\n    \"role\": \"assistant\",\n    \"tool_calls\": [\n      {\n        \"id\": \"\",\n        \"type\": \"function\",\n        \"function\": {\n          \"name\": \"getLatestElasticsearchVersion\",\n          \"arguments\": \"{\\\"majorVersion\\\":8}\"\n        }\n      }\n    ]\n  },\n  {\n    \"content\": \"\\\"8.17.3\\\"\",\n    \"role\": \"tool\",\n    \"name\": \"getLatestElasticsearchVersion\",\n    \"tool_call_id\": \"\"\n  }\n]",
    "type": "server_error"
  }
}
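
Note that the Vercel AI request further down sends the same conversation shape but with an explicit "content": "" on the assistant tool-call message and does not hit this parse error, which suggests the missing content field is what trips the template. A minimal probe (not independently verified here) would be to re-send the Spring AI request with the assistant message written as:

      {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "",
            "type": "function",
            "function": {
              "name": "getLatestElasticsearchVersion",
              "arguments": "{\"majorVersion\":8}"
            }
          }
        ]
      }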

Vercel AI (node.js): returns the tool result as XML in choices[0].message.content instead of completing

The last message sent to the LLM is the result of the tool call; the model should have used it to answer the initial request, not reformatted that same message as XML.

Notes:

  • If you use ramalama to run ollama://qwen2.5:3b with llama-server, it completes fine.
$ curl -sX POST localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "unused",
  "temperature": 0,
  "messages": [
    {
      "role": "user",
      "content": "What is the latest version of Elasticsearch 8?"
    },
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {
          "id": "",
          "type": "function",
          "function": {
            "name": "getLatestElasticsearchVersion",
            "arguments": "{\"majorVersion\":8}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "",
      "content": "\"8.17.3\""
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "getLatestElasticsearchVersion",
        "description": "Get the latest version of Elasticsearch",
        "parameters": {
          "type": "object",
          "properties": {
            "majorVersion": {
              "type": "number",
              "description": "Major version to filter by (e.g. 7, 8). Defaults to latest"
            }
          },
          "additionalProperties": false,
          "$schema": "http://json-schema.org/draft-07/schema#"
        }
      }
    }
  ],
  "tool_choice": "auto"
}'|jq .
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<tool_response>\n\"8.17.3\"\n</tool_response>"
      }
    }
  ],
  "created": 1741500130,
  "model": "unused",
  "system_fingerprint": "b4856-6fefc05a",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 17,
    "prompt_tokens": 267,
    "total_tokens": 284
  },
  "id": "chatcmpl-stqLFsYGVG2NBoW8c5gwNSQtDKQMVbQE",
  "timings": {
    "prompt_n": 64,
    "prompt_ms": 222.386,
    "prompt_per_token_ms": 3.47478125,
    "prompt_per_second": 287.78790031746604,
    "predicted_n": 17,
    "predicted_ms": 565.839,
    "predicted_per_token_ms": 33.28464705882353,
    "predicted_per_second": 30.04388174021232
  }
}
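
For what it's worth, the <tool_response>...</tool_response> wrapper in the reply looks like the tag pair Qwen2.5's chat template uses around tool results in the prompt, i.e. the model appears to be echoing the tool message back instead of answering. Extracting the content from the same request (request.json again being a hypothetical file holding the body above):

curl -sX POST localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json | jq -r '.choices[0].message.content'
# <tool_response>
# "8.17.3"
# </tool_response>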

First Bad Commit

No response

Relevant log output
