Name and Version

I'm running my server like this, to test #12034:

```sh
llama-server --jinja -fa -c 0 -hf unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF
```

Using various LLM frameworks in different languages, I couldn't get a successful tool call to complete. The errors, which vary, are listed in the details below.
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line

Here's the version of llama.cpp:

```sh
$ llama-cli --version
version: 4856 (6fefc05a)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
```
Problem description & steps to reproduce
I ran each tool-calling example app in this directory, capturing where each one errored via `socat -v TCP-LISTEN:8080,fork TCP:localhost:8081`, then re-ran the corresponding curl request that reproduces the failure.
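In case it helps reproduce this, here's a rough Python stand-in for that socat proxy (a sketch, assuming llama-server is listening on port 8081 while the framework points at 8080):

```python
# Rough stand-in for the socat command above: listens on 8080, forwards to
# llama-server on 8081, and prints everything that passes through.
import socket
import threading

LISTEN_PORT, TARGET_PORT = 8080, 8081  # assumes llama-server runs on 8081

def pipe(src, dst, tag):
    # Copy bytes in one direction, echoing them to stdout for inspection.
    while chunk := src.recv(4096):
        print(f"--- {tag} ---\n{chunk.decode(errors='replace')}")
        dst.sendall(chunk)

def handle(client):
    upstream = socket.create_connection(("localhost", TARGET_PORT))
    threading.Thread(target=pipe, args=(upstream, client, "response"), daemon=True).start()
    pipe(client, upstream, "request")

with socket.create_server(("localhost", LISTEN_PORT)) as listener:
    while True:
        conn, _ = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```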
Semantic Kernel (dotnet): fails because `tool_call.id` is returned empty.

FYI, this was first noticed in microsoft/semantic-kernel#10842.

Here's the equivalent request in curl:
```sh
curl -sX POST localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"temperature": 0,
"tools": [
{
"function": {
"description": "Returns the latest GA version of Elasticsearch in \"X.Y.Z\" format.",
"name": "Elasticsearch-get_latest_version",
"strict": false,
"parameters": {
"type": "object",
"required": [],
"properties": {
"majorVersion": {
"description": "Major version to filter by (e.g. 7, 8). Defaults to latest",
"type": "integer"
}
}
}
},
"type": "function"
}
],
"messages": [
{
"role": "user",
"content": "What is the latest version of Elasticsearch 8?"
}
],
"model": "unused",
"tool_choice": "auto"
}'|jq .
```

```json
{
"choices": [
{
"finish_reason": "tool_calls",
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"type": "function",
"function": {
"name": "Elasticsearch-get_latest_version",
"arguments": "{\"majorVersion\":8}"
},
"id": ""
}
]
}
}
],
"created": 1741499613,
"model": "unused",
"system_fingerprint": "b4856-6fefc05a",
"object": "chat.completion",
"usage": {
"completion_tokens": 32,
"prompt_tokens": 206,
"total_tokens": 238
},
"id": "chatcmpl-d7mNPLF5fmLGgt7VQjyWuBxrScIKzAXY",
"timings": {
"prompt_n": 1,
"prompt_ms": 55.296,
"prompt_per_token_ms": 55.296,
"prompt_per_second": 18.08449074074074,
"predicted_n": 32,
"predicted_ms": 1107.194,
"predicted_per_token_ms": 34.5998125,
"predicted_per_second": 28.901890725563906
}
}
```
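As a client-side workaround (not a fix for llama.cpp itself), one sketch that unblocks strict clients like Semantic Kernel is to substitute a synthetic id whenever the server returns an empty one. The URL, model name, and payload shape here match the curl example above; the id format is an arbitrary choice:

```python
# Sketch of a client-side workaround: if the server returns tool calls with
# an empty "id", substitute a synthetic one so strict clients can correlate
# the tool result with the call.
import uuid
import requests

def chat(messages: list[dict], tools: list[dict]) -> dict:
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"model": "unused", "temperature": 0,
              "messages": messages, "tools": tools, "tool_choice": "auto"},
    )
    resp.raise_for_status()
    message = resp.json()["choices"][0]["message"]
    for call in message.get("tool_calls") or []:
        if not call.get("id"):  # empty string, as in the response above
            call["id"] = f"call_{uuid.uuid4().hex[:24]}"  # synthetic id
    return message
```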
Spring AI: llama-server returns a 500 "Failed to parse messages: Expected 'content'".

Notes:
- This also fails the same way with pydantic-ai.
- If you run via ramalama, so that you can run `ollama://qwen2.5:3b` with llama-server, it completes fine.

Here's the equivalent request in curl:
```sh
$ curl -sX POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"messages": [
{
"content": "What is the latest version of Elasticsearch 8?",
"role": "user"
},
{
"role": "assistant",
"tool_calls": [
{
"id": "",
"type": "function",
"function": {
"name": "getLatestElasticsearchVersion",
"arguments": "{\"majorVersion\":8}"
}
}
]
},
{
"content": "\"8.17.3\"",
"role": "tool",
"name": "getLatestElasticsearchVersion",
"tool_call_id": ""
}
],
"model": "unused",
"stream": false,
"temperature": 0.0,
"tools": [
{
"type": "function",
"function": {
"description": "Returns the latest GA version of Elasticsearch in \"X.Y.Z\" format.",
"name": "getLatestElasticsearchVersion",
"parameters": {
"$schema": "https://json-schema.org/draft/2020-12/schema",
"additionalProperties": false,
"type": "object",
"properties": {
"majorVersion": {
"type": "integer",
"format": "int32",
"description": "Major version to filter by (e.g. 7, 8). Defaults to latest"
}
},
"required": ["majorVersion"]
}
}
}
]
}'|jq .
```

```json
{
"error": {
"code": 500,
"message": "Failed to parse messages: Expected 'content' (ref: https://github.com/ggml-org/llama.cpp/issues/8367); messages = [\n {\n \"content\": \"What is the latest version of Elasticsearch 8?\",\n \"role\": \"user\"\n },\n {\n \"role\": \"assistant\",\n \"tool_calls\": [\n {\n \"id\": \"\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"getLatestElasticsearchVersion\",\n \"arguments\": \"{\\\"majorVersion\\\":8}\"\n }\n }\n ]\n },\n {\n \"content\": \"\\\"8.17.3\\\"\",\n \"role\": \"tool\",\n \"name\": \"getLatestElasticsearchVersion\",\n \"tool_call_id\": \"\"\n }\n]",
"type": "server_error"
}
}
```
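A guess based on the error text: the template rendering seems to reject the assistant tool-call message because it has no `content` key at all. Notably, the Vercel AI request below sends `"content": ""` on that message and gets past this parse step. A minimal client-side sketch of that workaround:

```python
# Sketch of a client-side workaround, assuming the parse failure is caused
# by the assistant tool-call message lacking a "content" key (the Vercel AI
# request below, which sets "content": "", does not hit this error).
def patch_messages(messages: list[dict]) -> list[dict]:
    patched = []
    for msg in messages:
        if msg.get("role") == "assistant" and "content" not in msg:
            msg = {**msg, "content": ""}  # make "content" explicit
        patched.append(msg)
    return patched
```

Passing the request's messages array through this before POSTing matches what the Vercel AI example effectively does.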
Vercel AI (node.js): returns XML of the tool content in `choices[0].message.content` instead of completing.

The last message sent to the LLM is the result of the tool call; it should have completed the initial request, not reformatted that same message as XML.

Notes:
- If you run via ramalama, so that you can run `ollama://qwen2.5:3b` with llama-server, it completes fine.
```sh
$ curl -sX POST localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "unused",
"temperature": 0,
"messages": [
{
"role": "user",
"content": "What is the latest version of Elasticsearch 8?"
},
{
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "",
"type": "function",
"function": {
"name": "getLatestElasticsearchVersion",
"arguments": "{\"majorVersion\":8}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "",
"content": "\"8.17.3\""
}
],
"tools": [
{
"type": "function",
"function": {
"name": "getLatestElasticsearchVersion",
"description": "Get the latest version of Elasticsearch",
"parameters": {
"type": "object",
"properties": {
"majorVersion": {
"type": "number",
"description": "Major version to filter by (e.g. 7, 8). Defaults to latest"
}
},
"additionalProperties": false,
"$schema": "http://json-schema.org/draft-07/schema#"
}
}
}
],
"tool_choice": "auto"
}'|jq .
```

```json
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "<tool_response>\n\"8.17.3\"\n</tool_response>"
}
}
],
"created": 1741500130,
"model": "unused",
"system_fingerprint": "b4856-6fefc05a",
"object": "chat.completion",
"usage": {
"completion_tokens": 17,
"prompt_tokens": 267,
"total_tokens": 284
},
"id": "chatcmpl-stqLFsYGVG2NBoW8c5gwNSQtDKQMVbQE",
"timings": {
"prompt_n": 64,
"prompt_ms": 222.386,
"prompt_per_token_ms": 3.47478125,
"prompt_per_second": 287.78790031746604,
"predicted_n": 17,
"predicted_ms": 565.839,
"predicted_per_token_ms": 33.28464705882353,
"predicted_per_second": 30.04388174021232
}
}
```
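My suspicion (unconfirmed) is that the empty `tool_call_id` is what confuses the model here, tying this back to the first failure. A diagnostic sketch that replays the same conversation with an arbitrary non-empty id (`call_0` is a placeholder) to check whether the echo persists:

```python
# Diagnostic sketch (an assumption, not a confirmed cause): replay the
# conversation above with a non-empty, arbitrary tool-call id ("call_0")
# to check whether the empty id is what triggers the <tool_response> echo.
import requests

tools = [{"type": "function", "function": {
    "name": "getLatestElasticsearchVersion",
    "description": "Get the latest version of Elasticsearch",
    "parameters": {"type": "object", "properties": {
        "majorVersion": {"type": "number"}}}}}]

messages = [
    {"role": "user", "content": "What is the latest version of Elasticsearch 8?"},
    {"role": "assistant", "content": "", "tool_calls": [{
        "id": "call_0", "type": "function",
        "function": {"name": "getLatestElasticsearchVersion",
                     "arguments": "{\"majorVersion\":8}"}}]},
    {"role": "tool", "tool_call_id": "call_0", "content": "\"8.17.3\""},
]

resp = requests.post("http://localhost:8080/v1/chat/completions",
                     json={"model": "unused", "temperature": 0,
                           "messages": messages, "tools": tools,
                           "tool_choice": "auto"})
print(resp.json()["choices"][0]["message"]["content"])
```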
First Bad Commit
No response