What happened?
Running a model and specifying an 8192-token context like so:
/llama-server --model Mistral-Large-Instruct-2407-IQ3_XXS.gguf -c 8192 -ngl 35
Causes the following to print during initialization:
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
This freaked me out, because based on this discussion, the message implies that I'm actually only getting a 4096-token context due to parallelization. On the other hand, I also see:
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 8192
slot reset: id 0 | task -1 |
which is what I would expect.
This discrepancy seems to be due to the fact that the llama.cpp server temporarily increments n_parallel when loading the model (for a reason related to Mamba? I'm not sure why this is done).
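For illustration, here is a minimal standalone sketch of how I understand the two numbers could arise. It assumes the warning is based on the total context divided by the number of sequences, and that the server briefly uses n_parallel + 1 sequences while loading; both assumptions are mine, and this is plain C++ for illustration, not actual llama.cpp code:

```cpp
#include <cstdio>
#include <cstdint>

int main() {
    const uint32_t n_ctx       = 8192;   // what I passed via -c
    const uint32_t n_ctx_train = 131072; // the model's training context
    const uint32_t n_parallel  = 1;      // server default, one slot

    // Assumption: during model load the server temporarily uses one extra
    // sequence, so the per-sequence context is computed with 2 sequences.
    const uint32_t n_seq_max_during_load = n_parallel + 1;
    const uint32_t n_ctx_per_seq = n_ctx / n_seq_max_during_load; // 8192 / 2 = 4096

    if (n_ctx_per_seq < n_ctx_train) {
        printf("n_ctx_per_seq (%u) < n_ctx_train (%u) -- "
               "the full capacity of the model will not be utilized\n",
               n_ctx_per_seq, n_ctx_train);
    }

    // The slot that actually serves requests is divided by the real
    // n_parallel, so it still gets the full requested context.
    const uint32_t n_ctx_slot = n_ctx / n_parallel; // 8192
    printf("n_ctx_slot = %u\n", n_ctx_slot);

    return 0;
}
```

If that reading is right, the 4096 in the warning is only an artifact of the temporary extra sequence, and the slot really does get 8192.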
My concerns are:
- What context size is actually being used here: 8192 or 4096?
- Should this be considered a bug, since the messages essentially contradict each other?
Please let me know if any other information is needed, but this should be easy to replicate. Thanks!
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 4033 (a9e8a9a0)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response