Description
While implementing compression for copy/save state, I found a bug which turned out to be reproducible on current `main` (41aee4d). It seems to be model independent, and no parameter other than `-ngl` seems to make a difference either.

The first symptom affects `save-load-state`, `main`, and `server`: when `-ngl` is set to exactly N-1, this is what the generated output looks like:
Hello there!###############################
The second symptom was found by accident while fiddling with `save-load-state` to implement compression. If `-ngl` is N or bigger (all layers offloaded), the problem above seems to disappear; however:

- `save-load-state` fails, because the generated text differs between the two runs;
- after some tokens have been sampled, `llama_copy_state_data` outputs a mostly empty array. I only noticed this because I tried to dump the state after generation and suddenly started getting a 99% compression ratio on that array, since it turned out to be mostly zeroes.

All `-ngl` values between 0 and N-2 work properly.
I have no way of testing on AMD, so I do not know whether this is Nvidia-specific.
As a sanity check, here are results for `-ngl` from 0 to N with the same model and parameters (except `-ngl`):
Edit: Interestingly enough, perplexity looks fine?
`-ngl N-2` (27/29)
[1]5.2069,[2]5.1932,[3]5.1802,[4]5.2837,[5]5.2742,[6]5.0776,
Final estimate: PPL = 5.0776 +/- 0.25768
`-ngl N-1` (28/29)
[1]5.2069,[2]5.1932,[3]5.1802,[4]5.2837,[5]5.2742,[6]5.0776,
Final estimate: PPL = 5.0776 +/- 0.25768
`-ngl N` (29/29)
[1]5.2077,[2]5.1813,[3]5.1687,[4]5.2820,[5]5.2682,[6]5.0756,
Final estimate: PPL = 5.0756 +/- 0.25766