CLBlast fails on context lengths above 2048 after merging #4256

Inference with CLBlast segfaults at context sizes above 2k when all GPU layers are offloaded. This started with the commit that merged https://github.com/ggerganov/llama.cpp/pull/4256.

Command line: 
`C:\test\llama-b1601-bin-win-clblast-x64>main.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 4096 -b 512 -n 32 -ngl 33 -f C:\test\test.txt`

```
main: build = 1601 (5a7d312)
main: built with MSVC 19.37.32826.1 for x64
main: seed  = 1701534899
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 2060'
ggml_opencl: device FP16 support: false
```

Result:
Prompt processing starts and then segfaults around the 2k-token mark, before generation begins. It appears to work only if the prompt is short enough (fewer than 2k tokens).
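
For anyone trying to reproduce this without a long prompt on hand, here is a minimal sketch that writes a filler prompt file. It is not part of llama.cpp; the helper names `make_prompt` and `write_prompt` are hypothetical, and the word count is only a rough proxy for tokens. It assumes each whitespace-separated word becomes at least one token, so 3000 words should land past the 2k mark while still fitting in `-c 4096`.

```python
from pathlib import Path

# Small fixed vocabulary of common words (assumption: each is >= 1 token).
VOCAB = ["the", "quick", "brown", "fox", "jumps", "over", "a", "lazy", "dog"]


def make_prompt(n_words: int) -> str:
    """Return a filler prompt made of n_words space-separated words."""
    return " ".join(VOCAB[i % len(VOCAB)] for i in range(n_words))


def write_prompt(path: str, n_words: int) -> int:
    """Write the filler prompt to path and return the number of words written."""
    text = make_prompt(n_words)
    Path(path).write_text(text, encoding="utf-8")
    return len(text.split())


if __name__ == "__main__":
    # Output file name is an example; pass the resulting file to main.exe via -f.
    written = write_prompt("test.txt", 3000)
    print(f"wrote {written} words")
```

The exact token count depends on the model's tokenizer, so the word count may need adjusting to hit the failing range.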
