
[BUG] CLBlast generates garbage text on Q8_0 models #1525

@CAHbKA-IV

Description


The CLBlast build (device: AMD RX6800XT) generates garbage output for Q8_0 models:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m VicUnlocked-30B-LoRA.ggml.q8_0.bin -s 1
main: build = 561 (5ea4339)
main: seed  = 1
llama.cpp: loading model from VicUnlocked-30B-LoRA.ggml.q8_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 135.75 KB
llama_model_load_internal: mem required  = 37206.11 MB (+ 3124.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: AMD Accelerated Parallel Processing Device: gfx1030
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 24 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 192, repeat_penalty = 1.400000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.500000, typical_p = 1.000000, temp = 0.740000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = 2048, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> My name is Alex. Your name is Lion. You are my personal AI assistant.
 Хронологија Awosiicherсти agesppe Mas Schmidtlichelackadalablo(@" Dynam Terminalairecompatchiaadre arrestilor CTommPRdaggerzilass Howard Sang PDF shadow SM >> Chal Byte Naval FAlaus changing hayoux ba bunchrokенrundUSEetch thrustREodortMR Spirit civ dig glob Tow agents
>

llama_print_timings:        load time =  9065.74 ms
llama_print_timings:      sample time =    22.36 ms /    60 runs   (    0.37 ms per token)
llama_print_timings: prompt eval time = 18715.84 ms /    39 tokens (  479.89 ms per token)
llama_print_timings:        eval time = 58273.65 ms /    60 runs   (  971.23 ms per token)
llama_print_timings:       total time = 113280.74 ms
Terminate batch job (Y/N)? y
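For context on the format involved: Q8_0 stores weights in blocks of 32 int8 values plus one per-block float scale, so a correct backend should reproduce the original weights to within about half a quantization step. Below is a rough, simplified round-trip sketch of that scheme (an illustration only, not ggml's exact memory layout):

```python
def q8_0_roundtrip(block):
    # One Q8_0-style block: 32 float weights -> 32 int8 codes + one scale.
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / 127.0
    # Quantize to signed 8-bit codes, clamped to [-127, 127].
    q = [max(-127, min(127, round(x / scale))) for x in block]
    # Dequantize: each reconstructed weight is code * scale.
    return [v * scale for v in q]

w = [i / 16.0 - 1.0 for i in range(32)]  # 32 sample weights in [-1, 1)
deq = q8_0_roundtrip(w)
err = max(abs(a - b) for a, b in zip(w, deq))
```

Since the reconstruction error per weight is bounded by half a scale step, output like the above suggests the CLBlast path is mishandling the Q8_0 blocks (e.g. the scale or block stride) rather than ordinary quantization loss.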
