
[User] No output on Windows with interactive mode. #1529

Closed
@chigkim

Description

I get no output when running in interactive mode on Windows.
However, I do get output if I take out --color --interactive --reverse-prompt "User:" and run again.
Also, if I run the same command on a Mac with --interactive --reverse-prompt "User:", I get output.
I built with w64devkit-1.19.0.
Here's the log.

main.exe --ctx_size 2048 --temp 0.7 --top_k 40 --top_p 0.5 --repeat_last_n 256 --batch_size 1024 --repeat_penalty 1.17647 --model "models/wizard-vicuna-13B.ggml.q4_0.bin" --n_predict 2048 --color --interactive --reverse-prompt "User:" --prompt "Text transcript of a never ending dialog, where User interacts with an AI assistant named ChatLLaMa. ChatLLaMa is helpful, kind, honest, friendly, good at writing and never fails to answer User's requests immediately and with details and precision. There are no annotations like (30 seconds passed...) or (to himself), just what User and ChatLLaMa say aloud to each other. The dialog lasts for years, the entirety of it is shared below. It's 10000 pages long. The transcript only includes text, it does not include markup like HTML and Markdown."
main: build = 565 (943e608)
main: seed  = 1684519772
llama.cpp: loading model from models/wizard-vicuna-13B.ggml.q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =   0.09 MB
llama_model_load_internal: mem required  = 9807.48 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
sampling: repeat_last_n = 256, repeat_penalty = 1.176470, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.500000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 2048, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Text transcript of a never ending dialog, where User interacts with an AI assistant named ChatLLaMa. ChatLLaMa is helpful, kind, honest, friendly, good at writing and never fails to answer User's requests immediately and with details and precision. There are no annotations like (30 seconds passed...) or (to himself), just what User and ChatLLaMa say aloud to each other. The dialog lasts for years, the entirety of it is shared below. It's 10000 pages long. The transcript only includes text, it does not include markup like HTML and Markdown.

User:Hello!

llama_print_timings:        load time = 29434.94 ms
llama_print_timings:      sample time =     6.12 ms /     4 runs   (    1.53 ms per token)
llama_print_timings: prompt eval time = 27966.57 ms /   137 tokens (  204.14 ms per token)
llama_print_timings:        eval time =  1033.31 ms /     3 runs   (  344.44 ms per token)
llama_print_timings:       total time = 466281.64 ms
Terminate batch job (Y/N)? y

Here's the output without interactive mode.

main.exe --ctx_size 2048 --temp 0.7 --top_k 40 --top_p 0.5 --repeat_last_n 256 --batch_size 1024 --repeat_penalty 1.17647 --model "models/wizard-vicuna-13B.ggml.q4_0.bin" --n_predict 2048 --prompt "Hello!"
main: build = 565 (943e608)
main: seed  = 1684520348
llama.cpp: loading model from models/wizard-vicuna-13B.ggml.q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =   0.09 MB
llama_model_load_internal: mem required  = 9807.48 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 256, repeat_penalty = 1.176470, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.500000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 2048, n_keep = 0


 Hello!
Hello! is a fun and interactive app that allows users to connect with friends, family, and other people they know. With Hello!, you can easily send messages, share photos and videos, play games, and more. Whether you're looking for a new way to stay in touch with loved ones or want to meet new people, Hello! has something for everyone. So why wait? Download Hello! today and start connecting with the world around you! [end of text]

llama_print_timings:        load time =  2828.62 ms
llama_print_timings:      sample time =   142.54 ms /    93 runs   (    1.53 ms per token)
llama_print_timings: prompt eval time =  1651.94 ms /     3 tokens (  550.65 ms per token)
llama_print_timings:        eval time = 30975.91 ms /    92 runs   (  336.69 ms per token)
llama_print_timings:       total time = 34149.42 ms
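
One thing that might be worth checking (an assumption on my part, not a confirmed diagnosis of this issue): --color makes main.exe emit ANSI escape sequences, and older Windows console hosts only interpret those when virtual terminal processing has been enabled via SetConsoleMode. The following is a minimal standalone check, buildable with w64devkit; it is purely illustrative and is not llama.cpp's actual console-handling code.

#include <windows.h>
#include <cstdio>

#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004
#endif

int main() {
    // Ask the console host to interpret ANSI escape sequences.
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (hOut != INVALID_HANDLE_VALUE && GetConsoleMode(hOut, &mode)) {
        SetConsoleMode(hOut, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
    }
    // If escape handling works, "ok" prints in green; if the codes are
    // printed literally or nothing appears, console handling is suspect.
    printf("\x1b[32mok\x1b[0m\n");
    fflush(stdout);
    return 0;
}

If "ok" prints in green under cmd.exe, escape handling is probably fine and the missing output more likely comes from how interactive mode reads and echoes the console on Windows.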
