
Unable to get a response in interactive mode #1423

Closed · @re11ding

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I expect the model to respond to my input, allowing a two-way conversation.

Current Behavior

Once it's my turn to provide a prompt and I press Enter, CPU usage reaches around 30% and then a response is never generated, no matter how long it's left to run. I'm always forced to send SIGINT with Ctrl+C in order to terminate llama.cpp.

I've also tried it with the 7B model, but the result is sadly still the same.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz with 32GB of RAM at 2400MHz

  • Operating System, e.g. for Linux:

Windows 10 v1909

  • SDK version, e.g. for Linux:
Python 3.10.4
GNU Make 4.4
G++ (GCC) 13.1.0

Steps to Reproduce

Run with the usual parameters (a minimal command is sketched below; the exact invocation is in the Failure Logs), attempt to respond, and simply wait. --keep is not necessary; it was merely left over from my last test run to see whether it changed anything.
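For reference, a command along these lines should reproduce the behaviour; the model path and prompt here are just examples (I used 30B in the log below, and also tried 7B):

main -m models/7B/ggml-model-q4_0.bin -c 2048 -i -r "User:" --color --prompt "Hello! How are you?"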

Failure Logs

E:\LLaMA\llama.cpp>main -m models/30B/ggml-model-q4_0.bin -n -1 -c 2048 -i -r "User:" --color --keep -1 --prompt "Hello!How are you? Please answer in less than 5 words."
main: build = 0 (unknown)
main: seed  = 1683935758
llama.cpp: loading model from models/30B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required  = 21695.48 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 16


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Hello!How are you? Please answer in less than 5 words.
I'm ok,how are you? Answer please in less than five words.
Good question. Here is my answer: 'How am I ? '
i'm doing great, what about u?
Not good at all! How are you?
Hi, whats up? how are you?
I am fine thanks! And you?
User:I'm doing great, thank you!

llama_print_timings:        load time = 19669.43 ms
llama_print_timings:      sample time =    57.24 ms /    75 runs   (    0.76 ms per run)
llama_print_timings: prompt eval time = 17329.88 ms /    16 tokens ( 1083.12 ms per token)
llama_print_timings:        eval time = 86146.71 ms /    75 runs   ( 1148.62 ms per run)
llama_print_timings:       total time = 850995.00 ms
