Closed
Name and Version
$ llama-cli --version
version: 5038 (193c3e0)
built with Cray clang version 18.0.0 (0e4696aa65fa9549bd5e19c216678cc98185b0f7) for x86_64-unknown-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
llama-cli -m /scratch/feic/pjs/DeepSeek-CPU-Inference/models/DeepSeek-R1.Q8_0.gguf \
-p "The weather was so perfect that I did not want to go back in the house. But I had left a new beef stew on the stove which needed my attention. Please generate in English what happened next" \
--repeat-penalty 1.0 -n 128 \
--rpc ${server_list} -t 192 -no-cnv -ngl 99 \
-Cr 0-195
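The `--rpc` flag takes a comma-separated list of worker endpoints (`${server_list}` above). As a minimal sketch of how such a list might be assembled — host names `node01`/`node02` and port `50052` are placeholders, not values from this report (each worker would be running something like `rpc-server -H 0.0.0.0 -p 50052` from the llama.cpp RPC example):

```python
# Assemble the comma-separated endpoint list passed to `llama-cli --rpc`.
# Hosts and port are hypothetical; substitute the actual CPU nodes.
def build_server_list(hosts, port=50052):
    """Return 'host1:port,host2:port,...' for llama-cli's --rpc argument."""
    return ",".join(f"{h}:{port}" for h in hosts)

if __name__ == "__main__":
    print(build_server_list(["node01", "node02"]))  # node01:50052,node02:50052
```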
Problem description & steps to reproduce
While using the --rpc option to distribute inference across multiple CPU nodes (no GPU resources), I observed very low CPU utilization on each node: despite pinning all 196 CPUs per node (-Cr 0-195), diagnostics showed only 2-3 cores actively in use at any time.
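The "Active Cores" figures in the log below come from an external monitor, not from llama.cpp itself. A minimal sketch of such a diagnostic, assuming Linux and using only /proc/stat (the 0.5 busy-fraction threshold and 1 s sampling interval are arbitrary choices, not from the original script):

```python
# Hypothetical per-core activity monitor: sample /proc/stat twice and count
# cores whose busy fraction over the interval exceeds a threshold.
import time

def read_cpu_times():
    """Return {cpu_id: (busy, total)} jiffies for each per-core /proc/stat line."""
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            # Per-core lines look like "cpu0 ...", the aggregate line is "cpu ...".
            if line.startswith("cpu") and line[3].isdigit():
                fields = line.split()
                vals = [int(v) for v in fields[1:]]
                idle = vals[3] + vals[4]  # idle + iowait
                total = sum(vals)
                stats[fields[0]] = (total - idle, total)
    return stats

def active_cores(interval=1.0, threshold=0.5):
    """Return the cores whose busy fraction over `interval` exceeds `threshold`."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    active = []
    for cpu, (busy0, total0) in before.items():
        busy1, total1 = after[cpu]
        delta = total1 - total0
        if delta > 0 and (busy1 - busy0) / delta > threshold:
            active.append(cpu)
    return active

if __name__ == "__main__":
    cores = active_cores()
    print(f"Active Cores: {len(cores)}/{len(read_cpu_times())}")
```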
First Bad Commit
No response
Relevant log output
[2025-04-21 18:24:59]
CPU (avg): 0.3% | Active Cores: 2/384
Mem: 16.3% (Used: 41.62 GB)
Net: Sent=675733.33 MB | Recv=129878.20 MB
NUMA Stats: {}
[2025-04-21 18:25:06]
CPU (avg): 0.2% | Active Cores: 1/384
Mem: 16.3% (Used: 41.59 GB)
Net: Sent=679569.56 MB | Recv=129881.75 MB
NUMA Stats: {}
[2025-04-21 18:25:13]
CPU (avg): 0.2% | Active Cores: 3/384
Mem: 17.3% (Used: 45.49 GB)
Net: Sent=683405.86 MB | Recv=129885.21 MB
NUMA Stats: {}
[2025-04-21 18:25:20]
CPU (avg): 0.8% | Active Cores: 5/384
Mem: 16.5% (Used: 42.38 GB)
Net: Sent=683405.96 MB | Recv=129885.34 MB
NUMA Stats: {}