Misc. bug: model warmup doesn't work correctly for MoE models #11163

Closed
@cpumaxx

Description

Name and Version

build: 4449 (8a1d9c2) with cc (Debian 13.3.0-11) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

./build/bin/llama-cli -m ds3-q8.gguf -t 128 --numa distribute -c 8192 -ngl 0 --interactive-first --chat-template deepseek3

Problem description & steps to reproduce

If I load a dense model, the warmup works correctly and loads the whole model into the OS cache.

However, if I load a big MoE model (e.g. DeepSeek V3), the warmup only loads a small portion of it (93GB/660GB).

I tested this with an inefficient brute-force patch to common.cpp:

>             if (decoder_start_token_id == -1) {
995,1003c992,993
<             printf("decoding warmup tokens.");
<             for (int i = 1; i <256 ; i++) {
<                 llama_decode(lctx, llama_batch_get_one(tmp.data(), std::min(tmp.size(), (size_t) params.n_batch)));
<                 tmp.clear();
<                 tmp.push_back(i);
<                 printf(".");
<             }
<         } else { LOG_WRN("No Decoder Present. Warmup impossible"); }
<         printf("\n");

The benefit falls off sharply as the number of llama_decode() calls grows. For example, 256 calls get 540GB of the model loaded, while 1024 calls only get 620GB.
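To illustrate why repeated decodes give diminishing returns, here is a minimal sketch of a simplified model. It assumes each token independently activates `n_used` out of `n_experts` experts per layer, chosen uniformly at random. That uniformity is an assumption made only for this example; real routers are input-dependent. The names `expected_coverage` and `print_coverage_table` are hypothetical and not part of llama.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <initializer_list>

// Expected fraction of one layer's experts touched after n_tokens tokens,
// ASSUMING each token independently activates n_used of n_experts experts
// uniformly at random. This is a toy model, not a measurement.
double expected_coverage(int n_experts, int n_used, int n_tokens) {
    assert(n_experts > 0 && n_used >= 0 && n_used <= n_experts && n_tokens >= 0);
    // Probability that one token does NOT hit a given expert.
    const double p_miss = 1.0 - static_cast<double>(n_used) / n_experts;
    // The expert stays cold only if every token misses it.
    return 1.0 - std::pow(p_miss, n_tokens);
}

// Print the modelled coverage for a few token counts (hypothetical helper).
void print_coverage_table(int n_experts, int n_used) {
    for (int n : {1, 16, 64, 256, 1024}) {
        std::printf("%5d tokens -> %.4f of experts touched\n",
                    n, expected_coverage(n_experts, n_used, n));
    }
}
```

Calling `print_coverage_table(256, 8)` (expert counts chosen purely for illustration) shows the geometric saturation: each additional token mostly hits experts that are already warm. If measured growth is slower than this model predicts for a model's actual expert counts, that would suggest the router concentrates the warmup tokens on a subset of experts.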

I think that ideally the warmup would detect the number of experts and call a function that routes a single token through each expert (this may need a function other than llama_decode() that is aware of the expert router).
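As a rough sketch of what an expert-aware warmup could look like, the toy router below selects the top-k experts for a token, and selects every expert when a warmup flag is set. Note that this swaps in a slightly different technique from the one proposed above: rather than steering one token to each expert in turn, it widens the selection to all experts for a single pass. The function `select_experts` and its `warmup` parameter are hypothetical and are not llama.cpp API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy top-k router: returns the indices of the experts one token is sent to.
// `warmup` is a HYPOTHETICAL flag; when set, every expert is selected so that
// all expert weights are read at least once.
std::vector<int> select_experts(const std::vector<float> & router_logits,
                                std::size_t n_used, bool warmup) {
    const std::size_t n_expert = router_logits.size();
    assert(n_used <= n_expert);
    const std::size_t k = warmup ? n_expert : n_used;

    std::vector<int> ids(n_expert);
    std::iota(ids.begin(), ids.end(), 0);
    // Move the k highest-scoring experts to the front, in descending score order.
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int a, int b) { return router_logits[a] > router_logits[b]; });
    ids.resize(k);
    return ids;
}
```

With this shape, a single warmup pass touches all expert weights in a layer regardless of what the router scores say, at the cost of doing more compute than a normal decode.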

I could probably make a good PR for this with some guidance.

First Bad Commit

This has never worked afaik

Relevant log output

There is no log output for this problem. You need to watch OS cache usage with an external tool.
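One way to check how much of the model file is in the OS cache is to map it and ask the kernel with `mincore()`. The following is a minimal Linux-only sketch; the `Residency` struct and `file_residency` function are hypothetical names introduced for this example. On recent kernels the result may only be meaningful for files you own or can write to, so treat this as an approximation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct Residency {
    std::size_t resident_pages = 0;
    std::size_t total_pages    = 0;
};

// Count how many pages of `path` are currently in the OS page cache (Linux).
// Returns false if the file cannot be opened, stat'ed, mapped, or queried.
bool file_residency(const char * path, Residency & out) {
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return false; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);

    // Mapping does not read the file; it only lets mincore() inspect it.
    void * addr = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED) return false;

    const std::size_t page    = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t n_pages = (len + page - 1) / page;
    std::vector<unsigned char> vec(n_pages);  // one status byte per page
    const bool ok = mincore(addr, len, vec.data()) == 0;
    munmap(addr, len);
    if (!ok) return false;

    out.total_pages    = n_pages;
    out.resident_pages = 0;
    for (unsigned char v : vec) out.resident_pages += v & 1u;  // low bit = resident
    assert(out.resident_pages <= out.total_pages);
    return true;
}
```

Running `file_residency("ds3-q8.gguf", r)` before and after warmup and comparing `r.resident_pages` against `r.total_pages` gives the loaded fraction. The status vector uses one byte per page, so a very large model file needs a correspondingly large temporary buffer.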
