LLaVA does not offload layers to GPU #3616

Closed
@ruslanmustafin

The issue was already mentioned in #3436. Creating a separate issue so that it does not get lost.

I run LLaVA with the following command (commit id: 1e0e873):

./llava -m ggml-model-q5_k.gguf \
        --mmproj mmproj-model-f16.gguf \
        --temp 0.1 -ngl 64 -mg 0 \
        --image n008-2018-09-18-14-54-39-0400__CAM_FRONT__1537297366762404.jpg

These are the relevant parts of the output:

ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6

...

llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llm_load_tensors: mem required  = 4560.96 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 162.13 MB
llama_new_context_with_model: VRAM scratch buffer: 156.00 MB
llama_new_context_with_model: total VRAM used: 156.00 MB (model: 0.00 MB, context: 156.00 MB)

...

main: image encoded in  1561.49 ms by CLIP (    2.71 ms per image patch)

llama_print_timings:        load time =    3042.21 ms
llama_print_timings:      sample time =      11.65 ms /   136 runs   (    0.09 ms per token, 11671.82 tokens per second)
llama_print_timings: prompt eval time =    9440.69 ms /   626 tokens (   15.08 ms per token,    66.31 tokens per second)
llama_print_timings:        eval time =   47661.78 ms /   136 runs   (  350.45 ms per token,     2.85 tokens per second)
llama_print_timings:       total time =   58800.36 ms
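
For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that pulls the offloaded layer count out of the `llm_load_tensors` summary line. The function name `offloaded_layers` is hypothetical, and the pattern assumes the line format shown in the output above for this commit; other versions may print it differently.

```python
import re

# Matches the summary line printed by llm_load_tensors in the log above.
# Assumption: the line format is exactly as shown in this issue's output.
OFFLOAD_RE = re.compile(r"llm_load_tensors: offloaded (\d+)/(\d+) layers to GPU")


def offloaded_layers(log_text):
    """Return (offloaded, total) parsed from the log, or None if the line is absent."""
    match = OFFLOAD_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Excerpt copied from the output in this issue.
log = """\
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
"""

result = offloaded_layers(log)
if result is not None and result[0] == 0:
    print(f"No layers offloaded ({result[0]}/{result[1]})")
```

On the excerpt above, the parsed pair is 0 offloaded out of 35 total, which is the symptom reported here despite `-ngl 64` being passed.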
