server : support reading arguments from environment variables #9105

Merged
merged 3 commits into master from xsn/server_env_var
Aug 21, 2024

Conversation

@ngxson (Collaborator) commented Aug 20, 2024

Motivation

When deploying to an HF inference endpoint, we only have control over the environment variables passed to Docker. That's why we currently need to build a custom container and specify these variables via LLAMACPP_ARGS (ref: #9041).

This PR adds environment-variable equivalents for some server-related arguments (see the full list in server/README.md).

Variables are prefixed with LLAMA_ARG_ to distinguish them from compile-time variables like LLAMA_CURL.

Example

LLAMA_ARG_MODEL=../models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf LLAMA_ARG_CTX_SIZE=1024 LLAMA_ARG_N_PARALLEL=2 LLAMA_ARG_ENDPOINT_METRICS=1 ./llama-server

If the same setting is specified both as an environment variable and as a command-line argument, the environment variable takes priority:

LLAMA_ARG_MODEL=my_model.gguf ./llama-server -m another_model.gguf
# Expected behavior: we load my_model.gguf
# (in other words, "-m another_model.gguf" is ignored)

On an HF inference endpoint, these variables can be set from the "Settings" tab. (In the near future, these variables will be exposed as pre-defined input fields in the UI.)

@ngxson ngxson merged commit fc54ef0 into master Aug 21, 2024
53 checks passed
@ngxson ngxson mentioned this pull request Sep 5, 2024
7 tasks
@ngxson ngxson deleted the xsn/server_env_var branch September 10, 2024 20:47
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
…rg#9105)

* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Feb 25, 2025
…rg#9105)

* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var