server : support reading arguments from environment variables #9105
Motivation
When deploying to an HF inference endpoint, we only have control over the environment variables that can be passed to Docker. That's why we currently need to build a custom container and specify the arguments via `LLAMACPP_ARGS` (ref: #9041).
This PR adds environment-variable equivalents for some server-related arguments (see the full list in `server/README.md`).
Variables are prefixed with `LLAMA_ARG_` to distinguish them from compile-time variables like `LLAMA_CURL`.
Example
If the same option is specified both as an environment variable and as a command-line argument, the environment variable takes priority.
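The precedence rule can be illustrated with a small Python sketch. This is not the server's actual C++ parsing code. The helper `resolve_option`, its parameters, and the way an option name is turned into a variable name (upper-casing and replacing `-` with `_`) are assumptions made for illustration. Only the `LLAMA_ARG_` prefix and the "environment wins" rule come from the description above.

```python
import os

ENV_PREFIX = "LLAMA_ARG_"  # prefix described in this PR


def resolve_option(name, cli_value=None, default=None, env=None):
    """Return the effective value of an option (hypothetical helper).

    Lookup order mirrors the rule stated above:
    environment variable first, then the command-line value, then the default.
    """
    env = os.environ if env is None else env
    # Assumed naming scheme: "ctx-size" -> "LLAMA_ARG_CTX_SIZE"
    env_key = ENV_PREFIX + name.upper().replace("-", "_")
    if env_key in env:
        return env[env_key]
    if cli_value is not None:
        return cli_value
    return default


# A stand-in environment, so the example does not depend on the real one.
fake_env = {"LLAMA_ARG_CTX_SIZE": "1024"}

# Set in both places: the environment value is used.
print(resolve_option("ctx-size", cli_value="2048", env=fake_env))  # prints 1024
# Set only on the command line: the command-line value is used.
print(resolve_option("port", cli_value="8080", env=fake_env))      # prints 8080
```

Passing the environment in as a mapping keeps the sketch easy to test. A real implementation would read the process environment directly.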
On HF inference endpoints, these variables can be set from the "Settings" tab. (In the near future, these variables will be exposed as predefined input fields in the UI.)