llama

package module
v0.0.0-...-6a8041e

Warning: this package is not in the latest version of its module.

Published: Mar 14, 2024 License: MIT Imports: 6 Imported by: 14

README

go-llama.cpp

Golang bindings for llama.cpp.

The go-llama.cpp bindings are high level: most of the work is kept in the C/C++ code to avoid extra computational cost, stay performant, and ease maintenance, while keeping usage as simple as possible.

Check out this and this write-up, which summarize the impact of a low-level interface that calls C functions from Go.

If you are looking for a high-level, OpenAI-compatible API, check out here.

Attention!

Since https://github.com/go-skynet/go-llama.cpp/pull/180 was merged, go-llama.cpp is no longer compatible with the ggml format; it works ONLY with the new gguf file format. See also the upstream PR: https://github.com/ggerganov/llama.cpp/pull/2398.

If you need to use the ggml format, use the https://github.com/go-skynet/go-llama.cpp/releases/tag/pre-gguf tag.

Usage

Note: This repository uses git submodules to keep track of LLama.cpp.

Clone the repository locally:

git clone --recurse-submodules https://github.com/go-skynet/go-llama.cpp

To build the bindings locally, run:

cd go-llama.cpp
make libbinding.a

Now you can run the example with:

LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14
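In Go code, usage follows the same shape as the bundled example. A minimal sketch (assuming the module is imported from github.com/go-skynet/go-llama.cpp and the model path is passed as the first argument; all option names below appear in the API index further down):

```go
package main

import (
	"fmt"
	"os"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// Load the model; model-level settings are passed as ModelOption values.
	l, err := llama.New(os.Args[1], llama.SetContext(512))
	if err != nil {
		fmt.Println("loading the model failed:", err)
		os.Exit(1)
	}
	defer l.Free()

	// Generate text; sampling parameters are passed as PredictOption values.
	out, err := l.Predict("Tell me a joke.",
		llama.SetTokens(128),
		llama.SetThreads(4),
		llama.SetTopK(40),
		llama.SetTopP(0.95))
	if err != nil {
		fmt.Println("prediction failed:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```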

Acceleration

OpenBLAS

To build and run with OpenBLAS, for example:

BUILD_TYPE=openblas make libbinding.a
CGO_LDFLAGS="-lopenblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run -tags openblas ./examples -m "/model/path/here" -t 14

CuBLAS

To build with CuBLAS:

BUILD_TYPE=cublas make libbinding.a
CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

ROCm

To build with ROCm (hipBLAS):

BUILD_TYPE=hipblas make libbinding.a
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CGO_LDFLAGS="-O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc -lrocblas -lhipblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -ngl 64 -t 32

OpenCL

BUILD_TYPE=clblas CLBLAS_DIR=... make libbinding.a
CGO_LDFLAGS="-lOpenCL -lclblast -L/usr/local/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

You should see something like this from the output when using the GPU:

ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Graphics [0x46a6]'
ggml_opencl: device FP16 support: true

GPU offloading

Metal (Apple Silicon)

BUILD_TYPE=metal make libbinding.a
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go build ./examples/main.go
cp build/bin/ggml-metal.metal .
./main -m "/model/path/here" -t 1 -ngl 1

Enjoy!

The documentation is available here and the full example code is here.

License

MIT

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type LLama

type LLama struct {
	// contains filtered or unexported fields
}

func New

func New(model string, opts ...ModelOption) (*LLama, error)

func (*LLama) Embeddings

func (l *LLama) Embeddings(text string, opts ...PredictOption) ([]float32, error)

Embeddings

func (*LLama) Eval

func (l *LLama) Eval(text string, opts ...PredictOption) error

func (*LLama) Free

func (l *LLama) Free()

func (*LLama) LoadState

func (l *LLama) LoadState(state string) error

func (*LLama) Predict

func (l *LLama) Predict(text string, opts ...PredictOption) (string, error)

func (*LLama) SaveState

func (l *LLama) SaveState(dst string) error

func (*LLama) SetTokenCallback

func (l *LLama) SetTokenCallback(callback func(token string) bool)

SetTokenCallback registers a callback for the individual tokens created when running Predict. It will be called once for each token. The callback shall return true as long as the model should continue predicting the next token. When the callback returns false the predictor will return. The tokens are just converted into Go strings; they are not trimmed or otherwise changed. Also, the tokens may not be valid UTF-8. Pass in nil to remove a callback.

It is safe to call this method while a prediction is running.
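For example, tokens can be streamed to stdout as they are generated. A sketch, assuming `l` is a *LLama obtained from New:

```go
// Print each token as it arrives; returning true keeps prediction going.
l.SetTokenCallback(func(token string) bool {
	fmt.Print(token)
	return true
})

if _, err := l.Predict("Once upon a time"); err != nil {
	log.Fatal(err)
}

// Pass nil to remove the callback again.
l.SetTokenCallback(nil)
```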

func (*LLama) SpeculativeSampling

func (l *LLama) SpeculativeSampling(ll *LLama, text string, opts ...PredictOption) (string, error)

func (*LLama) TokenEmbeddings

func (l *LLama) TokenEmbeddings(tokens []int, opts ...PredictOption) ([]float32, error)

Token Embeddings

func (*LLama) TokenizeString

func (l *LLama) TokenizeString(text string, opts ...PredictOption) (int32, []int32, error)

TokenizeString has an interesting return property: negative lengths (potentially) have meaning. Therefore, the length is returned separately from the slice and the error; all three can be used together.
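A sketch of using all three return values together (assuming `l` is a *LLama; the interpretation of a negative length as "buffer too small" follows the upstream llama.cpp tokenize convention and is an assumption here):

```go
n, tokens, err := l.TokenizeString("The quick brown fox")
if err != nil {
	log.Fatal(err)
}
if n < 0 {
	// In upstream llama.cpp, a negative count conventionally signals that
	// the token buffer was too small for the input.
	log.Fatalf("tokenization incomplete, got length %d", n)
}
fmt.Printf("%d tokens: %v\n", n, tokens[:n])
```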

type ModelOption

type ModelOption func(p *ModelOptions)
var EnabelLowVRAM ModelOption = func(p *ModelOptions) {
	p.LowVRAM = true
}
var EnableEmbeddings ModelOption = func(p *ModelOptions) {
	p.Embeddings = true
}
var EnableF16Memory ModelOption = func(p *ModelOptions) {
	p.F16Memory = true
}
var EnableMLock ModelOption = func(p *ModelOptions) {
	p.MLock = true
}
var EnableNUMA ModelOption = func(p *ModelOptions) {
	p.NUMA = true
}

func SetContext

func SetContext(c int) ModelOption

SetContext sets the context size.

func SetGPULayers

func SetGPULayers(n int) ModelOption

SetGPULayers sets the number of GPU layers to use to offload computation

func SetLoraAdapter

func SetLoraAdapter(s string) ModelOption

func SetLoraBase

func SetLoraBase(s string) ModelOption

func SetMMap

func SetMMap(b bool) ModelOption

SetMMap sets whether memory mapping is used when loading the model.

func SetMainGPU

func SetMainGPU(maingpu string) ModelOption

SetMainGPU sets the main_gpu

func SetModelSeed

func SetModelSeed(c int) ModelOption

func SetMulMatQ

func SetMulMatQ(b bool) ModelOption

func SetNBatch

func SetNBatch(n_batch int) ModelOption

SetNBatch sets n_batch, the batch size.

func SetPerplexity

func SetPerplexity(b bool) ModelOption

func SetTensorSplit

func SetTensorSplit(maingpu string) ModelOption

SetTensorSplit sets the tensor split for the GPU.

func WithRopeFreqBase

func WithRopeFreqBase(f float32) ModelOption

func WithRopeFreqScale

func WithRopeFreqScale(f float32) ModelOption

type ModelOptions

type ModelOptions struct {
	ContextSize   int
	Seed          int
	NBatch        int
	F16Memory     bool
	MLock         bool
	MMap          bool
	LowVRAM       bool
	Embeddings    bool
	NUMA          bool
	NGPULayers    int
	MainGPU       string
	TensorSplit   string
	FreqRopeBase  float32
	FreqRopeScale float32
	MulMatQ       *bool
	LoraBase      string
	LoraAdapter   string
	Perplexity    bool
}
var DefaultModelOptions ModelOptions = ModelOptions{
	ContextSize:   512,
	Seed:          0,
	F16Memory:     false,
	MLock:         false,
	Embeddings:    false,
	MMap:          true,
	LowVRAM:       false,
	NBatch:        512,
	FreqRopeBase:  10000,
	FreqRopeScale: 1.0,
}

func NewModelOptions

func NewModelOptions(opts ...ModelOption) ModelOptions

NewModelOptions creates a new ModelOptions object with the given options applied over the defaults.

type PredictOption

type PredictOption func(p *PredictOptions)
var Debug PredictOption = func(p *PredictOptions) {
	p.DebugMode = true
}
var EnableF16KV PredictOption = func(p *PredictOptions) {
	p.F16KV = true
}
var EnablePromptCacheAll PredictOption = func(p *PredictOptions) {
	p.PromptCacheAll = true
}
var EnablePromptCacheRO PredictOption = func(p *PredictOptions) {
	p.PromptCacheRO = true
}
var IgnoreEOS PredictOption = func(p *PredictOptions) {
	p.IgnoreEOS = true
}

func SetBatch

func SetBatch(size int) PredictOption

SetBatch sets the batch size.

func SetFrequencyPenalty

func SetFrequencyPenalty(fp float32) PredictOption

SetFrequencyPenalty sets the frequency penalty parameter, freq_penalty.

func SetLogitBias

func SetLogitBias(lb string) PredictOption

SetLogitBias sets the logit bias parameter.

func SetMemoryMap

func SetMemoryMap(b bool) PredictOption

SetMemoryMap sets memory mapping.

func SetMirostat

func SetMirostat(m int) PredictOption

SetMirostat sets the mirostat parameter.

func SetMirostatETA

func SetMirostatETA(me float32) PredictOption

SetMirostatETA sets the mirostat ETA parameter.

func SetMirostatTAU

func SetMirostatTAU(mt float32) PredictOption

SetMirostatTAU sets the mirostat TAU parameter.

func SetMlock

func SetMlock(b bool) PredictOption

SetMlock sets the memory lock.

func SetNDraft

func SetNDraft(nd int) PredictOption

func SetNKeep

func SetNKeep(n int) PredictOption

SetNKeep sets the number of tokens from the initial prompt to keep.

func SetNegativePrompt

func SetNegativePrompt(np string) PredictOption

func SetNegativePromptScale

func SetNegativePromptScale(nps float32) PredictOption

func SetPathPromptCache

func SetPathPromptCache(f string) PredictOption

SetPathPromptCache sets the session file to store the prompt cache.

func SetPenalizeNL

func SetPenalizeNL(pnl bool) PredictOption

SetPenalizeNL sets whether to penalize newlines or not.

func SetPenalty

func SetPenalty(penalty float32) PredictOption

SetPenalty sets the repetition penalty for text generation.

func SetPredictionMainGPU

func SetPredictionMainGPU(maingpu string) PredictOption

SetPredictionMainGPU sets the main_gpu

func SetPredictionTensorSplit

func SetPredictionTensorSplit(maingpu string) PredictOption

SetPredictionTensorSplit sets the tensor split for the GPU

func SetPresencePenalty

func SetPresencePenalty(pp float32) PredictOption

SetPresencePenalty sets the presence penalty parameter, presence_penalty.

func SetRepeat

func SetRepeat(repeat int) PredictOption

SetRepeat sets the number of times to repeat text generation.

func SetRopeFreqBase

func SetRopeFreqBase(rfb float32) PredictOption

SetRopeFreqBase sets the RoPE frequency base.

func SetRopeFreqScale

func SetRopeFreqScale(rfs float32) PredictOption

func SetSeed

func SetSeed(seed int) PredictOption

SetSeed sets the random seed for sampling text generation.

func SetStopWords

func SetStopWords(stop ...string) PredictOption

SetStopWords sets the prompts that will stop predictions.

func SetTailFreeSamplingZ

func SetTailFreeSamplingZ(tfz float32) PredictOption

SetTailFreeSamplingZ sets the tail free sampling, parameter z.

func SetTemperature

func SetTemperature(temp float32) PredictOption

SetTemperature sets the temperature value for text generation.

func SetThreads

func SetThreads(threads int) PredictOption

SetThreads sets the number of threads to use for text generation.

func SetTokenCallback

func SetTokenCallback(fn func(string) bool) PredictOption

SetTokenCallback sets a callback invoked for each generated token; the callback returns true to continue predicting.

func SetTokens

func SetTokens(tokens int) PredictOption

SetTokens sets the number of tokens to generate.

func SetTopK

func SetTopK(topk int) PredictOption

SetTopK sets the value for top-K sampling.

func SetTopP

func SetTopP(topp float32) PredictOption

SetTopP sets the value for nucleus sampling.

func SetTypicalP

func SetTypicalP(tp float32) PredictOption

SetTypicalP sets the typicality parameter, p_typical.

func WithGrammar

func WithGrammar(s string) PredictOption

WithGrammar sets the grammar to constrain the output of the LLM response
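For example, the output can be constrained with a GBNF grammar (the grammar format used by llama.cpp). A sketch, assuming `l` is a *LLama:

```go
// A minimal GBNF grammar allowing only "yes" or "no" as output.
const grammar = `root ::= "yes" | "no"`

out, err := l.Predict("Is the sky blue? Answer yes or no.",
	llama.WithGrammar(grammar))
if err != nil {
	log.Fatal(err)
}
fmt.Println(out)
```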

type PredictOptions

type PredictOptions struct {
	Seed, Threads, Tokens, TopK, Repeat, Batch, NKeep int
	TopP, Temperature, Penalty                        float32
	NDraft                                            int
	F16KV                                             bool
	DebugMode                                         bool
	StopPrompts                                       []string
	IgnoreEOS                                         bool

	TailFreeSamplingZ float32
	TypicalP          float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	Mirostat          int
	MirostatETA       float32
	MirostatTAU       float32
	PenalizeNL        bool
	LogitBias         string
	TokenCallback     func(string) bool

	PathPromptCache             string
	MLock, MMap, PromptCacheAll bool
	PromptCacheRO               bool
	Grammar                     string
	MainGPU                     string
	TensorSplit                 string

	// Rope parameters
	RopeFreqBase  float32
	RopeFreqScale float32

	// Negative prompt parameters
	NegativePromptScale float32
	NegativePrompt      string
}
var DefaultOptions PredictOptions = PredictOptions{
	Seed:              -1,
	Threads:           4,
	Tokens:            128,
	Penalty:           1.1,
	Repeat:            64,
	Batch:             512,
	NKeep:             64,
	TopK:              40,
	TopP:              0.95,
	TailFreeSamplingZ: 1.0,
	TypicalP:          1.0,
	Temperature:       0.8,
	FrequencyPenalty:  0.0,
	PresencePenalty:   0.0,
	Mirostat:          0,
	MirostatTAU:       5.0,
	MirostatETA:       0.1,
	MMap:              true,
	RopeFreqBase:      10000,
	RopeFreqScale:     1.0,
}

func NewPredictOptions

func NewPredictOptions(opts ...PredictOption) PredictOptions

NewPredictOptions creates a new PredictOptions object with the given options applied over the defaults.
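For example, a few of the defaults above can be overridden (a sketch; the options are usually passed directly to Predict, while NewPredictOptions is useful for inspecting the resolved values):

```go
opts := llama.NewPredictOptions(
	llama.SetTemperature(0.2), // lower than the 0.8 default
	llama.SetSeed(42),         // fixed seed instead of the random -1 default
	llama.SetStopWords("\n\n"),
)
fmt.Println(opts.Temperature, opts.Seed, opts.StopPrompts)
```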
