llama

package module
v0.0.0-...-6a8041e

Warning: this package is not in the latest version of its module.

Published: Mar 14, 2024 License: MIT Imports: 6 Imported by: 14

README

go-llama.cpp

Golang bindings for llama.cpp.

The go-llama.cpp bindings are high level: most of the work is kept in the C/C++ code to avoid extra computational cost, stay performant, and ease maintenance, while keeping usage as simple as possible.

Check out this and this write-up, which summarize the impact of a low-level interface that calls C functions from Go.

If you are looking for a high-level, OpenAI-compatible API, check out here.

Attention!

Since https://github.com/go-skynet/go-llama.cpp/pull/180 was merged, go-llama.cpp is no longer compatible with the ggml format; it works ONLY with the new gguf file format. See also the upstream PR: https://github.com/ggerganov/llama.cpp/pull/2398.

If you need to use the ggml format, use the https://github.com/go-skynet/go-llama.cpp/releases/tag/pre-gguf tag.

Usage

Note: This repository uses git submodules to keep track of LLama.cpp.

Clone the repository locally:

git clone --recurse-submodules https://github.com/go-skynet/go-llama.cpp

To build the bindings locally, run:

cd go-llama.cpp
make libbinding.a

Now you can run the example with:

LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14
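In Go code, usage follows the same shape as the bundled example. A minimal sketch (assuming the module is imported from github.com/go-skynet/go-llama.cpp and the model path is passed as the first argument; all option names below appear in the API index further down):

```go
package main

import (
	"fmt"
	"os"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// Load the model; model-level settings are passed as ModelOption values.
	l, err := llama.New(os.Args[1], llama.SetContext(512))
	if err != nil {
		fmt.Println("loading the model failed:", err)
		os.Exit(1)
	}
	defer l.Free()

	// Generate text; sampling parameters are passed as PredictOption values.
	out, err := l.Predict("Tell me a joke.",
		llama.SetTokens(128),
		llama.SetThreads(4),
		llama.SetTopK(40),
		llama.SetTopP(0.95))
	if err != nil {
		fmt.Println("prediction failed:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```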

Acceleration

OpenBLAS

To build and run with OpenBLAS, for example:

BUILD_TYPE=openblas make libbinding.a
CGO_LDFLAGS="-lopenblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run -tags openblas ./examples -m "/model/path/here" -t 14

CuBLAS

To build with CuBLAS:

BUILD_TYPE=cublas make libbinding.a
CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

ROCm

To build with ROCm (hipBLAS):

BUILD_TYPE=hipblas make libbinding.a
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CGO_LDFLAGS="-O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc -lrocblas -lhipblas" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -ngl 64 -t 32

OpenCL

BUILD_TYPE=clblas CLBLAS_DIR=... make libbinding.a
CGO_LDFLAGS="-lOpenCL -lclblast -L/usr/local/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

You should see something like this from the output when using the GPU:

ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Graphics [0x46a6]'
ggml_opencl: device FP16 support: true

GPU offloading

Metal (Apple Silicon)

BUILD_TYPE=metal make libbinding.a
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go build ./examples/main.go
cp build/bin/ggml-metal.metal .
./main -m "/model/path/here" -t 1 -ngl 1

Enjoy!

The documentation is available here and the full example code is here.

License

MIT

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type LLama

type LLama struct {
	// contains filtered or unexported fields
}

func New

func New(model string, opts ...ModelOption) (*LLama, error)

func (*LLama) Embeddings

func (l *LLama) Embeddings(text string, opts ...PredictOption) ([]float32, error)

Embeddings

func (*LLama) Eval

func (l *LLama) Eval(text string, opts ...PredictOption) error

func (*LLama) Free

func (l *LLama) Free()

func (*LLama) LoadState

func (l *LLama) LoadState(state string) error

func (*LLama) Predict

func (l *LLama) Predict(text string, opts ...PredictOption) (string, error)

func (*LLama) SaveState

func (l *LLama) SaveState(dst string) error

func (*LLama) SetTokenCallback

func (l *LLama) SetTokenCallback(callback func(token string) bool)

SetTokenCallback registers a callback for the individual tokens created when running Predict. It will be called once for each token. The callback shall return true as long as the model should continue predicting the next token. When the callback returns false the predictor will return. The tokens are just converted into Go strings; they are not trimmed or otherwise changed. Also, the tokens may not be valid UTF-8. Pass in nil to remove a callback.

It is safe to call this method while a prediction is running.
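For example, tokens can be streamed to stdout as they are generated. A sketch, assuming `l` is a *LLama obtained from New:

```go
// Print each token as it arrives; returning true keeps prediction going.
l.SetTokenCallback(func(token string) bool {
	fmt.Print(token)
	return true
})

if _, err := l.Predict("Once upon a time"); err != nil {
	log.Fatal(err)
}

// Pass nil to remove the callback again.
l.SetTokenCallback(nil)
```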

func (*LLama) SpeculativeSampling

func (l *LLama) SpeculativeSampling(ll *LLama, text string, opts ...PredictOption) (string, error)

func (*LLama) TokenEmbeddings

func (l *LLama) TokenEmbeddings(tokens []int, opts ...PredictOption) ([]float32, error)

Token Embeddings

func (*LLama) TokenizeString

func (l *LLama) TokenizeString(text string, opts ...PredictOption) (int32, []int32, error)

TokenizeString has an interesting return property: negative lengths (potentially) have meaning. Therefore, the length is returned separately from the slice and the error; all three can be used together.
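A sketch of using all three return values together (assuming `l` is a *LLama; the interpretation of a negative length as "buffer too small" follows the upstream llama.cpp tokenize convention and is an assumption here):

```go
n, tokens, err := l.TokenizeString("The quick brown fox")
if err != nil {
	log.Fatal(err)
}
if n < 0 {
	// In upstream llama.cpp, a negative count conventionally signals that
	// the token buffer was too small for the input.
	log.Fatalf("tokenization incomplete, got length %d", n)
}
fmt.Printf("%d tokens: %v\n", n, tokens[:n])
```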

type ModelOption

type ModelOption func(p *ModelOptions)
var EnabelLowVRAM ModelOption = func(p *ModelOptions) {
	p.LowVRAM = true
}
var EnableEmbeddings ModelOption = func(p *ModelOptions) {
	p.Embeddings = true
}
var EnableF16Memory ModelOption = func(p *ModelOptions) {
	p.F16Memory = true
}
var EnableMLock ModelOption = func(p *ModelOptions) {
	p.MLock = true
}
var EnableNUMA ModelOption = func(p *ModelOptions) {
	p.NUMA = true
}

func SetContext

func SetContext(c int) ModelOption

SetContext sets the context size.

func SetGPULayers

func SetGPULayers(n int) ModelOption

SetGPULayers sets the number of GPU layers to use to offload computation

func SetLoraAdapter

func SetLoraAdapter(s string) ModelOption

func SetLoraBase

func SetLoraBase(s string) ModelOption

func SetMMap

func SetMMap(b bool) ModelOption

SetMMap sets whether memory mapping is used when loading the model.

func SetMainGPU

func SetMainGPU(maingpu string) ModelOption

SetMainGPU sets the main_gpu

func SetModelSeed

func SetModelSeed(c int) ModelOption

func SetMulMatQ

func SetMulMatQ(b bool) ModelOption

func SetNBatch

func SetNBatch(n_batch int) ModelOption

SetNBatch sets n_batch, the batch size.

func SetPerplexity

func SetPerplexity(b bool) ModelOption

func SetTensorSplit

func SetTensorSplit(maingpu string) ModelOption

SetTensorSplit sets the tensor split for the GPU.

func WithRopeFreqBase

func WithRopeFreqBase(f float32) ModelOption

func WithRopeFreqScale

func WithRopeFreqScale(f float32) ModelOption

type ModelOptions

type ModelOptions struct {
	ContextSize   int
	Seed          int
	NBatch        int
	F16Memory     bool
	MLock         bool
	MMap          bool
	LowVRAM       bool
	Embeddings    bool
	NUMA          bool
	NGPULayers    int
	MainGPU       string
	TensorSplit   string
	FreqRopeBase  float32
	FreqRopeScale float32
	MulMatQ       *bool
	LoraBase      string
	LoraAdapter   string
	Perplexity    bool
}
var DefaultModelOptions ModelOptions = ModelOptions{
	ContextSize:   512,
	Seed:          0,
	F16Memory:     false,
	MLock:         false,
	Embeddings:    false,
	MMap:          true,
	LowVRAM:       false,
	NBatch:        512,
	FreqRopeBase:  10000,
	FreqRopeScale: 1.0,
}

func NewModelOptions

func NewModelOptions(opts ...ModelOption) ModelOptions

NewModelOptions creates a new ModelOptions object with the given options applied over the defaults.

type PredictOption

type PredictOption func(p *PredictOptions)
var Debug PredictOption = func(p *PredictOptions) {
	p.DebugMode = true
}
var EnableF16KV PredictOption = func(p *PredictOptions) {
	p.F16KV = true
}
var EnablePromptCacheAll PredictOption = func(p *PredictOptions) {
	p.PromptCacheAll = true
}
var EnablePromptCacheRO PredictOption = func(p *PredictOptions) {
	p.PromptCacheRO = true
}
var IgnoreEOS PredictOption = func(p *PredictOptions) {
	p.IgnoreEOS = true
}

func SetBatch

func SetBatch(size int) PredictOption

SetBatch sets the batch size.

func SetFrequencyPenalty

func SetFrequencyPenalty(fp float32) PredictOption

SetFrequencyPenalty sets the frequency penalty parameter, freq_penalty.

func SetLogitBias

func SetLogitBias(lb string) PredictOption

SetLogitBias sets the logit bias parameter.

func SetMemoryMap

func SetMemoryMap(b bool) PredictOption

SetMemoryMap sets memory mapping.

func SetMirostat

func SetMirostat(m int) PredictOption

SetMirostat sets the mirostat parameter.

func SetMirostatETA

func SetMirostatETA(me float32) PredictOption

SetMirostatETA sets the mirostat ETA parameter.

func SetMirostatTAU

func SetMirostatTAU(mt float32) PredictOption

SetMirostatTAU sets the mirostat TAU parameter.

func SetMlock

func SetMlock(b bool) PredictOption

SetMlock sets the memory lock.

func SetNDraft

func SetNDraft(nd int) PredictOption

func SetNKeep

func SetNKeep(n int) PredictOption

SetNKeep sets the number of tokens from the initial prompt to keep.

func SetNegativePrompt

func SetNegativePrompt(np string) PredictOption

func SetNegativePromptScale

func SetNegativePromptScale(nps float32) PredictOption

func SetPathPromptCache

func SetPathPromptCache(f string) PredictOption

SetPathPromptCache sets the session file to store the prompt cache.

func SetPenalizeNL

func SetPenalizeNL(pnl bool) PredictOption

SetPenalizeNL sets whether to penalize newlines or not.

func SetPenalty

func SetPenalty(penalty float32) PredictOption

SetPenalty sets the repetition penalty for text generation.

func SetPredictionMainGPU

func SetPredictionMainGPU(maingpu string) PredictOption

SetPredictionMainGPU sets the main_gpu

func SetPredictionTensorSplit

func SetPredictionTensorSplit(maingpu string) PredictOption

SetPredictionTensorSplit sets the tensor split for the GPU

func SetPresencePenalty

func SetPresencePenalty(pp float32) PredictOption

SetPresencePenalty sets the presence penalty parameter, presence_penalty.

func SetRepeat

func SetRepeat(repeat int) PredictOption

SetRepeat sets the number of times to repeat text generation.

func SetRopeFreqBase

func SetRopeFreqBase(rfb float32) PredictOption

SetRopeFreqBase sets the RoPE frequency base.

func SetRopeFreqScale

func SetRopeFreqScale(rfs float32) PredictOption

func SetSeed

func SetSeed(seed int) PredictOption

SetSeed sets the random seed for sampling text generation.

func SetStopWords

func SetStopWords(stop ...string) PredictOption

SetStopWords sets the prompts that will stop predictions.

func SetTailFreeSamplingZ

func SetTailFreeSamplingZ(tfz float32) PredictOption

SetTailFreeSamplingZ sets the tail free sampling, parameter z.

func SetTemperature

func SetTemperature(temp float32) PredictOption

SetTemperature sets the temperature value for text generation.

func SetThreads

func SetThreads(threads int) PredictOption

SetThreads sets the number of threads to use for text generation.

func SetTokenCallback

func SetTokenCallback(fn func(string) bool) PredictOption

SetTokenCallback sets a callback invoked for each generated token; the callback returns true to continue predicting.

func SetTokens

func SetTokens(tokens int) PredictOption

SetTokens sets the number of tokens to generate.

func SetTopK

func SetTopK(topk int) PredictOption

SetTopK sets the value for top-K sampling.

func SetTopP

func SetTopP(topp float32) PredictOption

SetTopP sets the value for nucleus sampling.

func SetTypicalP

func SetTypicalP(tp float32) PredictOption

SetTypicalP sets the typicality parameter, p_typical.

func WithGrammar

func WithGrammar(s string) PredictOption

WithGrammar sets the grammar to constrain the output of the LLM response
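For example, the output can be constrained with a GBNF grammar (the grammar format used by llama.cpp). A sketch, assuming `l` is a *LLama:

```go
// A minimal GBNF grammar allowing only "yes" or "no" as output.
const grammar = `root ::= "yes" | "no"`

out, err := l.Predict("Is the sky blue? Answer yes or no.",
	llama.WithGrammar(grammar))
if err != nil {
	log.Fatal(err)
}
fmt.Println(out)
```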

type PredictOptions

type PredictOptions struct {
	Seed, Threads, Tokens, TopK, Repeat, Batch, NKeep int
	TopP, Temperature, Penalty                        float32
	NDraft                                            int
	F16KV                                             bool
	DebugMode                                         bool
	StopPrompts                                       []string
	IgnoreEOS                                         bool

	TailFreeSamplingZ float32
	TypicalP          float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	Mirostat          int
	MirostatETA       float32
	MirostatTAU       float32
	PenalizeNL        bool
	LogitBias         string
	TokenCallback     func(string) bool

	PathPromptCache             string
	MLock, MMap, PromptCacheAll bool
	PromptCacheRO               bool
	Grammar                     string
	MainGPU                     string
	TensorSplit                 string

	// Rope parameters
	RopeFreqBase  float32
	RopeFreqScale float32

	// Negative prompt parameters
	NegativePromptScale float32
	NegativePrompt      string
}
var DefaultOptions PredictOptions = PredictOptions{
	Seed:              -1,
	Threads:           4,
	Tokens:            128,
	Penalty:           1.1,
	Repeat:            64,
	Batch:             512,
	NKeep:             64,
	TopK:              40,
	TopP:              0.95,
	TailFreeSamplingZ: 1.0,
	TypicalP:          1.0,
	Temperature:       0.8,
	FrequencyPenalty:  0.0,
	PresencePenalty:   0.0,
	Mirostat:          0,
	MirostatTAU:       5.0,
	MirostatETA:       0.1,
	MMap:              true,
	RopeFreqBase:      10000,
	RopeFreqScale:     1.0,
}

func NewPredictOptions

func NewPredictOptions(opts ...PredictOption) PredictOptions

NewPredictOptions creates a new PredictOptions object with the given options applied over the defaults.
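For example, a few of the defaults above can be overridden (a sketch; the options are usually passed directly to Predict, while NewPredictOptions is useful for inspecting the resolved values):

```go
opts := llama.NewPredictOptions(
	llama.SetTemperature(0.2), // lower than the 0.8 default
	llama.SetSeed(42),         // fixed seed instead of the random -1 default
	llama.SetStopWords("\n\n"),
)
fmt.Println(opts.Temperature, opts.Seed, opts.StopPrompts)
```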
