Type Alias: LlamaOptions

ts
type LlamaOptions = {
  gpu?: "auto" | LlamaGpuType | {
    type: "auto";
    exclude: LlamaGpuType[];
  };
  logLevel?: LlamaLogLevel;
  logger?: (level: LlamaLogLevel, message: string) => void;
  build?: "auto" | "never" | "forceRebuild" | "try";
  cmakeOptions?: Record<string, string>;
  existingPrebuiltBinaryMustMatchBuildOptions?: boolean;
  usePrebuiltBinaries?: boolean;
  progressLogs?: boolean;
  skipDownload?: boolean;
  maxThreads?: number;
  vramPadding?: number | ((totalVram: number) => number);
  ramPadding?: number | ((totalRam: number) => number);
  debug?: boolean;
};

Defined in: bindings/getLlama.ts:37

Properties

gpu?

ts
optional gpu: "auto" | LlamaGpuType | {
  type: "auto";
  exclude: LlamaGpuType[];
};

Defined in: bindings/getLlama.ts:50

The compute layer implementation type to use for llama.cpp.

  • "auto": Automatically detect and use the best GPU available (Metal on macOS, and CUDA or Vulkan on Windows and Linux)
  • "metal": Use Metal. Only supported on macOS. Enabled by default on Apple Silicon Macs.
  • "cuda": Use CUDA.
  • "vulkan": Use Vulkan.
  • false: Disable any GPU support and only use the CPU.

"auto" by default.


logLevel?

ts
optional logLevel: LlamaLogLevel;

Defined in: bindings/getLlama.ts:59

Set the minimum log level for llama.cpp. Defaults to "warn".


logger()?

ts
optional logger: (level: LlamaLogLevel, message: string) => void;

Defined in: bindings/getLlama.ts:64

Set a custom logger for llama.cpp logs.

Parameters

Parameter    Type
level        LlamaLogLevel
message      string

Returns

void
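
As a minimal sketch (assuming LlamaLogLevel is exported from node-llama-cpp with info and error members), a custom logger can route llama.cpp messages into your own logging:

ts
import {getLlama, LlamaLogLevel} from "node-llama-cpp";

const llama = await getLlama({
    logLevel: LlamaLogLevel.info,
    logger(level, message) {
        // Send errors to stderr, everything else to stdout
        if (level === LlamaLogLevel.error)
            console.error("[llama.cpp]", message);
        else
            console.log("[llama.cpp]", message);
    }
});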


build?

ts
optional build: "auto" | "never" | "forceRebuild" | "try";

Defined in: bindings/getLlama.ts:86

Set what build method to use.

  • "auto": If a local build is found, use it. Otherwise, if a prebuilt binary is found, use it. Otherwise, build from source.
  • "never": If a local build is found, use it. Otherwise, if a prebuilt binary is found, use it. Otherwise, throw a NoBinaryFoundError error.
  • "forceRebuild": Always build from source. Be cautious with this option, as it will cause the build to fail on Windows when the binaries are in use by another process.
  • "try": If a local build is found, use it. Otherwise, try to build from source and use the resulting binary. If building from source fails, use a prebuilt binary if found.

When running from inside an Asar archive in Electron, building from source is not possible, so it'll never build from source. To allow building from source in Electron apps, make sure you ship node-llama-cpp as an unpacked module.

Defaults to "auto". On Electron, defaults to "never".


cmakeOptions?

ts
optional cmakeOptions: Record<string, string>;

Defined in: bindings/getLlama.ts:91

Set custom CMake options for llama.cpp


existingPrebuiltBinaryMustMatchBuildOptions?

ts
optional existingPrebuiltBinaryMustMatchBuildOptions: boolean;

Defined in: bindings/getLlama.ts:97

When a prebuilt binary is found, only use it if it was built with the same build options as the ones specified in buildOptions. Disabled by default.


usePrebuiltBinaries?

ts
optional usePrebuiltBinaries: boolean;

Defined in: bindings/getLlama.ts:103

Use prebuilt binaries if they match the build options. Enabled by default.


progressLogs?

ts
optional progressLogs: boolean;

Defined in: bindings/getLlama.ts:109

Print binary compilation progress logs. Enabled by default.


skipDownload?

ts
optional skipDownload: boolean;

Defined in: bindings/getLlama.ts:116

Don't download the llama.cpp source if it's not found locally. When set to true and the llama.cpp source is not found, a NoBinaryFoundError will be thrown. Disabled by default.


maxThreads?

ts
optional maxThreads: number;

Defined in: bindings/getLlama.ts:127

The maximum number of threads to use for the Llama instance.

Set to 0 to have no thread limit.

When not using a GPU, defaults to the number of CPU cores that are useful for math (.cpuMathCores), or 4, whichever is higher.

When using a GPU, there's no limit by default.


vramPadding?

ts
optional vramPadding: number | ((totalVram: number) => number);

Defined in: bindings/getLlama.ts:137

Pad the available VRAM for the memory size calculations, as these calculations are not always accurate. Recommended to ensure stability. This padding only affects how "auto" values in function options are resolved and is not reflected in the result of the getVramState function.

Defaults to 6% of the total VRAM or 1GB, whichever is lower. Set to 0 to disable.
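
As a sketch of the function form (assuming totalVram and the returned padding are both in bytes), the documented default could be expressed like this:

ts
import {getLlama} from "node-llama-cpp";

// Reproduce the documented default: 6% of the total VRAM or 1GB, whichever is lower
const llama = await getLlama({
    vramPadding: (totalVram) => Math.min(totalVram * 0.06, 1024 ** 3)
});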


ramPadding?

ts
optional ramPadding: number | ((totalRam: number) => number);

Defined in: bindings/getLlama.ts:148

Pad the available RAM for the memory size calculations, as these calculations are not always accurate. Recommended to ensure stability.

Defaults to 25% of the total RAM or 6GB (1GB on Linux), whichever is lower. Set to 0 to disable.

Since the OS also needs RAM to function, the default value can get up to 6GB on Windows and macOS, and 1GB on Linux.


debug?

ts
optional debug: boolean;

Defined in: bindings/getLlama.ts:158

Enable debug mode to find issues with llama.cpp. Makes logs print directly to the console from llama.cpp and not through the provided logger.

Defaults to false.

The default can be set using the NODE_LLAMA_CPP_DEBUG environment variable.