# Type Alias: LastBuildOptions
```ts
type LastBuildOptions = {
  logLevel?: LlamaLogLevel;
  logger?: (level: LlamaLogLevel, message: string) => void;
  usePrebuiltBinaries?: boolean;
  progressLogs?: boolean;
  skipDownload?: boolean;
  maxThreads?: number;
  vramPadding?: number | ((totalVram: number) => number);
  ramPadding?: number | ((totalRam: number) => number);
  debug?: boolean;
  dryRun?: boolean;
  numa?: LlamaNuma;
};
```
Defined in: bindings/getLlama.ts:197
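For context, a minimal usage sketch, assuming the `getLlama("lastBuild", ...)` overload that accepts these options:

```ts
import {getLlama, LlamaLogLevel} from "node-llama-cpp";

// Load the binary produced by the most recent local build,
// raising the minimum log level so only errors are printed
const llama = await getLlama("lastBuild", {
    logLevel: LlamaLogLevel.error
});
```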
## Properties
### logLevel?

```ts
optional logLevel: LlamaLogLevel;
```

Defined in: bindings/getLlama.ts:202

Set the minimum log level for llama.cpp. Defaults to `"warn"`.
### logger()?

```ts
optional logger: (level: LlamaLogLevel, message: string) => void;
```

Defined in: bindings/getLlama.ts:207

Set a custom logger for llama.cpp logs.
#### Parameters

| Parameter | Type |
| ------ | ------ |
| `level` | `LlamaLogLevel` |
| `message` | `string` |

#### Returns

`void`
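A minimal sketch of routing llama.cpp logs through a custom handler, using the same entry point as the sketch above:

```ts
import {getLlama, LlamaLogLevel} from "node-llama-cpp";

// Forward llama.cpp logs to a custom handler instead of the default one
const llama = await getLlama("lastBuild", {
    logger(level: LlamaLogLevel, message: string) {
        console.log(`[llama.cpp] [${level}] ${message}`);
    }
});
```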
### usePrebuiltBinaries?

```ts
optional usePrebuiltBinaries: boolean;
```

Defined in: bindings/getLlama.ts:213

If a local build is not found, use prebuilt binaries. Enabled by default.
### progressLogs?

```ts
optional progressLogs: boolean;
```

Defined in: bindings/getLlama.ts:220

If a local build is not found and prebuilt binaries are not found, print binary compilation progress logs while building from source. Enabled by default.
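For illustration, a sketch combining the two flags above to skip the prebuilt-binaries fallback and keep a source build quiet (a hypothetical configuration, not a recommendation):

```ts
import {getLlama} from "node-llama-cpp";

// Don't fall back to prebuilt binaries, and suppress compilation
// progress logs if a build from source ends up being triggered
const llama = await getLlama("lastBuild", {
    usePrebuiltBinaries: false,
    progressLogs: false
});
```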
### skipDownload?

```ts
optional skipDownload: boolean;
```

Defined in: bindings/getLlama.ts:227

If a local build is not found and prebuilt binaries are not found, don't download the llama.cpp source. When set to `true` and the llama.cpp source is needed but not found, a `NoBinaryFoundError` error will be thrown. Disabled by default.
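A sketch of failing fast when no binary is available, assuming `NoBinaryFoundError` is exported from the package root:

```ts
import {getLlama, NoBinaryFoundError} from "node-llama-cpp";

try {
    // Never download the llama.cpp source; only use what's already on disk
    const llama = await getLlama("lastBuild", {skipDownload: true});
} catch (err) {
    if (err instanceof NoBinaryFoundError)
        console.error("No usable binary found and downloads are disabled");
    else
        throw err;
}
```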
### maxThreads?

```ts
optional maxThreads: number;
```

Defined in: bindings/getLlama.ts:238

The maximum number of threads to use for the Llama instance.

Set to `0` to have no thread limit.

When not using a GPU, defaults to the number of CPU cores that are useful for math (`.cpuMathCores`), or `4`, whichever is higher.

When using a GPU, there's no limit by default.
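For example, a sketch capping CPU-side inference at a fixed thread count:

```ts
import {getLlama} from "node-llama-cpp";

// Cap CPU inference work at 4 threads; use 0 instead to remove the limit
const llama = await getLlama("lastBuild", {maxThreads: 4});
```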
### vramPadding?

```ts
optional vramPadding: number | ((totalVram: number) => number);
```

Defined in: bindings/getLlama.ts:248

Pad the available VRAM for the memory size calculations, as these calculations are not always accurate. Recommended to ensure stability. This only affects the calculations of `"auto"` in function options and is not reflected in the `getVramState` function.

Defaults to `6%` of the total VRAM or 1GB, whichever is lower. Set to `0` to disable.
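A sketch of the function form, assuming the padding value is expressed in bytes like the `totalVram` argument:

```ts
import {getLlama} from "node-llama-cpp";

// Reserve 6% of the total VRAM as padding, but never more than 512MiB
const llama = await getLlama("lastBuild", {
    vramPadding: (totalVram: number) =>
        Math.min(totalVram * 0.06, 512 * 1024 * 1024)
});
```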
### ramPadding?

```ts
optional ramPadding: number | ((totalRam: number) => number);
```

Defined in: bindings/getLlama.ts:259

Pad the available RAM for the memory size calculations, as these calculations are not always accurate. Recommended to ensure stability.

Defaults to `25%` of the total RAM or 6GB (1GB on Linux), whichever is lower. Set to `0` to disable.

Since the OS also needs RAM to function, the default padding can reach up to 6GB on Windows and macOS, and 1GB on Linux.
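Similarly, a sketch disabling RAM padding entirely, which trades away the stability margin described above:

```ts
import {getLlama} from "node-llama-cpp";

// Disable RAM padding; the "auto" memory calculations will use all reported RAM
const llama = await getLlama("lastBuild", {ramPadding: 0});
```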
### debug?

```ts
optional debug: boolean;
```

Defined in: bindings/getLlama.ts:269

Enable debug mode to find issues with llama.cpp. Makes llama.cpp logs print directly to the console rather than through the provided logger.

Defaults to `false`.

The default can be set using the `NODE_LLAMA_CPP_DEBUG` environment variable.
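A sketch of turning on debug output for a single run:

```ts
import {getLlama} from "node-llama-cpp";

// Print llama.cpp logs straight to the console, bypassing any custom logger.
// The NODE_LLAMA_CPP_DEBUG environment variable can set this default instead.
const llama = await getLlama("lastBuild", {debug: true});
```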
### dryRun?

```ts
optional dryRun: boolean;
```

Defined in: bindings/getLlama.ts:284

Loads existing binaries without loading the llama.cpp backend, and then disposes the returned `Llama` instance right away before returning it.

Useful as a fast and efficient test of whether the given configuration can be loaded; for example, it can be used to determine which GPU types the current machine supports before actually using them.

Enabling this option implies `build: "never"` and `skipDownload: true`.

The returned `Llama` instance will be disposed and cannot be used.

Defaults to `false`.
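A sketch of using a dry run as a cheap loadability probe:

```ts
import {getLlama} from "node-llama-cpp";

try {
    // The returned instance is already disposed; only success or failure matters
    await getLlama("lastBuild", {dryRun: true});
    console.log("The last build can be loaded");
} catch (err) {
    console.log("The last build cannot be loaded:", err);
}
```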
### numa?

```ts
optional numa: LlamaNuma;
```

Defined in: bindings/getLlama.ts:304

NUMA (Non-Uniform Memory Access) allocation policy.

On multi-socket or multi-cluster machines, each CPU "socket" (or node) has its own local memory. Accessing memory on your own socket is fast, but accessing another socket's memory is slower. Setting a NUMA allocation policy can dramatically improve performance by keeping data local and "close" to the socket that uses it.

These are the available NUMA options:
- `false`: Don't set any NUMA policy - let the OS decide.
- `"distribute"`: Distribute the memory across all available NUMA nodes.
- `"isolate"`: Pin both threads and their memory to a single NUMA node to avoid cross-node traffic.
- `"numactl"`: Delegate NUMA management to the external `numactl` command (or the `libnuma` library) to set the NUMA policy.
- `"mirror"`: Allocate memory on all NUMA nodes, and copy the data to all of them. This ensures minimal traffic between nodes, but uses more memory.

Defaults to `false` (no NUMA policy).
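For example, a sketch opting into the `"distribute"` policy on a multi-socket machine:

```ts
import {getLlama} from "node-llama-cpp";

// Spread model memory across all NUMA nodes instead of letting the OS decide
const llama = await getLlama("lastBuild", {numa: "distribute"});
```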