# Objects Lifecycle
Every object in `node-llama-cpp` has a `.dispose()` function you can call to free up its resources.

Calling `.dispose()` on an object also disposes all of its dependent objects. For example, calling `.dispose()` on a model automatically disposes all of its contexts:
```typescript
import {getLlama} from "node-llama-cpp";

const modelPath = "my-model.gguf"; // replace with a path to your model file

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
const context = await model.createContext();

await model.dispose();
console.log("Context disposed:", context.disposed); // true
```
You cannot use a disposed object after disposing it. Attempting to create a context from a disposed model will throw a `DisposedError`, attempting to evaluate input on a disposed context sequence will also throw a `DisposedError`, and so on.
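For instance, here's a minimal sketch of that failure mode (the model path is a placeholder):

```typescript
import {getLlama, DisposedError} from "node-llama-cpp";

const modelPath = "my-model.gguf"; // replace with a path to your model file

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
await model.dispose();

try {
    await model.createContext(); // the model was already disposed
} catch (err) {
    console.log("Got DisposedError:", err instanceof DisposedError); // true
}
```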
To automatically dispose an object when it goes out of scope, you can use `await using` in TypeScript (TypeScript 5.2 or later):
```typescript
import {getLlama, LlamaContext} from "node-llama-cpp";

const modelPath = "my-model.gguf"; // replace with a path to your model file

const llama = await getLlama();
let context: LlamaContext | undefined;

async function doThings() {
    await using model = await llama.loadModel({modelPath});
    context = await model.createContext();
}

await doThings();

// the model is disposed when the `doThings` function is done,
// and so are its contexts
console.log("Context disposed:", context?.disposed); // true
```
## Garbage Collection
If you forget to dispose an object, it will automatically be disposed when the garbage collector runs.
It's best to dispose of objects yourself as soon as you're done with them, so their resources are freed right away and can be reallocated when needed. Doing so can make a big difference in what you can fit into the resources you have available, especially since models and contexts use a lot of VRAM.
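For example, a minimal sketch of disposing a model deterministically instead of waiting for garbage collection (the model path is a placeholder):

```typescript
import {getLlama} from "node-llama-cpp";

const modelPath = "my-model.gguf"; // replace with a path to your model file

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
try {
    const context = await model.createContext();
    // ... use the context here ...
} finally {
    // free the VRAM right away instead of waiting for
    // the garbage collector to eventually collect the model
    await model.dispose();
}
```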
## Llama Instances
Every call to `getLlama` creates a new `Llama` instance that allocates its own resources, so it's best to create a single instance and reuse it throughout your entire application. You can do so by creating a `llama.ts` file and exporting the instance from there:
```typescript
// llama.ts
import {getLlama} from "node-llama-cpp";

export const llama = await getLlama();
```
Then, in any other file, import the shared instance and use it:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {llama} from "./llama.js";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelPath = path.join(__dirname, "my-model.gguf");

const model = await llama.loadModel({modelPath});
```

```typescript
import {llama} from "./llama.js";

export async function logVramState() {
    const vramState = await llama.getVramState();
    console.log("Used VRAM:", vramState.used);
    console.log("Free VRAM:", vramState.free);
}
```
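The helper above can then be called from anywhere in the application; for example (the import path here is illustrative):

```typescript
// the path below is a placeholder for wherever logVramState is exported from
import {logVramState} from "./vramState.js";

await logVramState();
```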
## Reusing Existing Context Sequence State
When prompting a model using `LlamaChatSession` or `LlamaChat`, it attempts to reuse the existing context sequence state as much as possible to avoid redundant evaluations, but when needed, it flushes irrelevant parts of the state (or all of it) to perform the requested evaluation.
You can reuse a context sequence for a new `LlamaChatSession` or `LlamaChat` without worrying about data leakage between different chat sessions.

You'll probably want to do so to speed up evaluation: the system prompt and other chat history items may have already been evaluated in the existing context sequence, so a new chat can automatically continue evaluation from the first difference in the existing state, reducing the time needed to start generating output.
::: warning
It's important to make sure you don't use the same context sequence for multiple chats at the same time, as that will cause the chats to compete for the same resources and may lead to unexpected results.

Always make sure you're done with the existing chat before reusing its context sequence for a new chat.
:::
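Putting this together, here's a minimal sketch of reusing a context sequence for a new chat session. It assumes a local `my-model.gguf` file and that disposing a chat session does not dispose its context sequence (the default unless `autoDisposeSequence` is enabled):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const modelPath = "my-model.gguf"; // replace with a path to your model file

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
const context = await model.createContext();
const contextSequence = context.getSequence();

const session1 = new LlamaChatSession({contextSequence});
console.log(await session1.prompt("Hi there"));

// make sure the first chat is done before reusing the sequence
session1.dispose();

// the new session reuses whatever it can from the sequence's existing
// state and continues evaluation from the first difference
const session2 = new LlamaChatSession({contextSequence});
console.log(await session2.prompt("Hi there. How are you?"));
```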
## Objects Relationship
### `Llama`
The main class, returned by the `getLlama()` method, that provides access to the `llama.cpp` APIs as well as additional native APIs.
### `LlamaModel`
A model loaded using the `.loadModel()` method of a `Llama` instance.
### `LlamaContext`
A context created using the `.createContext()` method of a `LlamaModel` instance.

A context can hold multiple context sequences. Using multiple context sequences is more efficient than creating multiple contexts, and also enables batching.
### `LlamaContextSequence`
A context sequence created using the `.getSequence()` method of a `LlamaContext` instance.

A context sequence holds the state (usually tokens) of a conversation and is used to generate completions and evaluate inputs.

All context sequences are independent of each other and do not share data between them.
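For example, a minimal sketch of a single context holding two independent sequences (the model path and the `sequences` count here are placeholders):

```typescript
import {getLlama} from "node-llama-cpp";

const modelPath = "my-model.gguf"; // replace with a path to your model file

const llama = await getLlama();
const model = await llama.loadModel({modelPath});

// one context that can hold 2 independent sequences;
// evaluations on both sequences can be batched together
const context = await model.createContext({sequences: 2});

const sequence1 = context.getSequence();
const sequence2 = context.getSequence();
// sequence1 and sequence2 share the context's resources,
// but do not share any state between them
```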
### `LlamaChatSession`
A chat session created with a `LlamaContextSequence` instance.

A chat session is used to prompt a model with a conversation history and generate responses.

The existing state of the context sequence will be overridden if it cannot be reused for the chat session, so you don't need to provide a clean context sequence for a `LlamaChatSession` to work as expected.
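Putting the hierarchy together, here's a minimal sketch of creating each object from its parent (the model path is a placeholder):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();                          // Llama
const model = await llama.loadModel({
    modelPath: "my-model.gguf"                           // placeholder path
});                                                      // LlamaModel
const context = await model.createContext();             // LlamaContext
const contextSequence = context.getSequence();           // LlamaContextSequence
const session = new LlamaChatSession({contextSequence}); // LlamaChatSession

console.log(await session.prompt("Hello!"));
```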