Objects Lifecycle

Every object in node-llama-cpp has a .dispose() function you can call to free up its resources.

Calling the .dispose() function on an object also disposes all of its dependent objects.

For example, calling .dispose() on a model automatically disposes all of its contexts:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelPath = path.join(__dirname, "my-model.gguf");

const llama = await getLlama();
const model = await llama.loadModel({modelPath});
const context = await model.createContext();

await model.dispose();

console.log("Context disposed:", context.disposed); // true
```

You cannot use an object after it has been disposed.

For example, attempting to create a context from a disposed model throws a DisposedError, as does attempting to evaluate input on a disposed context sequence.
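Here's a minimal sketch of detecting this (assuming DisposedError is exported from node-llama-cpp, and that `model` was already disposed as in the example above):

```typescript
// a sketch; assumes DisposedError is exported from node-llama-cpp,
// and that `model` was already disposed as in the example above
import {DisposedError} from "node-llama-cpp";

try {
    await model.createContext();
} catch (err) {
    if (err instanceof DisposedError)
        console.log("Cannot create a context from a disposed model");
}
```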

To automatically dispose an object when it goes out of scope, you can use `await using` (requires TypeScript 5.2 or later):

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaContext} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelPath = path.join(__dirname, "my-model.gguf");

const llama = await getLlama();
let context: LlamaContext | undefined;

async function doThings() {
    await using model = await llama.loadModel({modelPath});
    context = await model.createContext();
}

await doThings();

// the model is disposed when the `doThings` function is done,
// and so are its contexts
console.log("Context disposed:", context?.disposed); // true
```

Garbage Collection

If you forget to dispose an object, it will automatically be disposed when the garbage collector runs.

It's best to dispose of objects yourself to free up their resources as soon as you're done with them, so new resources can be allocated sooner when needed. This can make a big difference in what you can do with the resources you have available, especially since models and contexts use a lot of VRAM.
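For example, a minimal sketch of this pattern (reusing the `llama` and `modelPath` from the examples above), disposing of each object as soon as it's no longer needed:

```typescript
// a sketch; assumes `llama` and `modelPath` are defined as in the examples above
const model = await llama.loadModel({modelPath});
try {
    const context = await model.createContext();
    // ... use the context here ...

    await context.dispose(); // frees the context's resources right away
} finally {
    await model.dispose(); // also disposes any contexts that are still alive
}
```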

Llama Instances

Every call to getLlama creates a new instance of Llama that allocates its own resources, so it's best to create a single instance and reuse it throughout your entire application.

You can do so by creating a llama.ts file and exporting the instance from there:

```typescript
// llama.ts
import {getLlama} from "node-llama-cpp";

export const llama = await getLlama();
```
```typescript
// another module, reusing the shared instance
import {fileURLToPath} from "url";
import path from "path";
import {llama} from "./llama.js";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelPath = path.join(__dirname, "my-model.gguf");

const model = await llama.loadModel({modelPath});
```
```typescript
// another module, reusing the shared instance
import {llama} from "./llama.js";

export async function logVramState() {
    const vramState = await llama.getVramState();

    console.log("Used VRAM:", vramState.used);
    console.log("Free VRAM:", vramState.free);
}
```

Reusing Existing Context Sequence State

When you prompt a model using LlamaChatSession or LlamaChat, it reuses the existing context sequence state as much as possible to avoid redundant evaluations; when needed, it flushes the irrelevant parts of the state (or all of it) to perform the requested evaluation.

You can reuse a context sequence for a new LlamaChatSession or LlamaChat without worrying about data leakage between different chat sessions.

You'll probably want to do so, since the system prompt and other chat history items may already have been evaluated in the existing context sequence; reusing it lets the new chat automatically continue evaluation from the first difference in the existing state, reducing the time it takes to start generating output.

WARNING

It's important to make sure you don't use the same context sequence for multiple chats at the same time, as it'll cause the chats to compete for the same resources and may lead to unexpected results.

Always make sure you're done with the existing chat before reusing the context sequence for a new chat.
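For example, here's a minimal sketch of reusing a single context sequence for two chats, one after the other (assuming `model` is an already-loaded LlamaModel):

```typescript
import {LlamaChatSession} from "node-llama-cpp";

// a sketch; assumes `model` is an already-loaded LlamaModel
const context = await model.createContext();
const contextSequence = context.getSequence();

const firstSession = new LlamaChatSession({contextSequence});
console.log(await firstSession.prompt("Hi there"));

firstSession.dispose(); // done with the first chat; the sequence itself stays usable

// the new chat continues evaluation from the first difference in the existing state
const secondSession = new LlamaChatSession({contextSequence});
console.log(await secondSession.prompt("Hi there"));
```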

Objects Relationship

Llama

The main class, returned by the getLlama() function. It provides access to llama.cpp APIs as well as additional native APIs.

LlamaModel

A model loaded using the .loadModel() method of a Llama instance.

LlamaContext

A context created using the .createContext() method of a LlamaModel instance.

A context can hold multiple context sequences.

Using multiple context sequences in a single context is more efficient than creating multiple contexts, and also enables batching.
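For example, a sketch of a single context serving two independent sequences (assuming `model` is an already-loaded LlamaModel):

```typescript
// a sketch; assumes `model` is an already-loaded LlamaModel
const context = await model.createContext({sequences: 2});
const sequence1 = context.getSequence();
const sequence2 = context.getSequence();
// evaluations on both sequences can now be batched together by the context
```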

LlamaContextSequence

A context sequence created using the .getSequence() method of a LlamaContext instance.

A context sequence holds the state of a conversation (usually tokens) and is used to generate completions and evaluate inputs.

All context sequences are independent of each other and do not share data between them.
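For example, a minimal sketch of generating a completion on a context sequence using LlamaCompletion (assuming `context` is an existing LlamaContext with a free sequence):

```typescript
import {LlamaCompletion} from "node-llama-cpp";

// a sketch; assumes `context` is an existing LlamaContext with a free sequence
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});
const output = await completion.generateCompletion("Here is a list of sweet fruits: ", {
    maxTokens: 32
});
console.log(output);
```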

LlamaChatSession

A chat session created with a LlamaContextSequence instance.

A chat session is used to prompt a model with a conversation history and generate responses.

The existing state of the context sequence will be overwritten if it cannot be reused for the chat session, so you don't need to provide a clean context sequence for a LlamaChatSession to work as expected.