# Chat Wrapper

## Background

Text generation models are trained to predict the completion of incomplete text. To have a conversation with a model, we have to generate text the model can complete, and parse its response to know whether it has finished answering or whether we should prompt it to continue completing the text.

For example, to prompt a model with "Where do llamas come from?", we can give the model a text like this to complete:

```text
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something incorrectly.
If you don't know the answer to a question, don't share false information.

### Human
Where do llamas come from?

### Assistant
```
The first text we gave to the model in this example is called a "system prompt". This text guides the model toward generating the kind of response we want.

The model will then generate a response like this:

```text
Llamas come from the Andes mountains.

### Human
```

On every character the model generates, we have to check whether the text completion now includes the ### Human\n part; if it does, we can stop the generation and return the response.
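For illustration, here is a minimal sketch of such a check, assuming a hypothetical generateNextChunk function that streams raw text from a model (it is not a node-llama-cpp API):

```typescript
// a minimal sketch of stop-string detection during streaming;
// `generateNextChunk` is hypothetical and stands in for whatever
// streams raw completion text out of a model
const stopTrigger = "### Human\n";

async function completeUntilStop(
    generateNextChunk: () => Promise<string | null>
): Promise<string> {
    let completion = "";

    while (true) {
        const chunk = await generateNextChunk();
        if (chunk == null) // the model stopped on its own
            break;

        completion += chunk;

        // stop as soon as the stop trigger appears,
        // and exclude it from the returned response
        const stopIndex = completion.indexOf(stopTrigger);
        if (stopIndex >= 0)
            return completion.slice(0, stopIndex);
    }

    return completion;
}
```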

Most models are trained to understand a specific conversation format, or output a specific text when they finish generating a response.

Usually, when a model finishes generating a response, it'll output an EOS token (End of Sequence token) that's specific to the model.

For example, Llama 3 Instruct models have their own conversation format.
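In that format, each message is delimited with special tokens, and an <|eot_id|> token marks the end of a turn. A conversation looks roughly like this (special tokens spelled out as text):

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

Where do llamas come from?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```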

> **INFO:** To learn more about tokens, see the tokens guide.

## Chat Wrappers

The LlamaChatSession class allows you to chat with a model without having to worry about any parsing or formatting.

To do that, it uses a chat wrapper to handle the unique chat format of the model you use.

It automatically selects and configures a chat wrapper that it thinks is best for the model you use (via resolveChatWrapper(...)).

You can also explicitly specify a chat wrapper to use, or customize its settings. For example, to chat with a Llama 3 Instruct model, you can use Llama3ChatWrapper:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, Llama3ChatWrapper} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new Llama3ChatWrapper() // by default, "auto" is used
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
```

You can find the list of built-in chat wrappers here.

## Template Chat Wrapper

A simple way to create your own custom chat wrapper is to use TemplateChatWrapper.

Example usage:

```typescript
import {TemplateChatWrapper} from "node-llama-cpp";

const chatWrapper = new TemplateChatWrapper({
    template: "{{systemPrompt}}\n{{history}}model: {{completion}}\nuser: ",
    historyTemplate: {
        system: "system: {{message}}\n",
        user: "user: {{message}}\n",
        model: "model: {{message}}\n"
    },
    // functionCallMessageTemplate: { // optional
    //     call: "[[call: {{functionName}}({{functionParams}})]]",
    //     result: " [[result: {{functionCallResult}}]]"
    // }
});
```
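With the template above, a conversation would be rendered into context text roughly like this (a sketch derived from the template, not actual library output):

```text
You are a helpful assistant
user: Where do llamas come from?
model: Llamas come from the Andes mountains.
user: 
```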

See TemplateChatWrapper for more details.

## Jinja Template Chat Wrapper

To reuse an existing Jinja template you have (such as a chat template shipped with a model), you can use JinjaTemplateChatWrapper.

> **NOTE:** Not all the features of Jinja are supported by the JinjaTemplateChatWrapper, so some Jinja templates might need some simple modifications to work.
>
> If you'd like to create your own chat format, it's significantly easier to write a custom chat wrapper directly.

```typescript
import {JinjaTemplateChatWrapper} from "node-llama-cpp";

const chatWrapper = new JinjaTemplateChatWrapper({
    template: "<Jinja template here>",
    // functionCallMessageTemplate: { // optional
    //     call: "[[call: {{functionName}}({{functionParams}})]]",
    //     result: " [[result: {{functionCallResult}}]]"
    // }
});
```

## Custom Chat Wrapper

To create your own chat wrapper, you need to extend the ChatWrapper class.

The way a chat wrapper works is that it implements the generateContextState method, which receives the full chat history and the available functions, and is responsible for generating the content to be loaded into the context state, so the model can generate a completion of it.

The context content is returned in the form of a LlamaText (see the LlamaText guide).

If the last message in the chat history is a model response, it must not include a syntax suffix for the message, so the model can continue generating completion for an existing response. This is needed for context shifts to work properly.

For example, this is a valid ending of a context text:

```text
### Assistant
Llamas come from the
```

This is an invalid ending of a context text:

```text
### Assistant
Llamas come from the

### Human
```

### What is a context shift?

When the chat history gets longer than the sequence's context size, we have to remove the oldest tokens from the context state to make room for new tokens to be generated.

node-llama-cpp has a smart mechanism to handle context shifts on the chat level, so the oldest messages are truncated (from their beginning) or removed from the context state, while keeping the system prompt in place to ensure the model follows the guidelines you set for it.
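For illustration (a schematic sketch, not actual library output), a chat-level context shift might transform the context state like this:

```text
Before the context shift:            After the context shift:
[system prompt]                      [system prompt]
[user message 1]                     [user message 2 (truncated)]
[model response 1]                   [model response 2]
[user message 2]                     [user message 3]
[model response 2]                   [model response 3 (in progress)]
[user message 3]
[model response 3 (in progress)]
```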

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama, LlamaChatSession, ChatWrapper,
    ChatWrapperSettings, ChatWrapperGenerateContextStateOptions,
    ChatWrapperGeneratedContextState, LlamaText
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

class MyCustomChatWrapper extends ChatWrapper {
    public readonly wrapperName: string = "MyCustomChat";

    public override readonly settings: ChatWrapperSettings = {
        ...ChatWrapper.defaultSettings
    };

    public override generateContextState({
        chatHistory, availableFunctions, documentFunctionParams
    }: ChatWrapperGenerateContextStateOptions): ChatWrapperGeneratedContextState {
        const historyWithFunctions = this.addAvailableFunctionsSystemMessageToHistory(chatHistory, availableFunctions, {
            documentParams: documentFunctionParams
        });

        const texts = historyWithFunctions.map((item, index) => {
            if (item.type === "system") {
                if (index === 0)
                    return LlamaText([
                        LlamaText.fromJSON(item.text)
                    ]);

                return LlamaText([
                    "### System\n",
                    LlamaText.fromJSON(item.text)
                ]);
            } else if (item.type === "user")
                return LlamaText([
                    "### Human\n",
                    item.text
                ]);
            else if (item.type === "model")
                return LlamaText([
                    "### Assistant\n",
                    this.generateModelResponseText(item.response)
                ]);

            // ensure that all chat item types are handled,
            // or TypeScript will throw an error
            return item satisfies never;
        });

        return {
            contextText: LlamaText.joinValues("\n\n", texts),

            // if the model generates any of these texts,
            // the completion will stop, and the text will not
            // be included in the response returned to the user
            stopGenerationTriggers: [
                LlamaText(["### Human\n"])
            ]
        };
    }
}

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new MyCustomChatWrapper()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
```
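With this wrapper, a short conversation would be rendered into context text roughly like this (a sketch that assumes "You are a helpful assistant" as the system prompt, for brevity):

```text
You are a helpful assistant

### Human
Hi there, how are you?

### Assistant
I'm doing well, thank you!
```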