External Chat State

WARNING

If you're not building a library around node-llama-cpp, you probably want to use the simpler LlamaChatSession instead; see the chat session documentation for more details.

You can save and restore a chat history on LlamaChatSession instead of managing the chat state externally.
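
For reference, here is a minimal sketch of that simpler approach. It assumes the same model file used in the examples below and uses LlamaChatSession's getChatHistory() and setChatHistory() methods to save and restore the chat state:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

await session.prompt("Hi there, how are you?");

// save the chat history (a plain serializable array)
const savedHistory = session.getChatHistory();

// ...later, restore it onto a session
session.setChatHistory(savedHistory);
```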

To interact with a model in a chat form, you can use LlamaChatSession, which is a stateful chat session that manages the chat state on its own.

When building a library around node-llama-cpp, you may want to store that chat state externally and control the evaluations on your own.

This is where LlamaChat comes in handy. LlamaChat allows you to generate a completion for an existing chat history and manage the evaluation yourself, which also lets you store the chat state externally. LlamaChat is stateless and has no state of its own.

In fact, LlamaChatSession is just a wrapper around LlamaChat to make it more convenient to use.

Let's see how you can use LlamaChat to prompt a model:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChat} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const llamaChat = new LlamaChat({
    contextSequence: context.getSequence()
});

let chatHistory = llamaChat.chatWrapper.generateInitialChatHistory();

const prompt = "Hi there, how are you?";

// add the user prompt to the chat history
chatHistory.push({
    type: "user",
    text: prompt
});

// add a slot for the model response, for the model to complete.
// if we want the model response to start with a specific text,
// we can do so by adding it to the response array
chatHistory.push({
    type: "model",
    response: []
});

console.log("User: " + prompt);

const res = await llamaChat.generateResponse(chatHistory, {
    onTextChunk(text) {
        // stream the text to the console
        process.stdout.write(text);
    }
});

console.log("AI: " + res.response);
```

Now, let's say we want to ask the model a follow-up question based on the previous response. Since we already have a context sequence loaded with the previous chat history, we'd want to reuse it as much as possible.

To do so, we pass the context window of the previous evaluation output to the new evaluation. This is important, since if a context shift has happened, we want to use the existing post-context-shift context sequence state as much as possible instead of starting from scratch.

NOTE

Keeping and passing the context window and context shift metadata is only necessary if you use the same context sequence in the next evaluation, and the state from the previous evaluation is still present in the context sequence.

```typescript
chatHistory = res.lastEvaluation.cleanHistory;
let chatHistoryContextWindow = res.lastEvaluation.contextWindow;
let lastContextShiftMetadata = res.lastEvaluation.contextShiftMetadata;

const prompt2 = "Summarize what you said";

// add the user prompt to the chat history
chatHistory.push({
    type: "user",
    text: prompt2
});
// add the user prompt to the chat history context window
chatHistoryContextWindow.push({
    type: "user",
    text: prompt2
});

// add a slot for the model response, for the model to complete
chatHistory.push({
    type: "model",
    response: []
});
// add a slot for the model response in the context window
chatHistoryContextWindow.push({
    type: "model",
    response: []
});

console.log("User: " + prompt2);

const res2 = await llamaChat.generateResponse(chatHistory, {
    onTextChunk(text) {
        // stream the text to the console
        process.stdout.write(text);
    },
    contextShift: {
        // pass the context shift metadata from the previous evaluation
        lastEvaluationMetadata: lastContextShiftMetadata
    },
    lastEvaluationContextWindow: {
        history: chatHistoryContextWindow
    }
});

console.log("AI: " + res2.response);
```
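
Because the clean chat history is a plain array of objects, storing the chat state externally can be as simple as serializing it. Here's a minimal sketch, continuing from the snippet above, that persists res2.lastEvaluation.cleanHistory to a JSON file; it assumes every item in the history (including any function call results you add) is JSON-serializable:

```typescript
import fs from "fs/promises";
import type {ChatHistoryItem} from "node-llama-cpp";

// persist the clean chat history externally (here: a JSON file).
// the context window and context shift metadata are only useful
// when reusing the same context sequence, so they aren't saved
await fs.writeFile(
    "chatHistory.json",
    JSON.stringify(res2.lastEvaluation.cleanHistory),
    "utf8"
);

// ...later, possibly in another process, load the history back
// and pass it to llamaChat.generateResponse() as shown above
const restoredChatHistory: ChatHistoryItem[] = JSON.parse(
    await fs.readFile("chatHistory.json", "utf8")
);
```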

Handling Function Calling

When passing information about functions the model can call, the response of .generateResponse() can contain function calls.

Then, it's our implementation's responsibility to:

  • Print the textual response the model generated
  • Perform the appropriate function calls
  • Add the function calls and their results to the chat history

Here's an example of how we can prompt a model and support function calling:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama, LlamaChat, ChatModelFunctions, ChatHistoryItem,
    ChatModelResponse, ChatModelFunctionCall
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const llamaChat = new LlamaChat({
    contextSequence: context.getSequence()
});

let chatHistory = llamaChat.chatWrapper.generateInitialChatHistory();

const prompt = "Give me the result of 2 dice rolls";
const functionDefinitions = {
    getRandomNumber: {
        description: "Get a random number",
        params: {
            type: "object",
            properties: {
                min: {
                    type: "number"
                },
                max: {
                    type: "number"
                }
            }
        }
    }
} satisfies ChatModelFunctions;

function getRandomNumber(params: {min: number, max: number}) {
    return Math.floor((Math.random() * (params.max - params.min + 1)) + params.min);
}

// add the user prompt to the chat history
chatHistory.push({
    type: "user",
    text: prompt
});

// add a slot for the model response, for the model to complete.
// if we want the model response to start with a specific text,
// we can do so by adding it to the response array
chatHistory.push({
    type: "model",
    response: []
});

console.log("User: " + prompt);

let chatHistoryContextWindow: ChatHistoryItem[] | undefined;
let lastContextShiftMetadata: any;

while (true) {
    const res = await llamaChat.generateResponse(chatHistory, {
        functions: functionDefinitions,
        onFunctionCall(functionCall) {
            // we can use this callback to start performing
            // the function as soon as the model calls it
            console.log(
                "model called function", functionCall.functionName,
                "with params", functionCall.params
            );
        },
        contextShift: {
            lastEvaluationMetadata: lastContextShiftMetadata
        },
        lastEvaluationContextWindow: {
            history: chatHistoryContextWindow
        }
    });
    chatHistory = res.lastEvaluation.cleanHistory;
    chatHistoryContextWindow = res.lastEvaluation.contextWindow;
    lastContextShiftMetadata = res.lastEvaluation.contextShiftMetadata;

    // print the text the model generated before calling functions
    if (res.response !== "")
        console.log("AI: " + res.response);

    // when there are no function calls,
    // it means the model has finished generating the response
    if (res.functionCalls == null)
        break;

    // perform the function calls
    const callItems: ChatModelFunctionCall[] = res.functionCalls.map((functionCall) => {
        if (functionCall.functionName !== "getRandomNumber")
            throw new Error("only function getRandomNumber is supported");

        const res = getRandomNumber(functionCall.params);
        console.log(
            "Responding to function", functionCall.functionName,
            "with params", functionCall.params,
            "with result", res
        );

        const functionDefinition = functionDefinitions[functionCall.functionName];

        return {
            type: "functionCall",
            name: functionCall.functionName,
            params: functionCall.params,
            rawCall: functionCall.raw,
            description: functionDefinition?.description,
            result: res
        } satisfies ChatModelFunctionCall;
    });

    // needed for maintaining the existing context sequence state
    // with parallel function calling,
    // and avoiding redundant context shifts
    callItems[0]!.startsNewChunk = true;

    if (chatHistory.at(-1)?.type !== "model")
        chatHistory.push({
            type: "model",
            response: []
        });

    if (chatHistoryContextWindow.at(-1)?.type !== "model")
        chatHistoryContextWindow.push({
            type: "model",
            response: []
        });

    const modelResponse = chatHistory.at(-1)! as ChatModelResponse;
    const contextWindowModelResponse = chatHistoryContextWindow.at(-1)! as ChatModelResponse;

    // add the function calls and their results
    // both to the chat history and the context window chat history
    for (const callItem of callItems) {
        modelResponse.response.push(callItem);
        contextWindowModelResponse.response.push(callItem);
    }
}
```