
Chat Context Shift Strategy

When the chat history gets longer than the sequence's context size, we have to remove the oldest tokens from the context state to make room for new tokens to be generated. This is called a context shift.

node-llama-cpp has a smart mechanism to handle context shifts at the chat level: the oldest messages are truncated (from their beginning) or removed from the context state, while the system prompt is kept in place to ensure the model keeps following the guidelines you set for it.

You can override node-llama-cpp's default context shift strategy when using LlamaChatSession or LlamaChat by providing a custom context shift strategy.

The Default Context Shift Strategy

The default context shift strategy is eraseFirstResponseAndKeepFirstSystem.

This strategy attempts to truncate the oldest model responses (from their beginning) or remove them completely from the chat history while keeping the first system prompt in place. If a response is completely removed, the prompt that came before it will be removed as well.
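
Since this is the default, there's nothing to configure to use it. If you want to select it explicitly anyway, here's a minimal sketch; the model path is a placeholder, and passing the strategy by its name string is an assumption based on the option's default value:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder path
});
const context = await model.createContext();

const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    contextShift: {
        // explicitly select the built-in default strategy
        strategy: "eraseFirstResponseAndKeepFirstSystem"
    }
});
```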

Implementing a Custom Context Shift Strategy

A custom context shift strategy is a function that receives the full chat history as input and returns a new chat history that, when tokenized, results in fewer tokens than the desired maximum size.

The context shift strategy will be called only when the context state needs to be shifted.

If the context shift strategy returns an invalid chat history (e.g., a chat history that is too long), the prompting function will abort the evaluation and throw an error.

A custom context shift strategy can be simple logic that prioritizes which data to remove, or it can even use a language model to summarize information in order to shorten the chat history.

It's important to keep the last user prompt and model response as-is to prevent infinite generation loops.

```typescript
// `context` is a `LlamaContext`, as created in the snippet above
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    contextShift: {
        strategy({
            chatHistory, chatWrapper, maxTokensCount, tokenizer,
            lastShiftMetadata
        }) {
            // clone the chat history to avoid mutating the original
            const newChatHistory = chatHistory.map(
                (item) => structuredClone(item)
            );

            function getTokensLeftToRemove() {
                // measure the token usage of the shortened history
                const {contextText} = chatWrapper.generateContextState({
                    chatHistory: newChatHistory
                });
                const tokenUsage = contextText.tokenize(tokenizer).length;

                return Math.max(0, tokenUsage - maxTokensCount);
            }

            // stop when the history fits, almost nothing is left, or a
            // full pass removed nothing (to avoid an endless loop)
            let removedItems = true;
            while (
                removedItems &&
                getTokensLeftToRemove() > 0 &&
                newChatHistory.length > 2
            ) {
                removedItems = false;

                for (let i = 0; i < newChatHistory.length - 2; i++) {
                    const chatItem = newChatHistory[i]!;

                    if (i === 0 && chatItem.type === "system")
                        // don't remove the first system message
                        continue;
                    else if (chatItem.type === "model") {
                        // remove the model response
                        newChatHistory.splice(i, 1);
                        i--;
                        removedItems = true;

                        // remove the user messages that
                        // came before the model response
                        while (
                            i > 0 &&
                            newChatHistory[i - 1]?.type === "user"
                        ) {
                            newChatHistory.splice(i - 1, 1);
                            i--;
                        }
                    } else if (chatItem.type === "system") {
                        // don't remove system messages on their own
                        continue;
                    } else if (chatItem.type === "user") {
                        // don't remove user messages on their own
                        continue;
                    } else {
                        // ensure we handle all message types.
                        // otherwise, this will error
                        void (chatItem satisfies never);
                    }
                }
            }

            return {
                chatHistory: newChatHistory,

                // this metadata will be passed to the next context shift
                // strategy call as the `lastShiftMetadata` argument
                metadata: {}
            };
        }
    }
});
```
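
The `metadata` value returned from a shift is passed to the next call as the `lastShiftMetadata` argument, which lets a strategy carry state across shifts. Below is a minimal sketch of that pattern; the `shiftCount` field is hypothetical, and unlike the example above, this simplistic variant drops the oldest non-system messages one at a time instead of whole prompt-and-response pairs:

```typescript
// a fresh context, since each session needs its own sequence
const statefulContext = await model.createContext();

const statefulSession = new LlamaChatSession({
    contextSequence: statefulContext.getSequence(),
    contextShift: {
        strategy({
            chatHistory, chatWrapper, maxTokensCount, tokenizer,
            lastShiftMetadata
        }) {
            // state carried over from the previous shift, if any.
            // `shiftCount` is a hypothetical field used by this sketch
            const {shiftCount = 0} =
                (lastShiftMetadata ?? {}) as {shiftCount?: number};

            const newChatHistory = chatHistory.map(
                (item) => structuredClone(item)
            );

            const tokensUsed = () => chatWrapper
                .generateContextState({chatHistory: newChatHistory})
                .contextText
                .tokenize(tokenizer)
                .length;

            // naively drop the oldest non-system items until the
            // history fits, keeping the last prompt and response
            while (
                tokensUsed() > maxTokensCount &&
                newChatHistory.length > 2
            ) {
                const index = newChatHistory.findIndex(
                    (item, i) => (
                        i < newChatHistory.length - 2 &&
                        item.type !== "system"
                    )
                );
                if (index < 0)
                    break;

                newChatHistory.splice(index, 1);
            }

            return {
                chatHistory: newChatHistory,

                // will arrive as `lastShiftMetadata` on the next shift
                metadata: {shiftCount: shiftCount + 1}
            };
        }
    }
});
```

Carrying metadata like this can also be used to cache more expensive work between shifts, such as a summary of the messages that were already removed.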