Chat Context Shift Strategy
When the chat history gets longer than the sequence's context size, we have to remove the oldest tokens from the context state to make room for new tokens to be generated. This is called a context shift.
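Conceptually, a context shift evicts the oldest tokens from a fixed-size window to make room for new ones. Here is a minimal sketch of that idea on a plain array of token values; this is an illustration only, not the library's actual implementation (the `shiftContext` name and signature are made up for this example):

```typescript
// Illustrative only: evict the oldest tokens so that, after appending
// `newTokens`, the sequence still fits within `contextSize`
function shiftContext(
    tokens: number[],
    newTokens: number[],
    contextSize: number
): number[] {
    const overflow = tokens.length + newTokens.length - contextSize;

    // drop the oldest tokens when the combined sequence would overflow
    const kept = overflow > 0 ? tokens.slice(overflow) : tokens.slice();
    return [...kept, ...newTokens];
}
```

For example, appending 2 new tokens to a full 4-token window evicts the 2 oldest tokens. The real mechanism described below operates on whole chat messages rather than raw tokens, so the shift doesn't cut a message mid-sentence.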
`node-llama-cpp` has a smart mechanism to handle context shifts at the chat level, so the oldest messages are truncated (from their beginning) or removed from the context state, while the system prompt is kept in place to ensure the model keeps following the guidelines you set for it.

You can override `node-llama-cpp`'s default context shift strategy when using `LlamaChatSession` or `LlamaChat` by providing a custom context shift strategy.
The Default Context Shift Strategy
The default context shift strategy is `eraseFirstResponseAndKeepFirstSystem`.
This strategy attempts to truncate the oldest model responses (from their beginning) or remove them completely from the chat history while keeping the first system prompt in place. If a response is completely removed, the prompt that came before it will be removed as well.
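The pruning order of this strategy can be sketched with plain objects. The following is a simplified simulation of a single removal step (it skips the truncation-from-the-beginning part and is not the library's actual implementation; the `SimpleChatItem` type and `eraseFirstResponse` name are made up for this example):

```typescript
type SimpleChatItem = {type: "system" | "user" | "model", text: string};

// Illustrative only: remove the oldest model response, along with the
// user prompt that came before it, while keeping the first system prompt
function eraseFirstResponse(history: SimpleChatItem[]): SimpleChatItem[] {
    const result = history.map((item) => ({...item}));

    for (let i = 0; i < result.length; i++) {
        if (i === 0 && result[i]!.type === "system")
            continue; // the first system prompt is always kept

        if (result[i]!.type === "model") {
            result.splice(i, 1); // remove the oldest model response

            // also remove the user prompt that preceded it
            if (i > 0 && result[i - 1]!.type === "user")
                result.splice(i - 1, 1);

            break; // one response removed per step
        }
    }

    return result;
}
```

Given a history of a system prompt followed by two question/answer pairs, one step removes the oldest pair while the system prompt and the latest exchange survive.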
Implementing a Custom Context Shift Strategy
A custom context shift strategy is a function that receives the full chat history as input and returns a new chat history that, when tokenized, results in fewer tokens than the desired maximum size.
The context shift strategy will be called only when the context state needs to be shifted.
If the context shift strategy returns an invalid chat history (e.g., a chat history that is too long), the prompting function will abort the evaluation and throw an error.
A custom context shift strategy can use simple logic that prioritizes which data to remove, or it can even use a language model to summarize older messages to shorten the chat history.
It's important to keep the last user prompt and model response as-is to prevent infinite generation loops.
```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // replace with your model file path
});
const context = await model.createContext();

const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    contextShift: {
        strategy({
            chatHistory, chatWrapper, maxTokensCount, tokenizer,
            lastShiftMetadata
        }) {
            // clone the chat history to not mutate the original
            const newChatHistory = chatHistory.map(
                (item) => structuredClone(item)
            );

            function getTokensLeftToRemove() {
                // measure the modified history, so removals are reflected
                const {
                    contextText
                } = chatWrapper.generateContextState({
                    chatHistory: newChatHistory
                });
                const tokenUsage = contextText.tokenize(tokenizer).length;

                return Math.max(0, tokenUsage - maxTokensCount);
            }

            while (getTokensLeftToRemove() > 0 && newChatHistory.length > 2) {
                for (let i = 0; i < newChatHistory.length - 2; i++) {
                    // stop once enough tokens have been removed
                    if (getTokensLeftToRemove() <= 0)
                        break;

                    const chatItem = newChatHistory[i]!;

                    if (i === 0 && chatItem.type === "system")
                        // don't remove the first system message
                        continue;
                    else if (chatItem.type === "model") {
                        // remove the model response
                        newChatHistory.splice(i, 1);
                        i--;

                        // remove the user messages that
                        // came before the model response
                        while (
                            i > 0 &&
                            newChatHistory[i - 1]?.type === "user"
                        ) {
                            newChatHistory.splice(i - 1, 1);
                            i--;
                        }
                    } else if (chatItem.type === "system") {
                        // don't remove system messages on their own
                        continue;
                    } else if (chatItem.type === "user") {
                        // don't remove user messages on their own
                        continue;
                    } else {
                        // ensure we handle all message types.
                        // otherwise, this will error
                        void (chatItem satisfies never);
                    }
                }
            }

            return {
                chatHistory: newChatHistory,
                // this metadata will be passed to the next context shift
                // strategy call as the `lastShiftMetadata` argument
                metadata: {}
            };
        }
    }
});
```
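The `metadata` value returned by the strategy is passed back to it on the next call as the `lastShiftMetadata` argument, which lets a strategy carry state across shifts. As a hypothetical sketch (the `ShiftMetadata` type and `nextShiftMetadata` helper are made up for this example), a strategy could count how many shifts have occurred:

```typescript
type ShiftMetadata = {shiftCount: number};

// Illustrative helper: compute the metadata to return from a strategy,
// based on the metadata returned by the previous call (if any)
function nextShiftMetadata(
    lastShiftMetadata?: ShiftMetadata
): ShiftMetadata {
    return {shiftCount: (lastShiftMetadata?.shiftCount ?? 0) + 1};
}
```

A strategy could then return `{chatHistory: newChatHistory, metadata: nextShiftMetadata(lastShiftMetadata)}` and, for instance, prune more aggressively after many shifts have already happened.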