# Using Function Calling
When prompting a model using a `LlamaChatSession`, you can provide a list of functions that the model can call during generation to retrieve information or perform actions.
For this to work, `node-llama-cpp` tells the model which functions are available and what parameters they take, and instructs it to call them as needed. It also ensures that the model can only call functions with the correct parameters.
Some models have built-in support for function calling, while others are not trained for it.
For example, Llama 3 is not trained for function calling. When using a Llama 3 model, the `Llama3ChatWrapper` is automatically used, and it includes custom handling for function calling, with a fine-tuned instruction that explains to the model how to call functions and when to do so.
There are also models that do have built-in support for function calling, like Llama 3.1. When using a Llama 3.1 model, the `Llama3_1ChatWrapper` is automatically used, and it knows how to handle function calling for this model.
For the model to know what the functions do and what they return, you need to provide this information in each function's description.
Let's see an example of how to use function calling with a Llama 3.1 model:
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "Meta-Llama-3.1-8B.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const fruitPrices: Record<string, string> = {
    "apple": "$6",
    "banana": "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();
            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: fruitPrices[name]
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};

const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
In this example, you can see that we have a function called `getFruitPrice` that returns the price of a fruit. This function has a description that explains what it does and what it returns.
The `params` schema ensures that the model can only call this function with the correct parameters, and it's also used to inform the model what parameters this function takes, so there's no need to repeat this information in the function description or prompt.

It's important, though, to make sure that the parameter names are clear and easy to understand, so the model can use them correctly. It's okay for parameter names to be very long, as long as they're self-explanatory.
We return the fruit name that the model asked for as part of the response. When multiple function calls are made in parallel, some models don't properly match each function's result with the parameters of the call that produced it, so including this context in the result itself helps the model keep track. This may not be necessary for the model you use, but it can be helpful in some cases.
When we encounter an error, like an unrecognized fruit, we have to communicate it to the model in a way it can understand, so we return a text response explaining what went wrong. Throwing an error would just abort the generation, so avoid doing that if you want the generation to continue.
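For example, here's a minimal sketch of a handler that wraps an operation that may throw and reports the failure as text instead of rethrowing. Note that `lookupPrice` here is a hypothetical helper used only for illustration; it is not part of `node-llama-cpp`:

```typescript
import {defineChatSessionFunction} from "node-llama-cpp";

// hypothetical helper that may throw; not part of node-llama-cpp
async function lookupPrice(name: string): Promise<string> {
    throw new Error(`No price data for "${name}"`);
}

const getPrice = defineChatSessionFunction({
    description: "Get the price of a product",
    params: {
        type: "object",
        properties: {
            name: {
                type: "string"
            }
        }
    },
    async handler(params) {
        try {
            const price = await lookupPrice(params.name);
            return {name: params.name, price};
        } catch (err) {
            // return the failure as text the model can read, instead of
            // rethrowing and aborting the generation
            return `Failed to get the price of "${params.name}"`;
        }
    }
});
```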
## Function Parameters
All the parameters passed to a function are considered required by the schema. This is intentional because many models struggle to use optional parameters effectively.
The generation process works like this: the model is provided with an existing state and is tasked with generating a completion to that state. Each generation depends on the previous one, requiring alignment with the existing state. The model must pass the parameters in the order they are defined, but it may not always be aware of all the possible parameters. As a result, after a parameter value is generated, the next parameter is "forced" on the model, requiring the model to generate its value. This method ensures that the model adheres to the schema, even if it doesn't fully comprehend it.
Optional properties can introduce unpredictability. Whether the model decides to generate an optional property or is forced to do so can be random, leading to inconsistent results.
To address cases involving optional values, it is recommended to use `oneOf`. This allows the model to either set the property to `null` or assign it a value, ensuring that the model deliberately chooses the outcome rather than leaving it to chance.
Let's see an example of how to use `oneOf` to handle an optional parameter:
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "Meta-Llama-3.1-8B.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const fruitPrices: Record<string, {USD: number, EUR: number}> = {
    "apple": {
        USD: 6,
        EUR: 5
    },
    "banana": {
        USD: 4,
        EUR: 4
    }
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                },
                currency: {
                    oneOf: [{
                        type: "null"
                    }, {
                        enum: ["USD", "EUR"]
                    }]
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();
            const currency = params.currency ?? "USD";
            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: currency === "USD"
                        ? `${fruitPrices[name]!.USD}$`
                        : `${fruitPrices[name]!.EUR}€`
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};

const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
In this example, we let the model decide whether to use USD or EUR as the currency, or whether to ignore the currency altogether.
To make it clearer to the model that this function has a default currency, we can add a `"default"` option to the enum instead of `null`, forcing the model to actively choose it when it doesn't want to choose USD or EUR.
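For example, here's a sketch of how the `getFruitPrice` function from the previous example could be adjusted to do that (it reuses the `fruitPrices` map from above; mapping `"default"` to USD is an assumption made for this illustration):

```typescript
const getFruitPrice = defineChatSessionFunction({
    description: "Get the price of a fruit",
    params: {
        type: "object",
        properties: {
            name: {
                type: "string"
            },
            currency: {
                // an explicit "default" option instead of null
                enum: ["default", "USD", "EUR"]
            }
        }
    },
    async handler(params) {
        const name = params.name.toLowerCase();

        // map the explicit "default" option to the actual default currency
        const currency = params.currency === "default"
            ? "USD"
            : params.currency;

        if (Object.keys(fruitPrices).includes(name))
            return {
                name: name,
                price: currency === "USD"
                    ? `${fruitPrices[name]!.USD}$`
                    : `${fruitPrices[name]!.EUR}€`
            };

        return `Unrecognized fruit "${params.name}"`;
    }
});
```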
## Custom Function Calling Syntax
To provide a custom function calling syntax for the model to use, you can customize the function calling template of `TemplateChatWrapper` or `JinjaTemplateChatWrapper`.
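For example, here's a rough sketch of passing a custom function calling syntax to a `TemplateChatWrapper` via its `functionCallMessageTemplate` option. The template strings are illustrative; check the API reference of `TemplateChatWrapper` for the exact option shapes:

```typescript
import {TemplateChatWrapper} from "node-llama-cpp";

const chatWrapper = new TemplateChatWrapper({
    template: "{{systemPrompt}}\n{{history}}model: {{completion}}\nuser: ",
    historyTemplate: {
        system: "system: {{message}}\n",
        user: "user: {{message}}\n",
        model: "model: {{message}}\n"
    },
    // the syntax the model should use to call functions,
    // and how results are fed back to it
    functionCallMessageTemplate: {
        call: "[[call: {{functionName}}({{functionParams}})]]",
        result: " [[result: {{functionCallResult}}]]"
    }
});
```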
### Using a Custom Chat Wrapper
To provide a custom function calling syntax in a custom chat wrapper, configure its `settings` object with the desired function calling syntax.
Let's see an example of a custom chat wrapper that provides a custom function calling syntax:
````typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama, LlamaChatSession, ChatWrapper,
    ChatWrapperSettings, ChatWrapperGenerateContextStateOptions,
    ChatWrapperGeneratedContextState, LlamaText, ChatModelFunctions,
    ChatModelFunctionsDocumentationGenerator, defineChatSessionFunction
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

class MyCustomChatWrapper extends ChatWrapper {
    public readonly wrapperName: string = "MyCustomChat";

    public override readonly settings: ChatWrapperSettings = {
        ...ChatWrapper.defaultSettings,
        supportsSystemMessages: true,
        functions: {
            call: {
                optionalPrefixSpace: true,
                prefix: "[[call: ",
                paramsPrefix: "(",
                suffix: ")]]"
            },
            result: {
                prefix: " [[result: ",
                suffix: "]]"
            }
        }
    };

    public override generateContextState({
        chatHistory, availableFunctions, documentFunctionParams
    }: ChatWrapperGenerateContextStateOptions): ChatWrapperGeneratedContextState {
        const historyWithFunctions = this.addAvailableFunctionsSystemMessageToHistory(chatHistory, availableFunctions, {
            documentParams: documentFunctionParams
        });

        const texts = historyWithFunctions.map((item, index) => {
            if (item.type === "system") {
                if (index === 0)
                    return LlamaText([
                        LlamaText.fromJSON(item.text)
                    ]);

                return LlamaText([
                    "### System\n",
                    LlamaText.fromJSON(item.text)
                ]);
            } else if (item.type === "user")
                return LlamaText([
                    "### Human\n",
                    item.text
                ]);
            else if (item.type === "model")
                return LlamaText([
                    "### Assistant\n",
                    this.generateModelResponseText(item.response)
                ]);

            // ensure that all chat item types are handled,
            // or TypeScript will throw an error
            return item satisfies never;
        });

        return {
            contextText: LlamaText.joinValues("\n\n", texts),

            // if the model generates any of these texts,
            // the completion will stop, and the text will not
            // be included in the response returned to the user
            stopGenerationTriggers: [
                LlamaText(["### Human\n"])
            ]
        };
    }

    public override generateAvailableFunctionsSystemText(availableFunctions: ChatModelFunctions, {documentParams = true}: {
        documentParams?: boolean
    }) {
        const functionsDocumentationGenerator = new ChatModelFunctionsDocumentationGenerator(availableFunctions);

        if (!functionsDocumentationGenerator.hasAnyFunctions)
            return LlamaText([]);

        return LlamaText.joinValues("\n", [
            "The assistant calls the provided functions as needed to retrieve information instead of relying on existing knowledge.",
            "To fulfill a request, the assistant calls relevant functions in advance when needed before responding to the request, and does not tell the user prior to calling a function.",
            "Provided functions:",
            "```typescript",
            functionsDocumentationGenerator.getTypeScriptFunctionSignatures({documentParams}),
            "```",
            "",
            "Calling any of the provided functions can be done like this:",
            this.generateFunctionCall("getSomeInfo", {someKey: "someValue"}),
            "",
            "Note that the [[call: prefix is mandatory.",
            "The assistant does not inform the user about using functions and does not explain anything before calling a function.",
            "After calling a function, the raw result appears afterwards and is not part of the conversation.",
            "To make information be part of the conversation, the assistant paraphrases and repeats the information without the function syntax."
        ]);
    }
}

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "my-model.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new MyCustomChatWrapper()
});

const fruitPrices: Record<string, string> = {
    "apple": "$6",
    "banana": "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();
            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: fruitPrices[name]
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};

const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
````
In this example, if the model wants to call the `getFruitPrice` function, it uses the following syntax:
```
[[call: getFruitPrice({name: "apple"})]]
```
And the result would be:
```
[[result: {name: "apple", price: "$6"}]]
```
The `generateAvailableFunctionsSystemText` method in the chat wrapper we defined here is used to inform the model about the available functions and how to call them. It's added to the context state as a system message, but only when there are functions available.
The `ChatModelFunctionsDocumentationGenerator` class is used to generate documentation for the available functions in various formats.
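For the `getFruitPrice` function defined above, the generated TypeScript signatures section might look roughly like this (illustrative; the exact output format may differ):

```typescript
// Get the price of a fruit
function getFruitPrice(params: {name: string}): any;
```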
#### Parallel Function Calling Syntax
To support a parallel function calling syntax, you can configure the `functions.parallelism` field:
```typescript
import {ChatWrapper, ChatWrapperSettings, LlamaText, SpecialToken} from "node-llama-cpp";

class MyCustomChatWrapper extends ChatWrapper {
    public readonly wrapperName: string = "MyCustomChat";

    public override readonly settings: ChatWrapperSettings = {
        ...ChatWrapper.defaultSettings,
        supportsSystemMessages: true,
        functions: {
            call: {
                optionalPrefixSpace: true,
                prefix: "[[call: ",
                paramsPrefix: "(",
                suffix: ")]]"
            },
            result: {
                prefix: "{{functionName}}({{functionParams}}) result: ",
                suffix: ";"
            },
            parallelism: {
                call: {
                    sectionPrefix: "",
                    betweenCalls: "\n",
                    sectionSuffix: LlamaText(new SpecialToken("EOT"))
                },
                result: {
                    sectionPrefix: "Results:\n",
                    betweenResults: "\n",
                    sectionSuffix: "\n\n"
                }
            }
        }
    };
}
```
In this example, if the model wants to call the `getFruitPrice` function twice, it uses the following syntax:
```
[[call: getFruitPrice({name: "apple"})]]
[[call: getFruitPrice({name: "banana"})]]<EOT token>
```
And the result would be:
```
Results:
getFruitPrice({name: "apple"}) result: {name: "apple", price: "$6"};
getFruitPrice({name: "banana"}) result: {name: "banana", price: "$4"};
```