DeepSeek R1 with function calling
February 21, 2025

`node-llama-cpp` v3.6 is here, with full support for DeepSeek R1, including function calling!
Function Calling
`node-llama-cpp` includes many tricks to make function calling work with most models. This release adds special adaptations for DeepSeek R1 that improve function calling performance and stability.
Here's a basic example of function calling with DeepSeek R1:
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama, LlamaChatSession, defineChatSessionFunction, resolveModelFile
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelsDir = path.join(__dirname, "..", "models");
const modelUri = "hf:mradermacher/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri, modelsDir)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const fruitPrices: Record<string, string> = {
    "apple": "$6",
    "banana": "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();
            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: fruitPrices[name]
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};

const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1.trim());
```
Recommended Models
Here are some recommended model URIs you can use to try out DeepSeek R1 with function calling.
| Model | Size | URI |
|---|---|---|
| DeepSeek R1 Distill Qwen 7B | 4.68GB | `hf:mradermacher/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M` |
| DeepSeek R1 Distill Qwen 14B | 8.99GB | `hf:mradermacher/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M` |
| DeepSeek R1 Distill Qwen 32B | 19.9GB | `hf:mradermacher/DeepSeek-R1-Distill-Qwen-32B-GGUF:Q4_K_M` |
The 7B model handles function calling well on the first prompt, but its performance tends to deteriorate on subsequent prompts.
Use a larger model for better results across multiple prompts.
TIP
Estimate the compatibility of a model with your machine before downloading it using the `inspect estimate` command:

```shell
npx -y node-llama-cpp inspect estimate <model URI>
```
Try It Using the CLI
To try out function calling with a given model using the CLI, you can use the `chat` command with the `--ef` flag to provide the model with date and time functions:

```shell
npx -y node-llama-cpp chat --ef --prompt "What is the time?" <model URI>
```
Chain of Thought Segmentation
The thoughts generated by a reasoning model are now separated into `thought` segments in the response, so you can choose whether to use them or not.

By default, the `.prompt(...)` method returns only the main response, without any `thought` segments. Use the `.promptWithMeta(...)` method to get the full response.
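Continuing the example above, here's a minimal sketch of the difference between the two methods. The exact shape of the metadata object is defined by the library; the `responseText` and `response` field names used below are assumptions for illustration, so check the library's typings:

```typescript
// .prompt(...) resolves to only the main response text, with thought segments omitted
const answer = await session.prompt(q1, {functions});

// .promptWithMeta(...) resolves to an object carrying the full response;
// the field names below are assumptions, not confirmed API
const meta = await session.promptWithMeta(q1, {functions});
console.log(meta.responseText); // the main response text
console.log(meta.response);     // the segmented response, including thought segments
```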
You can use the new `onResponseChunk` option to stream `thought` segments as they are being generated.
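Here's a minimal sketch of streaming with `onResponseChunk`, continuing the example above. It assumes each chunk exposes `type`, `segmentType`, and `text` fields for telling thought segments apart from the main response:

```typescript
process.stdout.write("AI: ");
const a1 = await session.prompt(q1, {
    functions,
    onResponseChunk(chunk) {
        // Chain-of-thought chunks are assumed to arrive as "thought" segments,
        // while main-response chunks carry no segment type
        const isThought = chunk.type === "segment" && chunk.segmentType === "thought";

        if (isThought)
            process.stdout.write(`\x1b[2m${chunk.text}\x1b[0m`); // print thoughts dimmed
        else
            process.stdout.write(chunk.text);
    }
});
```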
Electron App Template
The Electron app template has been updated to properly segment the thoughts in the response.
Try it out by downloading the latest build from GitHub, or by scaffolding a new project based on the Electron template:

```shell
npm create node-llama-cpp@latest
```