DeepSeek R1 with function calling

February 21, 2025

node-llama-cpp + DeepSeek R1

node-llama-cpp v3.6 is here, with full support for DeepSeek R1, including function calling!


Function Calling

node-llama-cpp employs many tricks to make function calling work with most models. This release adds special adaptations for DeepSeek R1 that improve function calling performance and stability.

Here's a basic example of function calling with DeepSeek R1:

typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama,
    LlamaChatSession,
    defineChatSessionFunction,
    resolveModelFile
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelsDir = path.join(__dirname, "..", "models");

const modelUri = "hf:mradermacher/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri, modelsDir)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const fruitPrices: Record<string, string> = {
    "apple": "$6",
    "banana": "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();

            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: fruitPrices[name]
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};

const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1.trim());

Here are some recommended model URIs you can use to try out DeepSeek R1 with function calling:

Model                           Size      URI
DeepSeek R1 Distill Qwen 7B     4.68GB    hf:mradermacher/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M
DeepSeek R1 Distill Qwen 14B    8.99GB    hf:mradermacher/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
DeepSeek R1 Distill Qwen 32B    19.9GB    hf:mradermacher/DeepSeek-R1-Distill-Qwen-32B-GGUF:Q4_K_M

The 7B model works well with function calling in the first prompt, but tends to deteriorate in subsequent queries.
Use a larger model for better performance with multiple prompts.
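
For example, you can keep prompting the same session with the same functions to see how a model holds up across turns (the follow-up question below is just an illustration, not part of the original example):

typescript
// Hypothetical follow-up prompt, reusing the session and functions from the example above
const q2 = "Which of these fruits is the cheapest?";
console.log("User: " + q2);

const a2 = await session.prompt(q2, {functions});
console.log("AI: " + a2.trim());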

TIP

Before downloading a model, you can estimate its compatibility with your machine using the inspect estimate command:

shell
npx -y node-llama-cpp inspect estimate <model URI>

Try It Using the CLI

To try out function calling with a given model using the CLI, you can use the chat command with the --ef flag to provide the model with date and time functions:

shell
npx -y node-llama-cpp chat --ef --prompt "What is the time?" <model URI>

Chain of Thought Segmentation

The thoughts generated by a reasoning model are now separated into thought segments in the response, so you can choose whether to use them or not.

By default, the .prompt(...) method returns only the main response, without any thought segments. Use the .promptWithMeta(...) method to get the full response.
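
Here's a minimal sketch of reading thought segments from the response metadata; the exact shape of the returned object (responseText, response, segmentType) is an assumption here, so check the API reference for the precise types:

typescript
// Sketch: inspect the full response, including thought segments.
// The field names below (responseText, response, type, segmentType, text)
// are assumptions; consult the API reference for the exact types.
const {responseText, response} = await session.promptWithMeta(q1, {functions});

for (const item of response) {
    if (typeof item !== "string" && item.type === "segment" && item.segmentType === "thought")
        console.log("Thought: " + item.text);
}

console.log("AI: " + responseText.trim());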

You can use the new onResponseChunk option to stream thought segments as they are being generated.
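
For instance, you can stream the text as it's generated and mark which chunks belong to a thought segment; the chunk fields used below (type, segmentType, text) are assumptions, so verify them against the API reference:

typescript
// Sketch: stream the response and flag thought segments as they arrive.
// The chunk fields (type, segmentType, text) are assumptions; check the API reference.
const answer = await session.prompt(q1, {
    functions,
    onResponseChunk(chunk) {
        const isThought = chunk.type === "segment" && chunk.segmentType === "thought";
        process.stdout.write((isThought ? "[thought] " : "") + chunk.text);
    }
});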

Electron App Template

The Electron app template has been updated to properly segment the thoughts in the response.

Try it out by downloading the latest build from GitHub, or by scaffolding a new project based on the Electron template:

shell
npm create node-llama-cpp@latest