Getting Started
Installation
Scaffold a New Project
To create a new node-llama-cpp
project with everything set up, run this command:
npm create node-llama-cpp@latest
It may take a minute to download all the prebuilt binaries
You will be asked to enter a project name, select a template, and choose a model from a list of recommended models.
If this is your first time running models on your machine, we recommend starting with the Node + TypeScript
template.
Existing Project
Inside of your node.js project directory, run this command:
npm install node-llama-cpp
node-llama-cpp
comes with pre-built binaries for macOS, Linux and Windows.If binaries are not available for your platform, it'll fallback to download a release of
llama.cpp
and build it from source withcmake
. To disable this behavior, set the environment variableNODE_LLAMA_CPP_SKIP_DOWNLOAD
totrue
.
ESM Usage
node-llama-cpp
is an ES module, so can only use import
to load it and cannot use require
.
To make sure you can use it in your project, make sure your package.json
file has "type": "module"
in it.
For workarounds for existing projects, see the ESM troubleshooting guide.
GPU Support
node-llama-cpp
automatically detects the available compute layers on your machine and uses the best one by default, as well as balances the default settings to get the best performance from your hardware. No need to manually configure anything.
Metal: Enabled by default on Macs with Apple Silicon. If you're using a Mac with an Intel chip, you can manually enable it. Accelerate framework is always enabled.
CUDA: Used by default when support is detected. For more details, see the CUDA guide.
Vulkan: Used by default when support is detected. For more details, see the Vulkan guide.
To inspect your hardware, run this command:
npx --no node-llama-cpp inspect gpu
Getting a Model File
We recommend getting a GGUF model from either Michael Radermacher on Hugging Face or by searching HuggingFace directly for a GGUF model.
We recommend starting by getting a small model that doesn't have a lot of parameters just to ensure everything works, so try downloading a 7B
/8B
parameters model first (search for models with both 7B
/8B
and GGUF
in their name).
To ensure you can chat with the model, make sure you choose an Instruct model by looking for Instruct
or it
in the model name.
For improved download speeds, you can use the pull
command to download a model:
npx --no node-llama-cpp pull --dir ./models <model-file-url>
Not sure what model to get started with?
Run the chat
command with no parameters to see a list of recommended models:
npx --no node-llama-cpp chat
For more tips on choosing a model, see the choosing a model guide.
Validating the Model
To validate that the model you downloaded is working properly, use the chat
command to chat with it:
npx --no node-llama-cpp chat <path-to-a-model-file-on-your-computer>
Try telling the model Hi there
and see how it reacts. If the response looks weird or doesn't make sense, try using a different model.
If the model doesn't stop generating output, try using a different chat wrapper. For example:
npx --no node-llama-cpp chat --wrapper general <path-to-a-model-file-on-your-computer>
TIP
To download a model and prompt it right away with a single command, use the chat
command and pass a model URL together with a --prompt
flag:
npx --no node-llama-cpp chat --prompt 'Hi there' <model-url>
Usage
Chatbot
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
contextSequence: context.getSequence()
});
const q1 = "Hi there, how are you?";
console.log("User: " + q1);
const a1 = await session.prompt(q1);
console.log("AI: " + a1);
const q2 = "Summarize what you said";
console.log("User: " + q2);
const a2 = await session.prompt(q2);
console.log("AI: " + a2);
To use a custom chat wrapper, see the chat wrapper guide.
Chatbot With JSON Schema
To enforce a model to generate output according to a JSON schema, use llama.createGrammarForJsonSchema()
.
It'll force the model to generate output according to the JSON schema you provide, and it'll do it on the text generation level.
It only supports a small subset of the JSON schema spec, but it's enough to generate useful JSON objects using a text generation model.
NOTE
To learn more about using grammars correctly, read the grammar guide.
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";
const __dirname = path.dirname(
fileURLToPath(import.meta.url)
);
const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
contextSequence: context.getSequence()
});
const grammar = await llama.createGrammarForJsonSchema({
type: "object",
properties: {
positiveWordsInUserMessage: {
type: "array",
items: {
type: "string"
}
},
userMessagePositivityScoreFromOneToTen: {
enum: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
},
nameOfUser: {
oneOf: [{
type: "null"
}, {
type: "string"
}]
}
}
});
const prompt = "Hi there! I'm John. Nice to meet you!";
const res = await session.prompt(prompt, {grammar});
const parsedRes = grammar.parse(res);
console.log("User name:", parsedRes.nameOfUser);
console.log(
"Positive words in user message:",
parsedRes.positiveWordsInUserMessage
);
console.log(
"User message positivity score:",
parsedRes.userMessagePositivityScoreFromOneToTen
);
Chatbot With Function Calling
You can provide functions that the model can call during generation to retrieve information or perform actions.
Some models have official support for function calling in node-llama-cpp
(such as Functionary and Llama 3 Instruct), while other models fallback to a generic function calling mechanism that works with many models, but not all of them.
NOTE
To learn more about using function calling correctly, read the function calling guide.
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
contextSequence: context.getSequence()
});
const fruitPrices: Record<string, string> = {
"apple": "$6",
"banana": "$4"
};
const functions = {
getFruitPrice: defineChatSessionFunction({
description: "Get the price of a fruit",
params: {
type: "object",
properties: {
name: {
type: "string"
}
}
},
async handler(params) {
const name = params.name.toLowerCase();
if (Object.keys(fruitPrices).includes(name))
return {
name: name,
price: fruitPrices[name]
};
return `Unrecognized fruit "${params.name}"`;
}
})
};
const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);
const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
Raw
NOTE
To learn more about using low level APIs, read the low level API guide.
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, Token} from "node-llama-cpp";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const llama = await getLlama();
const model = await llama.loadModel({
modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const sequence = context.getSequence();
const q1 = "Hi there, how are you?";
console.log("User: " + q1);
const tokens = model.tokenize("USER: " + q1 + "\nASSISTANT: ");
const res: Token[] = [];
for await (const generatedToken of sequence.evaluate(tokens)) {
res.push(generatedToken);
// It's important to not concatenate the results as strings,
// as doing so breaks some characters (like some emojis)
// that consist of multiple tokens.
// By using an array of tokens, we can decode them correctly together.
const resString = model.detokenize(res);
const lastPart = resString.split("ASSISTANT:").pop();
if (lastPart?.includes("USER:"))
break;
}
const a1 = model.detokenize(res).split("USER:")[0]!;
console.log("AI: " + a1.trim());
Next Steps
Now that you've learned the basics of node-llama-cpp
, you can explore more advanced topics by reading the guides in the Guide section of the sidebar.
Use GitHub Discussions to ask questions if you get stuck,
and give node-llama-cpp
a star on GitHub if you found it useful.
Explore the API reference to learn more about the available functions and classes, and use the search bar (press /) to find documentation for a specific topic or API.
Check out the roadmap to see what's coming next,
and consider sponsoring node-llama-cpp
to accelerate the development of new features.