
Getting Started

Installation

Scaffold a New Project

To create a new node-llama-cpp project with everything set up, run this command:

shell
npm create node-llama-cpp@latest

It may take a minute to download all the prebuilt binaries.

You will be asked to enter a project name, select a template, and choose a model from a list of recommended models.

If this is your first time running models on your machine, we recommend starting with the Node + TypeScript template.

Existing Project

Inside your Node.js project directory, run this command:

shell
npm install node-llama-cpp

node-llama-cpp comes with pre-built binaries for macOS, Linux and Windows.

If binaries are not available for your platform, it'll fall back to downloading a release of llama.cpp and building it from source with cmake. To disable this behavior, set the NODE_LLAMA_CPP_SKIP_DOWNLOAD environment variable to true.

ESM Usage

node-llama-cpp is an ES module, so you can only use import to load it; you cannot use require.

To use it in your project, make sure your package.json file includes "type": "module".

For workarounds for existing projects, see the ESM troubleshooting guide.
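
If your project must stay CommonJS, one common workaround is to load node-llama-cpp with a dynamic import(), which works in both module systems. Here's a minimal sketch (the main function is just for illustration; see the ESM troubleshooting guide for the full details):

typescript
// A CommonJS-friendly workaround: dynamic import() can load an ES module
// even from a CommonJS file.
async function main() {
    // Load the ES module at runtime instead of using a top-level import
    const {getLlama} = await import("node-llama-cpp");

    const llama = await getLlama();
    console.log("GPU type:", llama.gpu);
}

void main();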

GPU Support

node-llama-cpp automatically detects the compute layers available on your machine and uses the best one by default, and it balances the default settings to get the best performance from your hardware. No manual configuration is needed.

Metal: Enabled by default on Macs with Apple Silicon. If you're using a Mac with an Intel chip, you can enable it manually. The Accelerate framework is always enabled.

CUDA: Used by default when support is detected. For more details, see the CUDA guide.

Vulkan: Used by default when support is detected. For more details, see the Vulkan guide.
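
If you'd rather require a specific compute layer than rely on auto-detection, you can pass a gpu option to getLlama(). Here's a minimal sketch (assuming your hardware supports the layer you request):

typescript
import {getLlama} from "node-llama-cpp";

// Request a specific compute layer instead of the auto-detected one.
// Pass false to disable GPU support entirely.
const llama = await getLlama({
    gpu: "cuda" // or "metal", "vulkan"
});

// Prints the compute layer that's actually being used
console.log("GPU type:", llama.gpu);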

To inspect your hardware, run this command:

shell
npx --no node-llama-cpp inspect gpu

Getting a Model File

We recommend getting a GGUF model either from Michael Radermacher on Hugging Face or by searching Hugging Face directly for a GGUF model.

We recommend starting with a small model that doesn't have a lot of parameters, just to ensure everything works, so try downloading a 7B/8B-parameter model first (search for models with both 7B or 8B and GGUF in their names).

To ensure you can chat with the model, make sure you choose an Instruct model by looking for Instruct or it (instruction-tuned) in the model name.

For improved download speeds, you can use the pull command to download a model:

shell
npx --no node-llama-cpp pull --dir ./models <model-file-url>
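
You can also download a model programmatically with the createModelDownloader helper. Here's a minimal sketch (the model URL is a placeholder, and the modelUri option name may differ between versions; check the API reference for all available options):

typescript
import {fileURLToPath} from "url";
import path from "path";
import {createModelDownloader} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// <model-file-url> is a placeholder - pass a URL to a GGUF model file
const downloader = await createModelDownloader({
    modelUri: "<model-file-url>",
    dirPath: path.join(__dirname, "models")
});
const modelPath = await downloader.download();

console.log("Model downloaded to:", modelPath);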

Not sure what model to get started with?

Run the chat command with no parameters to see a list of recommended models:

shell
npx --no node-llama-cpp chat

For more tips on choosing a model, see the choosing a model guide.

Validating the Model

To validate that the model you downloaded is working properly, use the chat command to chat with it:

shell
npx --no node-llama-cpp chat <path-to-a-model-file-on-your-computer>

Try telling the model Hi there and see how it reacts. If the response looks weird or doesn't make sense, try using a different model.

If the model doesn't stop generating output, try using a different chat wrapper. For example:

shell
npx --no node-llama-cpp chat --wrapper general <path-to-a-model-file-on-your-computer>

TIP

To download a model and prompt it right away with a single command, use the chat command and pass a model URL together with a --prompt flag:

shell
npx --no node-llama-cpp chat --prompt 'Hi there' <model-url>

Usage

Chatbot

typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});


const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);


const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
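
By default, prompt() resolves with the full response once generation finishes. To stream the response as it's being generated, you can pass an onTextChunk handler to prompt(). A minimal sketch, continuing the session from the example above (assuming the onTextChunk option):

typescript
// Stream the response to stdout as it's being generated
const a3 = await session.prompt("Tell me a joke", {
    onTextChunk(chunk) {
        process.stdout.write(chunk);
    }
});
// a3 still contains the full response once generation finishes
process.stdout.write("\n");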

To use a custom chat wrapper, see the chat wrapper guide.
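
For instance, here's a minimal sketch that switches a session to the GeneralChatWrapper that ships with node-llama-cpp (the model path is a placeholder):

typescript
import {getLlama, LlamaChatSession, GeneralChatWrapper} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "<path-to-a-model-file-on-your-computer>" // placeholder
});
const context = await model.createContext();

// Use the general-purpose chat wrapper instead of the auto-detected one
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new GeneralChatWrapper()
});

console.log(await session.prompt("Hi there"));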

Chatbot With JSON Schema

To force a model to generate output that conforms to a JSON schema, use llama.createGrammarForJsonSchema().

It constrains the model's output to the JSON schema you provide, and it does so at the text-generation level.

It only supports a small subset of the JSON schema spec, but it's enough to generate useful JSON objects using a text generation model.

NOTE

To learn more about using grammars correctly, read the grammar guide.

typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        positiveWordsInUserMessage: {
            type: "array",
            items: {
                type: "string"
            }
        },
        userMessagePositivityScoreFromOneToTen: {
            enum: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        },
        nameOfUser: {
            oneOf: [{
                type: "null"
            }, {
                type: "string"
            }]
        }
    }
});

const prompt = "Hi there! I'm John. Nice to meet you!";

const res = await session.prompt(prompt, {grammar});
const parsedRes = grammar.parse(res);

console.log("User name:", parsedRes.nameOfUser);
console.log(
    "Positive words in user message:",
    parsedRes.positiveWordsInUserMessage
);
console.log(
    "User message positivity score:",
    parsedRes.userMessagePositivityScoreFromOneToTen
);

Chatbot With Function Calling

You can provide functions that the model can call during generation to retrieve information or perform actions.

Some models have official support for function calling in node-llama-cpp (such as Functionary and Llama 3 Instruct), while other models fall back to a generic function calling mechanism that works with many models, but not all of them.

NOTE

To learn more about using function calling correctly, read the function calling guide.

typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama, LlamaChatSession, defineChatSessionFunction
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const fruitPrices: Record<string, string> = {
    "apple": "$6",
    "banana": "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();
            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: fruitPrices[name]
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};


const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);

Raw

typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, Token} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const sequence = context.getSequence();

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const tokens = model.tokenize("USER: " + q1 + "\nASSISTANT: ");
const res: Token[] = [];
for await (const generatedToken of sequence.evaluate(tokens)) {
    res.push(generatedToken);

    // It's important to not concatenate the results as strings,
    // as doing so breaks some characters (like some emojis)
    // that consist of multiple tokens.
    // By using an array of tokens, we can decode them correctly together.
    const resString = model.detokenize(res);

    const lastPart = resString.split("ASSISTANT:").pop();
    if (lastPart?.includes("USER:"))
        break;
}

const a1 = model.detokenize(res).split("USER:")[0]!;
console.log("AI: " + a1.trim());

Next Steps

Now that you've learned the basics of node-llama-cpp, you can explore more advanced topics by reading the guides in the Guide section of the sidebar.

Use GitHub Discussions to ask questions if you get stuck,
and give node-llama-cpp a star on GitHub if you found it useful.

Explore the API reference to learn more about the available functions and classes, and use the search bar (press /) to find documentation for a specific topic or API.

Check out the roadmap to see what's coming next,
and consider sponsoring node-llama-cpp to accelerate the development of new features.