
Getting Started

Installation

Scaffold a New Project

To create a new node-llama-cpp project with everything set up, run this command:

shell
npm create node-llama-cpp@latest

It may take a minute to download all the prebuilt binaries.

You will be asked to enter a project name, select a template, and choose a model from a list of recommended models.

If this is your first time running models on your machine, we recommend starting with the Node + TypeScript template.

Existing Project

Inside your Node.js project directory, run this command:

shell
npm install node-llama-cpp

node-llama-cpp comes with pre-built binaries for macOS, Linux and Windows.

If binaries are not available for your platform, it'll fall back to downloading a release of llama.cpp and building it from source with cmake. To disable this behavior, set the NODE_LLAMA_CPP_SKIP_DOWNLOAD environment variable to true.

ESM Usage

node-llama-cpp is an ES module, so you can only use import to load it; you cannot use require.

To use it in your project, make sure your package.json file includes "type": "module".

For workarounds for existing projects, see the ESM troubleshooting guide.
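
If your project must stay CommonJS, one common workaround is to load node-llama-cpp with a dynamic import(), which works in both module systems. Here's a minimal sketch (the main function is just for illustration; see the ESM troubleshooting guide for the full details):

typescript
// A CommonJS-friendly workaround: dynamic import() can load an ES module
// even from a CommonJS file.
async function main() {
    // Load the ES module at runtime instead of using a top-level import
    const {getLlama} = await import("node-llama-cpp");

    const llama = await getLlama();
    console.log("GPU type:", llama.gpu);
}

void main();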

GPU Support

node-llama-cpp automatically detects the compute layers available on your machine and uses the best one by default, and it balances the default settings to get the best performance from your hardware. No manual configuration is needed.

Metal: Enabled by default on Macs with Apple Silicon. If you're using a Mac with an Intel chip, you can enable it manually. The Accelerate framework is always enabled.

CUDA: Used by default when support is detected. For more details, see the CUDA guide.

Vulkan: Used by default when support is detected. For more details, see the Vulkan guide.
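
If you'd rather require a specific compute layer than rely on auto-detection, you can pass a gpu option to getLlama(). Here's a minimal sketch (assuming your hardware supports the layer you request):

typescript
import {getLlama} from "node-llama-cpp";

// Request a specific compute layer instead of the auto-detected one.
// Pass false to disable GPU support entirely.
const llama = await getLlama({
    gpu: "cuda" // or "metal", "vulkan"
});

// Prints the compute layer that's actually being used
console.log("GPU type:", llama.gpu);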

To inspect your hardware, run this command:

shell
npx --no node-llama-cpp inspect gpu

Getting a Model File

We recommend getting a GGUF model either from Michael Radermacher on Hugging Face or by searching Hugging Face directly for a GGUF model.

We recommend starting with a small model that doesn't have a lot of parameters, just to ensure everything works, so try downloading a 7B/8B-parameter model first (search for models with both 7B or 8B and GGUF in their names).

To ensure you can chat with the model, make sure you choose an Instruct model by looking for Instruct or it (instruction-tuned) in the model name.

For improved download speeds, you can use the pull command to download a model:

shell
npx --no node-llama-cpp pull --dir ./models <model-file-url>
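
You can also download a model programmatically with the createModelDownloader helper. Here's a minimal sketch (the model URL is a placeholder, and the modelUri option name may differ between versions; check the API reference for all available options):

typescript
import {fileURLToPath} from "url";
import path from "path";
import {createModelDownloader} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// <model-file-url> is a placeholder - pass a URL to a GGUF model file
const downloader = await createModelDownloader({
    modelUri: "<model-file-url>",
    dirPath: path.join(__dirname, "models")
});
const modelPath = await downloader.download();

console.log("Model downloaded to:", modelPath);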

Not sure what model to get started with?

Run the chat command with no parameters to see a list of recommended models:

shell
npx --no node-llama-cpp chat

For more tips on choosing a model, see the choosing a model guide.

Validating the Model

To validate that the model you downloaded is working properly, use the chat command to chat with it:

shell
npx --no node-llama-cpp chat <path-to-a-model-file-on-your-computer>

Try telling the model Hi there and see how it reacts. If the response looks weird or doesn't make sense, try using a different model.

If the model doesn't stop generating output, try using a different chat wrapper. For example:

shell
npx --no node-llama-cpp chat --wrapper general <path-to-a-model-file-on-your-computer>

TIP

To download a model and prompt it right away with a single command, use the chat command and pass a model URL together with a --prompt flag:

shell
npx --no node-llama-cpp chat --prompt 'Hi there' <model-url>

Usage

Chatbot

typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});


const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);


const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
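
By default, prompt() resolves with the full response once generation finishes. To stream the response as it's being generated, you can pass an onTextChunk handler to prompt(). A minimal sketch, continuing the session from the example above (assuming the onTextChunk option):

typescript
// Stream the response to stdout as it's being generated
const a3 = await session.prompt("Tell me a joke", {
    onTextChunk(chunk) {
        process.stdout.write(chunk);
    }
});
// a3 still contains the full response once generation finishes
process.stdout.write("\n");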

To use a custom chat wrapper, see the chat wrapper guide.
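
For instance, here's a minimal sketch that switches a session to the GeneralChatWrapper that ships with node-llama-cpp (the model path is a placeholder):

typescript
import {getLlama, LlamaChatSession, GeneralChatWrapper} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "<path-to-a-model-file-on-your-computer>" // placeholder
});
const context = await model.createContext();

// Use the general-purpose chat wrapper instead of the auto-detected one
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new GeneralChatWrapper()
});

console.log(await session.prompt("Hi there"));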

Chatbot With JSON Schema

To force a model to generate output that conforms to a JSON schema, use llama.createGrammarForJsonSchema().

It constrains the model's output to the JSON schema you provide, and it does so at the text-generation level.

It only supports a small subset of the JSON schema spec, but it's enough to generate useful JSON objects using a text generation model.

NOTE

To learn more about using grammars correctly, read the grammar guide.

typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        positiveWordsInUserMessage: {
            type: "array",
            items: {
                type: "string"
            }
        },
        userMessagePositivityScoreFromOneToTen: {
            enum: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        },
        nameOfUser: {
            oneOf: [{
                type: "null"
            }, {
                type: "string"
            }]
        }
    }
});

const prompt = "Hi there! I'm John. Nice to meet you!";

const res = await session.prompt(prompt, {grammar});
const parsedRes = grammar.parse(res);

console.log("User name:", parsedRes.nameOfUser);
console.log(
    "Positive words in user message:",
    parsedRes.positiveWordsInUserMessage
);
console.log(
    "User message positivity score:",
    parsedRes.userMessagePositivityScoreFromOneToTen
);

Chatbot With Function Calling

You can provide functions that the model can call during generation to retrieve information or perform actions.

Some models have official support for function calling in node-llama-cpp (such as Functionary and Llama 3 Instruct), while other models fall back to a generic function calling mechanism that works with many models, but not all of them.

NOTE

To learn more about using function calling correctly, read the function calling guide.

typescript
import {fileURLToPath} from "url";
import path from "path";
import {
    getLlama, LlamaChatSession, defineChatSessionFunction
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const fruitPrices: Record<string, string> = {
    "apple": "$6",
    "banana": "$4"
};
const functions = {
    getFruitPrice: defineChatSessionFunction({
        description: "Get the price of a fruit",
        params: {
            type: "object",
            properties: {
                name: {
                    type: "string"
                }
            }
        },
        async handler(params) {
            const name = params.name.toLowerCase();
            if (Object.keys(fruitPrices).includes(name))
                return {
                    name: name,
                    price: fruitPrices[name]
                };

            return `Unrecognized fruit "${params.name}"`;
        }
    })
};


const q1 = "Is an apple more expensive than a banana?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);

Raw

typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, Token} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const sequence = context.getSequence();

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const tokens = model.tokenize("USER: " + q1 + "\nASSISTANT: ");
const res: Token[] = [];
for await (const generatedToken of sequence.evaluate(tokens)) {
    res.push(generatedToken);

    // It's important to not concatenate the results as strings,
    // as doing so breaks some characters (like some emojis)
    // that consist of multiple tokens.
    // By using an array of tokens, we can decode them correctly together.
    const resString = model.detokenize(res);

    const lastPart = resString.split("ASSISTANT:").pop();
    if (lastPart?.includes("USER:"))
        break;
}

const a1 = model.detokenize(res).split("USER:")[0]!;
console.log("AI: " + a1.trim());

Next Steps

Now that you've learned the basics of node-llama-cpp, you can explore more advanced topics by reading the guides in the Guide section of the sidebar.

Use GitHub Discussions to ask questions if you get stuck,
and give node-llama-cpp a star on GitHub if you found it useful.

Explore the API reference to learn more about the available functions and classes, and use the search bar (press /) to find documentation for a specific topic or API.

Check out the roadmap to see what's coming next,
and consider sponsoring node-llama-cpp to accelerate the development of new features.