# Text Completion
To generate text completions, you can use the `LlamaCompletion` class. Here are some usage examples:
## Text Completion
Generate a completion for a given text.
::: tip
It's recommended to set `maxTokens` when generating a text completion to ensure the completion doesn't go on forever.
:::
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

const input = "Here is a list of sweet fruits:\n* ";
console.log("Input: " + input);

const res = await completion.generateCompletion(input, {
    maxTokens: 100
});
console.log("Completion: " + res);
```
## Fill in the Middle (Infill)

Generate a completion for a given text (the prefix) that connects to a given continuation (the suffix).
You can use `infillSupported` to check whether a model supports infill completions. Using infill with an unsupported model will throw an `UnsupportedError`.
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "codegemma-2b-Q4_K_M.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

if (!completion.infillSupported) {
    console.error("Infill is not supported for this model");
    process.exit(1);
}

const prefix = "4 sweet fruits: Apple,";
const suffix = "and Grape.\n\n";
console.log("Prefix: " + prefix);
console.log("Suffix: " + suffix);

const res = await completion.generateInfillCompletion(prefix, suffix, {
    maxTokens: 100
});
console.log("Fill: " + res);
```

This example uses CodeGemma.
## Stop Text Completion Generation
To stop an ongoing text completion without throwing an error (so you can get the partially generated text), you can use the `stopOnAbortSignal` option to configure what happens when the given `signal` is aborted.
```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

const abortController = new AbortController();
const input = "Here is a list of sweet fruits:\n* ";
console.log("Input: " + input);

let result = "";
process.stdout.write("Streamed completion: ");
const res = await completion.generateCompletion(input, {
    maxTokens: 256,

    // stop the generation instead of cancelling it
    stopOnAbortSignal: true,
    signal: abortController.signal,
    onTextChunk(chunk) {
        result += chunk;
        process.stdout.write(chunk);

        // stop after at most 10 lines
        if (result.split("\n").length >= 10)
            abortController.abort();
    }
});
console.log();
console.log("Completion: " + res);
```