# Text Completion

To generate text completions, you can use the `LlamaCompletion` class.

Here are usage examples of `LlamaCompletion`:

## Text Completion

Generate a completion for a given text.

::: tip
It's recommended to set `maxTokens` when generating a text completion to ensure the completion doesn't go on forever.
:::

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

const input = "Here is a list of sweet fruits:\n* ";
console.log("Input: " + input);

const res = await completion.generateCompletion(input, {
    maxTokens: 100
});
console.log("Completion: " + res);
```

## Fill in the Middle (Infill)

Generate a completion for a given text (prefix) that connects to a given continuation (suffix).

You can use `infillSupported` to check whether a model supports infill completions. Using infill with an unsupported model will throw an `UnsupportedError`.

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "codegemma-2b-Q4_K_M.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

if (!completion.infillSupported) {
    console.error("Infill is not supported for this model");
    process.exit(1);
}

const prefix = "4 sweet fruits: Apple,";
const suffix = "and Grape.\n\n";
console.log("Prefix: " + prefix);
console.log("Suffix: " + suffix);

const res = await completion.generateInfillCompletion(prefix, suffix, {
    maxTokens: 100
});
console.log("Fill: " + res);
```

This example uses CodeGemma.
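
If you prefer error handling over the upfront `infillSupported` check, you can catch the error instead. A minimal sketch, assuming `UnsupportedError` is importable from `node-llama-cpp` (the class named above as the error thrown for unsupported models):

```typescript
import {UnsupportedError} from "node-llama-cpp"; // assumption: exported error class

try {
    // `completion`, `prefix` and `suffix` come from the example above
    const res = await completion.generateInfillCompletion(prefix, suffix, {
        maxTokens: 100
    });
    console.log("Fill: " + res);
} catch (err) {
    if (err instanceof UnsupportedError)
        console.error("Infill is not supported for this model");
    else
        throw err;
}
```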

## Stop Text Completion Generation

To stop an ongoing text completion without throwing an error (so you still get the partially generated text), you can use the `stopOnAbortSignal` option to configure what happens when the given signal is aborted.

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

const abortController = new AbortController();

const input = "Here is a list of sweet fruits:\n* ";
console.log("Input: " + input);

let result = "";
process.stdout.write("Streamed completion: ");

const res = await completion.generateCompletion(input, {
    maxTokens: 256,

    // stop the generation, instead of cancelling it
    stopOnAbortSignal: true,

    signal: abortController.signal,
    onTextChunk(chunk) {
        result += chunk;
        process.stdout.write(chunk);

        // max 10 lines
        if (result.split("\n").length >= 10)
            abortController.abort();
    }
});
console.log();
console.log("Completion: " + res);
```