Troubleshooting
ESM Usage
node-llama-cpp is an ES module, so you can only use import to load it and cannot use require.
Since the Node.js ecosystem is transitioning to ESM, it's recommended to use it in your project.
To do so, make sure your package.json file has "type": "module" in it.
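As a quick sanity check, the package.json requirement above can be verified with a few lines of code. This is only a minimal sketch: the isEsmPackage helper is a hypothetical name introduced for illustration and is not part of node-llama-cpp.

```typescript
/**
 * Returns true when the given package.json text declares "type": "module".
 * Hypothetical helper for illustration; not part of node-llama-cpp.
 */
function isEsmPackage(packageJsonText: string): boolean {
    const manifest: {type?: unknown} = JSON.parse(packageJsonText);

    // A missing "type" field (or "type": "commonjs") means .js files are treated as CommonJS
    return manifest.type === "module";
}
```

To check a real project, pass it the contents of the file, for example isEsmPackage(fs.readFileSync("package.json", "utf8")).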
Using in CommonJS
If you cannot use ESM in your project, you can still use the dynamic import() function from a CommonJS module to load node-llama-cpp:
async function myLogic() {
    const {getLlama} = await import("node-llama-cpp");
}

myLogic();
If your tsconfig.json is configured to transpile import statements into require function calls automatically, you can use this workaround to import node-llama-cpp:
async function myLogic() {
    const nlc: typeof import("node-llama-cpp") = await Function('return import("node-llama-cpp")')();
    const {getLlama} = nlc;

    const llama = await getLlama();
}

myLogic();
Investigating Unexpected llama.cpp Behavior
If you notice some unexpected behavior or crashes in your application, you should enable debug logs to see more information about what's happening.
To do so, enable the debug option when calling getLlama:
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    debug: true
});
Alternatively, you can set the environment variable NODE_LLAMA_CPP_DEBUG to true.
Running in Termux
In Termux, the prebuilt binaries cannot be used because Termux uses a custom linker.
To allow node-llama-cpp to build the binaries, install the required packages first:
pkg update
pkg install nodejs git cmake clang libxml2
For Vulkan support, also install the following packages:
pkg install vulkan-tools vulkan-loader-android vulkan-headers vulkan-extension-layer
Note that your device GPU may not support the capabilities that llama.cpp requires, so it may not work. If that happens, disable Vulkan in your code or uninstall the Vulkan packages.
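One way to disable Vulkan in code is to turn GPU usage off when calling getLlama. The sketch below assumes that getLlama accepts a gpu option where false disables GPU support and "auto" keeps the default behavior. MY_APP_DISABLE_GPU and resolveGpuOption are hypothetical names introduced here for illustration.

```typescript
/**
 * Decides which value to pass as the `gpu` option.
 * MY_APP_DISABLE_GPU is a hypothetical app-level environment variable,
 * not something node-llama-cpp reads by itself.
 */
function resolveGpuOption(
    env: Record<string, string | undefined> = process.env
): "auto" | false {
    // `false` is assumed to disable GPU usage (and therefore Vulkan) entirely
    return env.MY_APP_DISABLE_GPU === "1" ? false : "auto";
}

// Usage sketch (requires node-llama-cpp, so it is left as a comment):
// const llama = await getLlama({gpu: resolveGpuOption()});
```

Keeping the decision in an environment variable lets you fall back to CPU-only on a device where Vulkan fails without changing the code.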
Crashes With an illegal hardware instruction Error or a SIGILL Signal
A common cause of this issue is an installed Node.js build whose architecture differs from the host machine's CPU architecture, for example an x64 Node.js installed on an arm64 machine (such as an Apple Silicon Mac).
To check whether this is the case, run this command to see which architecture your installed Node.js uses:
node -e "console.log(process.platform, process.arch)"
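On macOS, the same check can be done programmatically. The sketch below assumes the sysctl.proc_translated key, which reports 1 for a process running under Rosetta 2 translation and does not exist on Intel Macs. The function names are hypothetical and not part of node-llama-cpp.

```typescript
import {execFileSync} from "node:child_process";

/** Reads macOS's sysctl.proc_translated flag; returns null when it is unavailable. */
function readProcTranslated(): string | null {
    try {
        return execFileSync("sysctl", ["-n", "sysctl.proc_translated"], {
            encoding: "utf8",
            stdio: ["ignore", "pipe", "ignore"]
        }).trim();
    } catch {
        // Not macOS, or an Intel Mac where the key does not exist
        return null;
    }
}

/** Returns true when an x64 Node.js build is running translated on an Apple Silicon Mac. */
function isX64NodeOnAppleSilicon(
    platform: string = process.platform,
    arch: string = process.arch,
    readFlag: () => string | null = readProcTranslated
): boolean {
    if (platform !== "darwin" || arch !== "x64")
        return false;

    return readFlag() === "1";
}
```

If isX64NodeOnAppleSilicon() returns true, installing an arm64 build of Node.js should resolve the mismatch described above.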
Getting Invalid Responses Using a Qwen or Qwen2 Model
If you're getting invalid or gibberish responses when using CUDA with a Qwen or Qwen2 model, try enabling flash attention to fix the issue.
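Flash attention can be enabled when creating a context. The sketch below assumes that createContext accepts a flashAttention option. To stay self-contained, it uses a minimal structural type (ContextFactory, a hypothetical name) in place of the model object returned by llama.loadModel.

```typescript
/** Minimal stand-in for the model object; only the method used here is declared. */
interface ContextFactory<Context> {
    createContext(options: {flashAttention?: boolean}): Promise<Context>;
}

/** Creates a context with flash attention enabled. */
async function createContextWithFlashAttention<Context>(
    model: ContextFactory<Context>
): Promise<Context> {
    return await model.createContext({
        flashAttention: true
    });
}
```

In an application, this amounts to passing flashAttention: true in the options object you already give to model.createContext.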
Getting an InsufficientMemoryError Error
Getting an InsufficientMemoryError error means you're trying to load a model or create a context with a configuration that requires more memory than the VRAM available in your GPU.
This usually happens when you set an explicit gpuLayers value when loading a model, or an explicit contextSize when creating a context.
The solution is to remove these settings and let node-llama-cpp find the optimal configuration that works on your machine for loading the model and creating a context.
Given this code, you should remove the lines marked with a comment:
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    gpuLayers: "max" // remove this line
});
const context = await model.createContext({
    contextSize: 128000 // remove this line
});
Getting an InsufficientMemoryError Error Although Enough VRAM Is Available
If you're getting an InsufficientMemoryError error even though you're certain you have enough VRAM available in your GPU, it may have to do with the way the memory usage is estimated.
node-llama-cpp has a built-in memory estimation mechanism that estimates the memory required for the model to run on the GPU, in order to find the optimal configuration for loading a model and creating a context. This estimation also helps ensure that the model is loaded with parameters that won't crash the process.
However, this estimation may be inaccurate and exaggerated in some cases, or a recent change in llama.cpp may not have been accounted for in the estimation.
To check whether this is the case, you can run the inspect measure command to compare the estimated memory usage with the actual memory usage:
npx --no node-llama-cpp inspect measure [modelPath]
To work around this issue, you can force node-llama-cpp to ignore the memory safeguards and load the model anyway by setting the ignoreMemorySafetyChecks option to true:
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    ignoreMemorySafetyChecks: true
});
const context = await model.createContext({
    ignoreMemorySafetyChecks: true
});
Important: Use ignoreMemorySafetyChecks with caution, as it may cause the process to crash if the memory usage exceeds the available VRAM.
If you find that the memory estimation is indeed inaccurate, please open a new issue on GitHub with a link to the model you're using and the output of the inspect measure command.
Getting an The specified module could not be found \\?\C:\Users\Administrator\AppData\Roaming\npm\node_modules Error on a Windows Machine
The common cause of this issue is running npm install as the Administrator user and then trying to run the code as a different user.
Ensure you're not using the Administrator user either for npm install or to run the code.
Getting an EPERM: operation not permitted Error on a Windows Machine When Building an Electron App
electron-builder needs to create symlinks to perform the build process, which requires enabling Developer Mode on Windows.
To do that, go to Settings > Update & Security > For developers and enable Developer mode.
After that, delete the .cache folder under your user directory and try building the app again.