Troubleshooting
ESM Usage
node-llama-cpp is an ES module, so you can only use import to load it; you cannot use require.
Since the Node.js ecosystem is transitioning to ESM, it's recommended to use ESM in your project.
To do so, make sure your package.json file has "type": "module" in it.
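Once that's set, you can load node-llama-cpp with a static import statement. For example:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
```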
Using in CommonJS
If you cannot use ESM in your project, you can still use the dynamic import() function from a CommonJS module to load node-llama-cpp:
```ts
async function myLogic() {
    const {getLlama} = await import("node-llama-cpp");
}

myLogic();
```
If your tsconfig.json is configured to transpile import statements into require function calls automatically, you can use this workaround to import node-llama-cpp:
```ts
async function myLogic() {
    const nlc: typeof import("node-llama-cpp") = await Function('return import("node-llama-cpp")')();
    const {getLlama} = nlc;

    const llama = await getLlama();
}

myLogic();
```
Investigating Unexpected llama.cpp Behavior
If you notice some unexpected behavior or crashes in your application, you should enable debug logs to see more information about what's happening.
To do so, enable the debug option when calling getLlama:
```ts
const llama = await getLlama({
    debug: true
});
```
Alternatively, you can set the environment variable NODE_LLAMA_CPP_DEBUG to true.
Running in Termux
In Termux, the prebuilt binaries cannot be used due to the custom linker it uses.
To allow node-llama-cpp to build the binaries, install the required packages first:
```bash
pkg update
pkg install nodejs git cmake clang libxml2
```
For Vulkan support, also install the following packages:
```bash
pkg install vulkan-tools vulkan-loader-android vulkan-headers vulkan-extension-layer
```
Note that your device's GPU may not support the capabilities that llama.cpp requires, so Vulkan may not work.
If that happens, disable Vulkan in your code or uninstall the Vulkan packages.
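For example, here's a minimal sketch of disabling GPU support in code so the Vulkan backend isn't used (the gpu option of getLlama is assumed here; check the API reference for your version):

```ts
import {getLlama} from "node-llama-cpp";

// disable GPU support entirely so the Vulkan backend isn't used
const llama = await getLlama({
    gpu: false
});
```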
Crashes With an illegal hardware instruction Error or a SIGILL Signal
A common cause of this issue is an installed Node.js build whose architecture differs from the host machine's CPU architecture.
For example, an x64 build of Node.js installed on an arm64 machine (such as an Apple Silicon Mac).
To check whether this is the case, run this command to see what architecture your installed Node.js build uses:
node -e "console.log(process.platform, process.arch)"Getting Invalid Responses Using a Qwen or Qwen2 Model
If you're getting invalid or gibberish responses when using CUDA with a Qwen or Qwen2 model, try enabling flash attention to fix the issue.
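For example, here's a minimal sketch that enables flash attention when loading the model (the defaultContextFlashAttention option name and the model path are assumptions; check the API reference for your version):

```ts
import {getLlama} from "node-llama-cpp";

const modelPath = "path/to/qwen-model.gguf"; // placeholder - replace with your model file

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    defaultContextFlashAttention: true // contexts created from this model use flash attention by default
});
const context = await model.createContext();
```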
Getting an InsufficientMemoryError Error
Getting an InsufficientMemoryError error means you're trying to load a model or create a context with a configuration that requires more memory than the available VRAM on your GPU.
This usually happens when you specify a specific gpuLayers value when loading a model, or a specific contextSize when creating a context.
The solution is to remove these settings and let node-llama-cpp find the optimal configuration for loading the model and creating a context on your machine.
Given this code, you should remove the marked lines:
```ts
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    gpuLayers: "max" // remove this line
});
const context = await model.createContext({
    contextSize: 128000 // remove this line
});
```
Getting an InsufficientMemoryError Error Although Enough VRAM Is Available
If you're getting an InsufficientMemoryError error even though you're certain you have enough VRAM available in your GPU, it may have to do with the way the memory usage is estimated.
node-llama-cpp has a built-in memory estimation mechanism that estimates the memory required to run the model on the GPU, in order to find the optimal configuration to load a model and create a context with. This estimation is also important to make sure the model is loaded with parameters that won't crash the process.
However, this estimation may be inaccurate and exaggerated in some cases, or a recent change in llama.cpp may not have been accounted for in the estimation.
To check whether this is the case, you can run the inspect measure command to compare the estimated memory usage with the actual memory usage:
```bash
npx --no node-llama-cpp inspect measure [modelPath]
```
To work around this issue, you can force node-llama-cpp to ignore the memory safeguards and load the model anyway by setting the ignoreMemorySafetyChecks option to true:
```ts
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    ignoreMemorySafetyChecks: true
});
const context = await model.createContext({
    ignoreMemorySafetyChecks: true
});
```
Important: Use ignoreMemorySafetyChecks with caution, as it may cause the process to crash if the memory usage exceeds the available VRAM.
If you find that the memory estimation is indeed inaccurate, please open a new issue on GitHub with a link to the model you're using and the output of the inspect measure command.
Getting a The specified module could not be found \\?\C:\Users\Administrator\AppData\Roaming\npm\node_modules Error on a Windows Machine
A common cause of this issue is running npm install as the Administrator user and then running the code as a different user.
Ensure you're not using the Administrator user either for npm install or to run the code.
Getting an EPERM: operation not permitted Error on a Windows Machine When Building an Electron App
electron-builder needs to create symlinks to perform the build process, which requires enabling Developer Mode on Windows.
To do that, go to Settings > Update & Security > For developers and enable Developer mode.
After that, delete the .cache folder under your user directory and try building the app again.
