Troubleshooting
ESM Usage
node-llama-cpp is an ES module, so you can only use import to load it; you cannot use require.
Since the Node.js ecosystem is transitioning to ESM, it's recommended to use ESM in your project.
To do so, make sure your package.json file has "type": "module" in it.
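Once that's set, you can load node-llama-cpp with a static import statement. For example:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
```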
Using in CommonJS
If you cannot use ESM in your project, you can still use the dynamic import() function from a CommonJS module to load node-llama-cpp:
```ts
async function myLogic() {
    const {getLlama} = await import("node-llama-cpp");
}

myLogic();
```
If your tsconfig.json is configured to transpile import statements into require function calls automatically, you can use this workaround to import node-llama-cpp:
```ts
async function myLogic() {
    const nlc: typeof import("node-llama-cpp") = await Function('return import("node-llama-cpp")')();
    const {getLlama} = nlc;

    const llama = await getLlama();
}

myLogic();
```
Investigating Unexpected llama.cpp Behavior
If you notice some unexpected behavior or crashes in your application, you should enable debug logs to see more information about what's happening.
To do so, enable the debug option when calling getLlama:
```ts
const llama = await getLlama({
    debug: true
});
```
Alternatively, you can set the environment variable NODE_LLAMA_CPP_DEBUG to true.
Running in Termux
In Termux, the prebuilt binaries cannot be used due to the custom linker it uses.
To allow node-llama-cpp to build the binaries, install the required packages first:
```bash
pkg update
pkg install nodejs git cmake clang libxml2
```
For Vulkan support, also install the following packages:
```bash
pkg install vulkan-tools vulkan-loader-android vulkan-headers vulkan-extension-layer
```
Note that your device's GPU may not support the capabilities that llama.cpp requires, so Vulkan may not work.
If that happens, disable Vulkan in your code or uninstall the Vulkan packages.
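For example, here's a minimal sketch of disabling GPU support in code so the Vulkan backend isn't used (the gpu option of getLlama is assumed here; check the API reference for your version):

```ts
import {getLlama} from "node-llama-cpp";

// disable GPU support entirely so the Vulkan backend isn't used
const llama = await getLlama({
    gpu: false
});
```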
Crashes With an illegal hardware instruction Error or a SIGILL Signal
A common cause of this issue is an installed Node.js build whose architecture differs from the host machine's CPU architecture.
For example, an x64 build of Node.js installed on an arm64 machine (such as an Apple Silicon Mac).
To check whether this is the case, run this command to see what architecture your installed Node.js build uses:
node -e "console.log(process.platform, process.arch)"Getting Invalid Responses Using a Qwen or Qwen2 Model
If you're getting invalid or gibberish responses when using CUDA with a Qwen or Qwen2 model, try enabling flash attention to fix the issue.
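For example, here's a minimal sketch that enables flash attention when loading the model (the defaultContextFlashAttention option name and the model path are assumptions; check the API reference for your version):

```ts
import {getLlama} from "node-llama-cpp";

const modelPath = "path/to/qwen-model.gguf"; // placeholder - replace with your model file

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    defaultContextFlashAttention: true // contexts created from this model use flash attention by default
});
const context = await model.createContext();
```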
Getting an InsufficientMemoryError Error
Getting an InsufficientMemoryError error means you're trying to load a model or create a context with a configuration that requires more memory than the available VRAM on your GPU.
This usually happens when you specify a specific gpuLayers value when loading a model, or a specific contextSize when creating a context.
The solution is to remove these settings and let node-llama-cpp find the optimal configuration for loading the model and creating a context on your machine.
Given this code, you should remove the marked lines:
```ts
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    gpuLayers: "max" // remove this line
});
const context = await model.createContext({
    contextSize: 128000 // remove this line
});
```
Getting an InsufficientMemoryError Error Although Enough VRAM Is Available
If you're getting an InsufficientMemoryError error even though you're certain you have enough VRAM available in your GPU, it may have to do with the way the memory usage is estimated.
node-llama-cpp has a built-in memory estimation mechanism that estimates the memory required to run the model on the GPU, in order to find the optimal configuration to load a model and create a context with. This estimation is also important to make sure the model is loaded with parameters that won't crash the process.
However, this estimation may be inaccurate and exaggerated in some cases, or a recent change in llama.cpp may not have been accounted for in the estimation.
To check whether this is the case, you can run the inspect measure command to compare the estimated memory usage with the actual memory usage:
```bash
npx --no node-llama-cpp inspect measure [modelPath]
```
To work around this issue, you can force node-llama-cpp to ignore the memory safeguards and load the model anyway by setting the ignoreMemorySafetyChecks option to true:
```ts
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath,
    ignoreMemorySafetyChecks: true
});
const context = await model.createContext({
    ignoreMemorySafetyChecks: true
});
```
Important: Use ignoreMemorySafetyChecks with caution, as it may cause the process to crash if the memory usage exceeds the available VRAM.
If you find that the memory estimation is indeed inaccurate, please open a new issue on GitHub with a link to the model you're using and the output of the inspect measure command.
Getting a The specified module could not be found \\?\C:\Users\Administrator\AppData\Roaming\npm\node_modules Error on a Windows Machine
A common cause of this issue is running npm install as the Administrator user and then running the code as a different user.
Ensure you're not using the Administrator user either for npm install or to run the code.
Getting an EPERM: operation not permitted Error on a Windows Machine When Building an Electron App
electron-builder needs to create symlinks to perform the build process, which requires enabling Developer Mode on Windows.
To do that, go to Settings > Update & Security > For developers and enable Developer mode.
After that, delete the .cache folder under your user directory and try building the app again.
