
CUDA Support

CUDA is a parallel computing platform and API created by NVIDIA for NVIDIA GPUs.

node-llama-cpp ships with pre-built binaries with CUDA support for Windows and Linux, and these are automatically used when CUDA is detected on your machine.

To use node-llama-cpp's CUDA support with your NVIDIA GPU, make sure you have CUDA Toolkit 12.2 or higher installed on your machine.

If the pre-built binaries don't work with your CUDA installation, node-llama-cpp will automatically download a release of llama.cpp and build it from source with CUDA support. Building from source with CUDA support is slow and can take up to an hour.

The pre-built binaries are compiled with CUDA Toolkit 12.2, so any version of CUDA Toolkit that is 12.2 or higher should work with the pre-built binaries. If you have an older version of CUDA Toolkit installed on your machine, consider updating it to avoid having to wait the long build time.
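
If you're not sure which version of the CUDA Toolkit is installed, you can check it with nvcc, the compiler that ships with the toolkit (this assumes nvcc is available on your PATH):

shell
nvcc --version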

Testing CUDA Support

To check whether the CUDA support works on your machine, run this command:

shell
npx --no node-llama-cpp inspect gpu

You should see an output like this:

CUDA: available

CUDA device: NVIDIA RTX A6000
CUDA used VRAM: 0.54% (266.88MB/47.65GB)
CUDA free VRAM: 99.45% (47.39GB/47.65GB)

CPU model: Intel(R) Xeon(R) Gold 5315Y CPU @ 3.20GHz
Used RAM: 2.51% (1.11GB/44.08GB)
Free RAM: 97.48% (42.97GB/44.08GB)

If you see CUDA used VRAM in the output, it means that CUDA support is working on your machine.
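
You can also check this programmatically. Here's a minimal sketch that uses getLlama (covered in more detail below) to detect the best available GPU type and report whether CUDA was picked:

typescript
import {getLlama} from "node-llama-cpp";

// detect the best available GPU type automatically
const llama = await getLlama();

if (llama.gpu === "cuda")
    console.log("CUDA support is working");
else
    console.log("CUDA is not being used; detected GPU type:", llama.gpu);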

Prerequisites

- CUDA Toolkit 12.2 or higher installed on your machine
- cmake (if it isn't installed, node-llama-cpp will automatically download it to an internal directory and use it to build llama.cpp from source)

Manually Building node-llama-cpp With CUDA Support

Run this command inside your project:

shell
npx --no node-llama-cpp source download --gpu cuda

If cmake is not installed on your machine, node-llama-cpp will automatically download cmake to an internal directory and try to use it to build llama.cpp from source.

If you see the message CUDA not found during the build, it means that the CUDA Toolkit is not installed on your machine or could not be detected by the build process.

Custom llama.cpp CMake Options

llama.cpp has some options you can use to customize your CUDA build.

llama.cpp CUDA CMake build options:

| Option | Description | Default value |
|---|---|---|
| GGML_CUDA_FORCE_DMMV | ggml: use dmmv instead of mmvq CUDA kernels | OFF |
| GGML_CUDA_FORCE_MMQ | ggml: use mmq kernels instead of cuBLAS | OFF |
| GGML_CUDA_FORCE_CUBLAS | ggml: always use cuBLAS instead of mmq kernels | OFF |
| GGML_CUDA_F16 | ggml: use 16 bit floats for some calculations | OFF |
| GGML_CUDA_NO_PEER_COPY | ggml: do not use peer to peer copies | OFF |
| GGML_CUDA_NO_VMM | ggml: do not try to use CUDA VMM | OFF |
| GGML_CUDA_FA_ALL_QUANTS | ggml: compile all quants for FlashAttention | OFF |
| GGML_CUDA_GRAPHS | ggml: use CUDA graphs (llama.cpp only) | ${GGML_CUDA_GRAPHS_DEFAULT} |

Source: CMakeLists (filtered for only CUDA-related options)

You can see all the available llama.cpp CMake build options in the llama.cpp repository's CMakeLists files.

To build node-llama-cpp with any of these options, set an environment variable named after the option you want to set, prefixed with NODE_LLAMA_CPP_CMAKE_OPTION_, before running the build command.
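
For example, here's a sketch of building from source with the GGML_CUDA_F16 option enabled (the option is taken from the table above and chosen purely for illustration):

shell (Linux)
export NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_F16=ON
npx --no node-llama-cpp source download --gpu cuda

cmd (Windows)
set NODE_LLAMA_CPP_CMAKE_OPTION_GGML_CUDA_F16=ON
npx --no node-llama-cpp source download --gpu cuda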

Fix the Failed to detect a default CUDA architecture Build Error

To fix this issue, set the CUDACXX environment variable to the path of the nvcc compiler.

For example, if you have CUDA Toolkit 12.2 installed, run the command that matches your platform:

shell (Linux)
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc

cmd (Windows)
set CUDACXX=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe

Then run the build command again to check whether setting the CUDACXX environment variable fixed the issue.

Fix the The CUDA compiler identification is unknown Build Error

The solution to this error is the same as the solution to the Failed to detect a default CUDA architecture error: set the CUDACXX environment variable to the path of the nvcc compiler, as described above.

If that doesn't resolve the error, you can also try setting the CMAKE_GENERATOR_TOOLSET cmake option to the CUDA home directory, which is usually already available in the CUDA_PATH environment variable.

To do this, set the NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_GENERATOR_TOOLSET environment variable to the path of your CUDA home directory:

shell (Linux)
export NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_GENERATOR_TOOLSET=$CUDA_PATH

cmd (Windows)
set NODE_LLAMA_CPP_CMAKE_OPTION_CMAKE_GENERATOR_TOOLSET=%CUDA_PATH%

Then run the build command again to check whether setting the CMAKE_GENERATOR_TOOLSET cmake option fixed the issue.

Using node-llama-cpp With CUDA

It's recommended to use getLlama without specifying a GPU type, so it'll detect the available GPU types and use the best one automatically.

To do this, just use getLlama without any parameters:

typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
console.log("GPU type:", llama.gpu);

To force it to use CUDA, you can use the gpu option:

typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "cuda"
});
console.log("GPU type:", llama.gpu);

By default, node-llama-cpp will offload as many layers of the model to the GPU as it can fit in the VRAM.

To force it to offload a specific number of layers, you can use the gpuLayers option:

typescript
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 33 // or any other number of layers you want
});

WARNING

Attempting to offload more layers to the GPU than the available VRAM can fit will result in an InsufficientMemoryError being thrown.
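
If you want to handle this case gracefully, you can catch the error when loading the model. The sketch below assumes that InsufficientMemoryError is exported from node-llama-cpp and uses a hypothetical model path and layer count purely for illustration:

typescript
import {getLlama, InsufficientMemoryError} from "node-llama-cpp";

const llama = await getLlama({gpu: "cuda"});
const modelPath = "path/to/model.gguf"; // hypothetical path, replace with your own

try {
    // attempt to offload a fixed number of layers, which may not fit in VRAM
    const model = await llama.loadModel({
        modelPath,
        gpuLayers: 60 // illustrative value
    });
    console.log("Model loaded with 60 GPU layers");
} catch (err) {
    if (err instanceof InsufficientMemoryError)
        console.error("Not enough VRAM to offload this many layers; try a lower gpuLayers value");
    else
        throw err;
}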

On Linux, you can monitor GPU usage with this command:

shell
watch -d nvidia-smi