Using Vulkan
Vulkan is a low-overhead, cross-platform 3D graphics and computing API.
node-llama-cpp ships with pre-built binaries with Vulkan support for Windows and Linux, and these are automatically used when Vulkan support is detected on your machine.
- Windows: Vulkan drivers are usually provided together with your GPU drivers, so chances are you won't have to install anything.
- Linux: you have to install the Vulkan SDK.
Testing Vulkan Support
To check whether the Vulkan support works on your machine, run this command:
```shell
npx --no node-llama-cpp inspect gpu
```
You should see an output like this:
```
Vulkan: available
Vulkan device: NVIDIA RTX A6000
Vulkan used VRAM: 0% (0B/47.99GB)
Vulkan free VRAM: 100% (47.99GB/47.99GB)
CPU model: Intel(R) Xeon(R) Gold 5315Y CPU @ 3.20GHz
Used RAM: 2.51% (1.11GB/44.08GB)
Free RAM: 97.48% (42.97GB/44.08GB)
```
If you see `Vulkan used VRAM` in the output, it means that Vulkan support is working on your machine.
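You can also check this programmatically using the same API shown later in this guide. Here's a minimal sketch; the exact set of possible `llama.gpu` values depends on your platform and node-llama-cpp version:
```typescript
import {getLlama} from "node-llama-cpp";

// Detect the available GPU types and pick the best one
const llama = await getLlama();

// llama.gpu is "vulkan" when the Vulkan binding was selected;
// other values include "cuda", "metal", or false (CPU only)
if (llama.gpu === "vulkan")
    console.log("Vulkan support is working");
else
    console.log("Selected GPU type:", llama.gpu);
```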
Building node-llama-cpp With Vulkan Support
Prerequisites
CMake 3.26 or higher (optional, recommended if you have build issues)
Windows: Vulkan SDK installer
Ubuntu
On Ubuntu 24.04 (noble):
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
sudo apt update
sudo apt install vulkan-sdk
```
On Ubuntu 22.04 (jammy):
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update
sudo apt install vulkan-sdk
```
Windows only: enable long paths support
Open cmd as Administrator and run this command:
```shell
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem" /v "LongPathsEnabled" /t REG_DWORD /d "1" /f
```
Windows only: LLVM (optional, recommended if you have build issues)
There are a few methods to install LLVM:
- As part of Microsoft Visual C++ Build Tools (Recommended): the dependencies for Windows listed under Downloading a Release will also install LLVM.
- Independently: visit the latest LLVM release page and download the installer for your Windows architecture.
Building From Source
When you use the getLlama method, if there's no binary that matches the provided options, it'll automatically build llama.cpp from source.
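For example, the following sketch requests a Vulkan binary; if no matching pre-built binary is found, the same call falls back to building llama.cpp from source (the `gpu` option is covered in more detail later in this guide):
```typescript
import {getLlama} from "node-llama-cpp";

// If no pre-built binary with Vulkan support matches this configuration,
// llama.cpp is automatically built from source as part of this call
const llama = await getLlama({
    gpu: "vulkan"
});
```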
Manually building from source using the source download command is recommended for troubleshooting build issues.
To manually build from source, run this command inside of your project:
```shell
npx --no node-llama-cpp source download --gpu vulkan
```
If `cmake` is not installed on your machine, node-llama-cpp will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.
If you see the message `Vulkan not found` during the build process, it means that the Vulkan SDK is not installed on your machine or that it is not detected by the build process.
Using node-llama-cpp With Vulkan
It's recommended to use getLlama without specifying a GPU type, so it'll detect the available GPU types and use the best one automatically.
To do this, just use getLlama without any parameters:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

console.log("GPU type:", llama.gpu);
```
To force it to use Vulkan, you can use the `gpu` option:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan"
});

console.log("GPU type:", llama.gpu);
```
By default, node-llama-cpp will offload as many layers of the model to the GPU as it can fit in the VRAM.
To force it to offload a specific number of layers, you can use the gpuLayers option:
```typescript
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 33 // or any other number of layers you want
});
```
WARNING
Attempting to offload more layers to the GPU than the available VRAM can fit will result in an `InsufficientMemoryError` error.
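If you want to handle this case gracefully (for example, by retrying with fewer layers), you can catch the error. A minimal sketch, assuming the `InsufficientMemoryError` class is exported by your version of node-llama-cpp:
```typescript
import {getLlama, InsufficientMemoryError} from "node-llama-cpp";

const llama = await getLlama({gpu: "vulkan"});
const modelPath = "path/to/model.gguf"; // hypothetical path to your model file

try {
    const model = await llama.loadModel({
        modelPath,
        gpuLayers: 33 // may be more than the available VRAM can fit
    });
} catch (error) {
    // InsufficientMemoryError is assumed to be exported by node-llama-cpp;
    // check your version's typings
    if (error instanceof InsufficientMemoryError)
        console.error("Not enough VRAM for 33 GPU layers; try a lower gpuLayers value");
    else
        throw error;
}
```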
On Linux, you can monitor GPU usage with this command:
```shell
watch -d "npx --no node-llama-cpp inspect gpu"
```
Vulkan Caveats
At the moment, Vulkan doesn't work well when using multiple contexts at the same time, so it's recommended to use a single context with Vulkan, and to manually dispose a context (using .dispose()) before creating a new one.
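For example (a minimal sketch, assuming a model has already been loaded with llama.loadModel):
```typescript
// Use a single context at a time with Vulkan
const context = await model.createContext();
// ... evaluate using context.getSequence() here ...

// Dispose of the current context before creating a new one
await context.dispose();
const newContext = await model.createContext();
```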
CUDA is always preferred by getLlama by default when it's available, so you may not encounter this issue at all.
If you'd like to make sure Vulkan isn't used in your project, you can do this:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: {
        type: "auto",
        exclude: ["vulkan"]
    }
});
```