Using Vulkan

Vulkan is a low-overhead, cross-platform 3D graphics and computing API.

node-llama-cpp ships with pre-built binaries that include Vulkan support for Windows and Linux, and they are used automatically when Vulkan support is detected on your machine.

Windows: Vulkan drivers are usually provided together with your GPU drivers, so you most likely don't have to install anything.

Linux: you have to install the Vulkan SDK.

Testing Vulkan Support

To check whether the Vulkan support works on your machine, run this command:

shell
npx --no node-llama-cpp inspect gpu

You should see output like this:

Vulkan: available

Vulkan device: NVIDIA RTX A6000
Vulkan used VRAM: 0% (0B/47.99GB)
Vulkan free VRAM: 100% (47.99GB/47.99GB)

CPU model: Intel(R) Xeon(R) Gold 5315Y CPU @ 3.20GHz
Used RAM: 2.51% (1.11GB/44.08GB)
Free RAM: 97.48% (42.97GB/44.08GB)

If you see Vulkan used VRAM in the output, it means that Vulkan support is working on your machine.
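You can also run this check from code. Here's a minimal sketch that relies only on the gpu property used in the examples further down this page:

typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// `llama.gpu` reports the GPU type that getLlama selected
if (llama.gpu === "vulkan")
    console.log("Vulkan is being used");
else
    console.log("Vulkan is not being used. GPU type:", llama.gpu);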

Building node-llama-cpp With Vulkan Support

Prerequisites

  • cmake-js dependencies
  • CMake 3.26 or higher (optional, recommended if you have build issues)
  • Vulkan SDK:

    Windows: Vulkan SDK installer

    Ubuntu 24.04 (Noble):

    shell
    wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
    sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
    sudo apt update
    sudo apt install vulkan-sdk

    Ubuntu 22.04 (Jammy):

    shell
    wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
    sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
    sudo apt update
    sudo apt install vulkan-sdk

Building From Source

When you use the getLlama method, if no pre-built binary matches the provided options, it'll automatically build llama.cpp from source.
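For example, here's a minimal sketch using the same gpu option demonstrated later on this page; if you request Vulkan and no matching pre-built binary exists, this call builds llama.cpp from source before returning:

typescript
import {getLlama} from "node-llama-cpp";

// falls back to building llama.cpp from source when no matching pre-built Vulkan binary is found
const llama = await getLlama({
    gpu: "vulkan"
});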

Manually building from source using the source download command is recommended for troubleshooting build issues.

To manually build from source, run this command inside your project:

shell
npx --no node-llama-cpp source download --gpu vulkan

If cmake is not installed on your machine, node-llama-cpp will automatically download cmake to an internal directory and try to use it to build llama.cpp from source.

If you see the message Vulkan not found during the build process, it means that the Vulkan SDK is not installed on your machine or that the build process cannot detect it.

Using node-llama-cpp With Vulkan

It's recommended to use getLlama without specifying a GPU type, so it'll detect the available GPU types and use the best one automatically.

To do this, just use getLlama without any parameters:

typescript
const llama = await getLlama();
console.log("GPU type:", llama.gpu);

To force it to use Vulkan, you can use the gpu option:

typescript
const llama = await getLlama({
    gpu: "vulkan"
});
console.log("GPU type:", llama.gpu);

By default, node-llama-cpp offloads as many model layers to the GPU as can fit in the available VRAM.

To force it to offload a specific number of layers, you can use the gpuLayers option:

typescript
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 33 // or any other number of layers you want
});

WARNING

Attempting to offload more layers to the GPU than the available VRAM can fit will result in an InsufficientMemoryError being thrown.
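If you'd like to handle that case gracefully, a minimal sketch like this may help (it assumes InsufficientMemoryError is importable from node-llama-cpp, and the model path is a placeholder):

typescript
import {getLlama, InsufficientMemoryError} from "node-llama-cpp";

const llama = await getLlama({gpu: "vulkan"});
const modelPath = "path/to/model.gguf"; // hypothetical path, for illustration only

try {
    const model = await llama.loadModel({
        modelPath,
        gpuLayers: 33
    });
    // use the model here
} catch (error) {
    if (error instanceof InsufficientMemoryError)
        console.error("Not enough VRAM to offload 33 layers; try a lower gpuLayers value");
    else
        throw error;
}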

On Linux, you can monitor GPU usage with this command:

shell
watch -d "npx --no node-llama-cpp inspect gpu"

Vulkan Caveats

At the moment, Vulkan doesn't work well when multiple contexts are used at the same time, so it's recommended to use a single context with Vulkan and to manually dispose of a context (using .dispose()) before creating a new one.
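A minimal sketch of this pattern, assuming a model loaded with loadModel as in the earlier example (the model path is a placeholder):

typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({gpu: "vulkan"});
const modelPath = "path/to/model.gguf"; // hypothetical path, for illustration only
const model = await llama.loadModel({modelPath});

// use a single context at a time when running on Vulkan
const firstContext = await model.createContext();
// ... use firstContext ...

// dispose of the current context before creating a new one
await firstContext.dispose();
const secondContext = await model.createContext();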

When CUDA is available, getLlama prefers it by default, so you may not encounter this issue at all.

If you'd like to make sure Vulkan isn't used in your project, you can do this:

typescript
const llama = await getLlama({
    gpu: {
        type: "auto",
        exclude: ["vulkan"]
    }
});