# Using Vulkan
Vulkan is a low-overhead, cross-platform 3D graphics and computing API.

`node-llama-cpp` ships with pre-built binaries with Vulkan support for Windows and Linux, and these are automatically used when Vulkan support is detected on your machine.
- **Windows:** Vulkan drivers are usually provided together with your GPU drivers, so chances are you don't have to install anything.
- **Linux:** you have to install the Vulkan SDK.
## Testing Vulkan Support
To check whether Vulkan support works on your machine, run this command:
```shell
npx --no node-llama-cpp inspect gpu
```
You should see an output like this:
```
Vulkan: available
Vulkan device: NVIDIA RTX A6000
Vulkan used VRAM: 0% (0B/47.99GB)
Vulkan free VRAM: 100% (47.99GB/47.99GB)
CPU model: Intel(R) Xeon(R) Gold 5315Y CPU @ 3.20GHz
Used RAM: 2.51% (1.11GB/44.08GB)
Free RAM: 97.48% (42.97GB/44.08GB)
```
If you see `Vulkan used VRAM` in the output, it means that Vulkan support is working on your machine.
## Building `node-llama-cpp` With Vulkan Support
### Prerequisites
- `cmake-js` dependencies
- CMake 3.26 or higher (optional, recommended if you have build issues)
- Vulkan SDK:
  - **Windows:** install the Vulkan SDK using the Vulkan SDK installer.
  - **Ubuntu:** install the Vulkan SDK from the LunarG package repository, as shown below.

On Ubuntu 24.04 ("noble"):
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
sudo apt update
sudo apt install vulkan-sdk
```

On Ubuntu 22.04 ("jammy"):
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update
sudo apt install vulkan-sdk
```
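After installing the SDK, you can verify that Vulkan is detected by running the `vulkaninfo` tool that ships with the Vulkan SDK (note that the `--summary` flag may not be available in older SDK versions, in which case running plain `vulkaninfo` works too):

```shell
# print a short summary of the detected Vulkan devices and drivers
vulkaninfo --summary
```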
### Building From Source
When you use the `getLlama` method, if there's no binary that matches the provided options, it'll automatically build `llama.cpp` from source.

Manually building from source using the `source download` command is recommended for troubleshooting build issues.
To manually build from source, run this command inside of your project:
```shell
npx --no node-llama-cpp source download --gpu vulkan
```
If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.
If you see the message `Vulkan not found` during the build process, it means that the Vulkan SDK is not installed on your machine or that it is not detected by the build process.
## Using `node-llama-cpp` With Vulkan
It's recommended to use `getLlama` without specifying a GPU type, so it'll detect the available GPU types and use the best one automatically.

To do this, just use `getLlama` without any parameters:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
console.log("GPU type:", llama.gpu);
```
To force it to use Vulkan, you can use the `gpu` option:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan"
});
console.log("GPU type:", llama.gpu);
```
By default, `node-llama-cpp` will offload as many layers of the model to the GPU as it can fit in the VRAM.

To force it to offload a specific number of layers, you can use the `gpuLayers` option:
```typescript
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 33 // or any other number of layers you want
});
```
> **Warning:** Attempting to offload more layers to the GPU than the available VRAM can fit will result in an `InsufficientMemoryError`.
On Linux, you can monitor GPU usage with this command:
```shell
watch -d "npx --no node-llama-cpp inspect gpu"
```
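You can also check VRAM usage from code. The sketch below assumes your version of `node-llama-cpp` exposes a `getVramState()` method on the `Llama` instance; check your version's API before relying on it:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan"
});

// getVramState() is assumed here; verify it exists in your node-llama-cpp version
const vramState = await llama.getVramState();
console.log("Used VRAM:", vramState.used, "/", vramState.total, "bytes");
```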
## Vulkan Caveats
At the moment, Vulkan doesn't work well when using multiple contexts at the same time, so it's recommended to use a single context with Vulkan, and to manually dispose a context (using `.dispose()`) before creating a new one.
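For example, here's a minimal sketch of that pattern, assuming a model has already been loaded with `llama.loadModel()`:

```typescript
// create a single context and use it
const context = await model.createContext();
const sequence = context.getSequence();
// ... use the sequence for inference ...

// dispose the context before creating a new one
await context.dispose();

// only now create the next context
const newContext = await model.createContext();
```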
CUDA is always preferred by `getLlama` by default when it's available, so you may not encounter this issue at all.
If you'd like to make sure Vulkan isn't used in your project, you can do this:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: {
        type: "auto",
        exclude: ["vulkan"]
    }
});
```