Using Vulkan
Vulkan is a low-overhead, cross-platform 3D graphics and computing API.
node-llama-cpp ships with pre-built binaries with Vulkan support for Windows and Linux, and these are automatically used when Vulkan support is detected on your machine.
- Windows: Vulkan drivers are usually provided together with your GPU drivers, so chances are you won't have to install anything.
- Linux: you have to install the Vulkan SDK.
Testing Vulkan Support
To check whether the Vulkan support works on your machine, run this command:
```shell
npx --no node-llama-cpp inspect gpu
```
You should see an output like this:
```
Vulkan: available
Vulkan device: NVIDIA RTX A6000
Vulkan used VRAM: 0% (0B/47.99GB)
Vulkan free VRAM: 100% (47.99GB/47.99GB)
CPU model: Intel(R) Xeon(R) Gold 5315Y CPU @ 3.20GHz
Used RAM: 2.51% (1.11GB/44.08GB)
Free RAM: 97.48% (42.97GB/44.08GB)
```
If you see `Vulkan used VRAM` in the output, it means that Vulkan support is working on your machine.
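You can also check this programmatically using the same API shown later in this guide. Here's a minimal sketch; the exact set of possible `llama.gpu` values depends on your platform and node-llama-cpp version:
```typescript
import {getLlama} from "node-llama-cpp";

// Detect the available GPU types and pick the best one
const llama = await getLlama();

// llama.gpu is "vulkan" when the Vulkan binding was selected;
// other values include "cuda", "metal", or false (CPU only)
if (llama.gpu === "vulkan")
    console.log("Vulkan support is working");
else
    console.log("Selected GPU type:", llama.gpu);
```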
Building node-llama-cpp With Vulkan Support
Prerequisites
CMake 3.26 or higher (optional, recommended if you have build issues)
Windows: Vulkan SDK installer
Ubuntu
On Ubuntu 24.04 (noble):
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
sudo apt update
sudo apt install vulkan-sdk
```
On Ubuntu 22.04 (jammy):
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update
sudo apt install vulkan-sdk
```
Windows only: enable long paths support
Open cmd as Administrator and run this command:
```shell
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem" /v "LongPathsEnabled" /t REG_DWORD /d "1" /f
```
Windows only: LLVM (optional, recommended if you have build issues)
There are a few methods to install LLVM:
- As part of Microsoft Visual C++ Build Tools (Recommended): the dependencies for Windows listed under Downloading a Release will also install LLVM.
- Independently: visit the latest LLVM release page and download the installer for your Windows architecture.
Building From Source
When you use the getLlama method, if there's no binary that matches the provided options, it'll automatically build llama.cpp from source.
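For example, the following sketch requests a Vulkan binary; if no matching pre-built binary is found, the same call falls back to building llama.cpp from source (the `gpu` option is covered in more detail later in this guide):
```typescript
import {getLlama} from "node-llama-cpp";

// If no pre-built binary with Vulkan support matches this configuration,
// llama.cpp is automatically built from source as part of this call
const llama = await getLlama({
    gpu: "vulkan"
});
```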
Manually building from source using the source download command is recommended for troubleshooting build issues.
To manually build from source, run this command inside of your project:
```shell
npx --no node-llama-cpp source download --gpu vulkan
```
If `cmake` is not installed on your machine, node-llama-cpp will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.
If you see the message `Vulkan not found` during the build process, it means that the Vulkan SDK is not installed on your machine or that it is not detected by the build process.
Using node-llama-cpp With Vulkan
It's recommended to use getLlama without specifying a GPU type, so it'll detect the available GPU types and use the best one automatically.
To do this, just use getLlama without any parameters:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

console.log("GPU type:", llama.gpu);
```
To force it to use Vulkan, you can use the `gpu` option:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan"
});

console.log("GPU type:", llama.gpu);
```
By default, node-llama-cpp will offload as many layers of the model to the GPU as it can fit in the VRAM.
To force it to offload a specific number of layers, you can use the gpuLayers option:
```typescript
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 33 // or any other number of layers you want
});
```
WARNING
Attempting to offload more layers to the GPU than the available VRAM can fit will result in an `InsufficientMemoryError` error.
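If you want to handle this case gracefully (for example, by retrying with fewer layers), you can catch the error. A minimal sketch, assuming the `InsufficientMemoryError` class is exported by your version of node-llama-cpp:
```typescript
import {getLlama, InsufficientMemoryError} from "node-llama-cpp";

const llama = await getLlama({gpu: "vulkan"});
const modelPath = "path/to/model.gguf"; // hypothetical path to your model file

try {
    const model = await llama.loadModel({
        modelPath,
        gpuLayers: 33 // may be more than the available VRAM can fit
    });
} catch (error) {
    // InsufficientMemoryError is assumed to be exported by node-llama-cpp;
    // check your version's typings
    if (error instanceof InsufficientMemoryError)
        console.error("Not enough VRAM for 33 GPU layers; try a lower gpuLayers value");
    else
        throw error;
}
```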
On Linux, you can monitor GPU usage with this command:
```shell
watch -d "npx --no node-llama-cpp inspect gpu"
```
Vulkan Caveats
At the moment, Vulkan doesn't work well when using multiple contexts at the same time, so it's recommended to use a single context with Vulkan, and to manually dispose a context (using .dispose()) before creating a new one.
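For example (a minimal sketch, assuming a model has already been loaded with llama.loadModel):
```typescript
// Use a single context at a time with Vulkan
const context = await model.createContext();
// ... evaluate using context.getSequence() here ...

// Dispose of the current context before creating a new one
await context.dispose();
const newContext = await model.createContext();
```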
CUDA is always preferred by getLlama by default when it's available, so you may not encounter this issue at all.
If you'd like to make sure Vulkan isn't used in your project, you can do this:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: {
        type: "auto",
        exclude: ["vulkan"]
    }
});
```