# Building From Source
`node-llama-cpp` ships with pre-built binaries for macOS, Linux and Windows.

If binaries are not available for your platform or fail to load, it'll fall back to downloading a release of `llama.cpp` and building it from source with `cmake`.
## Downloading a Release
To download a release of `llama.cpp` and build it from source, you can use the CLI `source download` command:

```shell
npx --no node-llama-cpp source download
```
> **NOTE:** `node-llama-cpp` ships with a git bundle of the release of `llama.cpp` it was built with, so when you run the `source download` command without specifying a specific release or repo, it will use the bundled git bundle instead of downloading the release from GitHub.
>
> This is useful for building from source on machines that aren't connected to the internet.
> **INFO:** If `cmake` is not installed on your machine, `node-llama-cpp` will automatically download `cmake` to an internal directory and try to use it to build `llama.cpp` from source.
>
> If the build fails, make sure you have the required dependencies of `cmake` installed on your machine. More info is available here (you don't have to install `cmake` or `cmake-js`, just the dependencies).
>
> If the build fails on macOS with the error `"/usr/bin/cc" is not able to compile a simple test program`, try running `xcode-select --install` to install the Xcode command line tools.
## `source download` and `source build` Commands
The difference between the `source download` and `source build` commands is that the `source download` command downloads a release of `llama.cpp` and builds it, while the `source build` command builds the `llama.cpp` release that's already downloaded.

You can only use the `source build` command after you've already downloaded a release of `llama.cpp` with the `source download` command.
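For example, a typical flow is to download a release first and then rebuild it later without downloading it again:

```shell
# download a release of llama.cpp and build it
npx --no node-llama-cpp source download

# rebuild the llama.cpp release that's already downloaded
npx --no node-llama-cpp source build
```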
To only download a release of `llama.cpp` without building it, use the `source download` command with the `--skipBuild` option:

```shell
npx --no node-llama-cpp source download --skipBuild
```
## Building Inside Your App
The best way to use a customized build is by customizing the options passed to the `getLlama` function.

If there's no existing binary that matches the provided options (either a local build or a pre-built binary), it'll automatically download a release of `llama.cpp` (if it's not already downloaded) and build it from source.

You can pass custom cmake options you want the binary to be compiled with by using the `cmakeOptions` option:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    cmakeOptions: {
        OPTION_NAME: "OPTION_VALUE"
    },

    // force a build if the pre-built binary doesn't
    // match all the provided options, such as the cmakeOptions
    existingPrebuiltBinaryMustMatchBuildOptions: true
});
```
You can also force it to build a new binary by setting the `build` option to `"forceRebuild"`:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    build: "forceRebuild"
});
```
### Electron support for building from source
When running in Electron, the `build` option defaults to `"never"`, as we cannot assume that the user has the necessary build tools installed on their machine, and the user won't be able to see the build process to troubleshoot any issues that may arise.

You can manually set it to `"auto"` to allow building from source in Electron.
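For example, a minimal sketch of opting back into building from source in an Electron app:

```typescript
import {getLlama} from "node-llama-cpp";

// in Electron, the "build" option defaults to "never";
// setting it to "auto" allows building from source when no matching binary is found
const llama = await getLlama({
    build: "auto"
});
```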
When running from inside an Asar archive in Electron, building from source is not possible, so it'll never build from source. To allow building from source in Electron apps, make sure you ship `node-llama-cpp` as an unpacked module.
If you want to use a build with custom cmake options in your Electron app, make sure you build `node-llama-cpp` with your desired cmake options before building your Electron app, and make sure you pass the same cmake options to the `getLlama` function in your Electron app so it'll use the binary you built.
## Customizing the Build
- **Metal:** To configure Metal support, see the Metal support guide.
- **CUDA:** To configure CUDA support, see the CUDA support guide.
- **Vulkan:** To configure Vulkan support, see the Vulkan support guide.

`llama.cpp` has CMake build options that can be configured to customize the build.
### `llama.cpp` CMake build options

| Option | Description | Default value |
|---|---|---|
| `BUILD_SHARED_LIBS` | ggml: build shared libraries | OFF on MinGW, ON otherwise |
| `GGML_BACKEND_DL` | ggml: build backends as dynamic libraries (requires `BUILD_SHARED_LIBS`) | OFF |
| `GGML_STATIC` | ggml: static link libraries | OFF |
| `GGML_NATIVE` | ggml: enable `-march=native` flag | `${GGML_NATIVE_DEFAULT}` |
| `GGML_LTO` | ggml: enable link time optimization | OFF |
| `GGML_CCACHE` | ggml: use ccache if available | ON |
| `GGML_ALL_WARNINGS` | ggml: enable all compiler warnings | ON |
| `GGML_ALL_WARNINGS_3RD_PARTY` | ggml: enable all compiler warnings in 3rd party libs | OFF |
| `GGML_GPROF` | ggml: enable gprof | OFF |
| `GGML_FATAL_WARNINGS` | ggml: enable `-Werror` flag | OFF |
| `GGML_SANITIZE_THREAD` | ggml: enable thread sanitizer | OFF |
| `GGML_SANITIZE_ADDRESS` | ggml: enable address sanitizer | OFF |
| `GGML_SANITIZE_UNDEFINED` | ggml: enable undefined sanitizer | OFF |
| `GGML_CPU_HBM` | ggml: use memkind for CPU HBM | OFF |
| `GGML_CPU_AARCH64` | ggml: use runtime weight conversion of Q4_0 to Q4_X_X | ON |
| `GGML_AVX` | ggml: enable AVX | `${INS_ENB}` |
| `GGML_AVX_VNNI` | ggml: enable AVX-VNNI | OFF |
| `GGML_AVX2` | ggml: enable AVX2 | `${INS_ENB}` |
| `GGML_AVX512` | ggml: enable AVX512 | OFF |
| `GGML_AVX512_VBMI` | ggml: enable AVX512-VBMI | OFF |
| `GGML_AVX512_VNNI` | ggml: enable AVX512-VNNI | OFF |
| `GGML_AVX512_BF16` | ggml: enable AVX512-BF16 | OFF |
| `GGML_AMX_TILE` | ggml: enable AMX-TILE | OFF |
| `GGML_AMX_INT8` | ggml: enable AMX-INT8 | OFF |
| `GGML_AMX_BF16` | ggml: enable AMX-BF16 | OFF |
| `GGML_FMA` | ggml: enable FMA | `${INS_ENB}` |
| `GGML_LASX` | ggml: enable lasx | ON |
| `GGML_LSX` | ggml: enable lsx | ON |
| `GGML_RVV` | ggml: enable rvv | ON |
| `GGML_SVE` | ggml: enable SVE | OFF |
| `GGML_CPU` | ggml: enable CPU backend | ON |
| `GGML_ACCELERATE` | ggml: enable Accelerate framework | ON |
| `GGML_BLAS` | ggml: use BLAS | `${GGML_BLAS_DEFAULT}` |
| `GGML_CUDA` | ggml: use CUDA | OFF |
| `GGML_MUSA` | ggml: use MUSA | OFF |
| `GGML_CUDA_FORCE_MMQ` | ggml: use mmq kernels instead of cuBLAS | OFF |
| `GGML_CUDA_FORCE_CUBLAS` | ggml: always use cuBLAS instead of mmq kernels | OFF |
| `GGML_CUDA_F16` | ggml: use 16 bit floats for some calculations | OFF |
| `GGML_CUDA_NO_PEER_COPY` | ggml: do not use peer to peer copies | OFF |
| `GGML_CUDA_NO_VMM` | ggml: do not try to use CUDA VMM | OFF |
| `GGML_CUDA_FA_ALL_QUANTS` | ggml: compile all quants for FlashAttention | OFF |
| `GGML_CUDA_GRAPHS` | ggml: use CUDA graphs (llama.cpp only) | `${GGML_CUDA_GRAPHS_DEFAULT}` |
| `GGML_HIP` | ggml: use HIP | OFF |
| `GGML_HIP_UMA` | ggml: use HIP unified memory architecture | OFF |
| `GGML_VULKAN` | ggml: use Vulkan | OFF |
| `GGML_VULKAN_CHECK_RESULTS` | ggml: run Vulkan op checks | OFF |
| `GGML_VULKAN_DEBUG` | ggml: enable Vulkan debug output | OFF |
| `GGML_VULKAN_MEMORY_DEBUG` | ggml: enable Vulkan memory debug output | OFF |
| `GGML_VULKAN_SHADER_DEBUG_INFO` | ggml: enable Vulkan shader debug info | OFF |
| `GGML_VULKAN_PERF` | ggml: enable Vulkan perf output | OFF |
| `GGML_VULKAN_VALIDATE` | ggml: enable Vulkan validation | OFF |
| `GGML_VULKAN_RUN_TESTS` | ggml: run Vulkan tests | OFF |
| `GGML_KOMPUTE` | ggml: use Kompute | OFF |
| `GGML_METAL` | ggml: use Metal | ON on macOS on Apple Silicon, OFF otherwise |
| `GGML_METAL_USE_BF16` | ggml: use bfloat if available | OFF |
| `GGML_METAL_NDEBUG` | ggml: disable Metal debugging | OFF |
| `GGML_METAL_SHADER_DEBUG` | ggml: compile Metal with `-fno-fast-math` | OFF |
| `GGML_METAL_EMBED_LIBRARY` | ggml: embed Metal library | ON on macOS, OFF otherwise |
| `GGML_OPENMP` | ggml: use OpenMP | ON |
| `GGML_SYCL` | ggml: use SYCL | OFF |
| `GGML_SYCL_F16` | ggml: use 16 bit floats for sycl calculations | OFF |

Source: `CMakeLists`
To build `node-llama-cpp` with any of these options, set an environment variable of an option prefixed with `NODE_LLAMA_CPP_CMAKE_OPTION_` before running the `source download` or `source build` commands.
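For example, here's a sketch (assuming a POSIX shell) that turns on the `GGML_LTO` option from the table above when building from source:

```shell
# sets the GGML_LTO cmake option to ON for this build
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_LTO=ON npx --no node-llama-cpp source download
```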
To use that customized build in your code, you can either use `getLlama("lastBuild")` to get the last build that was built, or use the code snippet that is printed after the build finishes.
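For instance, to load the last build that was built:

```typescript
import {getLlama} from "node-llama-cpp";

// use the most recent local build instead of resolving a binary from scratch
const llama = await getLlama("lastBuild");
```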
## Downloading a Newer Release
Every new release of `node-llama-cpp` ships with the latest release of `llama.cpp` that was available at the time of the release, so relying on the latest version of `node-llama-cpp` should be enough for most use cases.

However, you may want to download a newer release of `llama.cpp` (see the `llama.cpp` releases page) and build it from source to get the latest features and bug fixes before a new version of `node-llama-cpp` is released.

A new release may contain breaking changes, so it won't necessarily work properly or even compile at all; do this with caution.

You can do this by specifying the `--release` option with the release tag you want to download:

```shell
npx --no node-llama-cpp source download --release "b1350"
```
You can find the release tag on the `llama.cpp` releases page.

You can also opt to download the latest release available:

```shell
npx --no node-llama-cpp source download --release latest
```