
Building From Source

node-llama-cpp ships with pre-built binaries for macOS, Linux and Windows.

If binaries are not available for your platform, or they fail to load, it falls back to downloading a release of llama.cpp and building it from source with cmake.

Downloading a Release

To download a release of llama.cpp and build it from source, use the CLI source download command.

shell
npx --no node-llama-cpp source download

NOTE

node-llama-cpp ships with a git bundle of the release of llama.cpp it was built with, so when you run the source download command without specifying a particular release or repo, it uses the bundled git bundle instead of downloading the release from GitHub.

This is useful for building from source on machines that aren't connected to the internet.

INFO

If cmake is not installed on your machine, node-llama-cpp will automatically download cmake to an internal directory and try to use it to build llama.cpp from source.

If the build fails, make sure you have the required dependencies of cmake installed on your machine. More info is available here (you don't have to install cmake or cmake-js, just the dependencies).

If the build fails on macOS with the error "/usr/bin/cc" is not able to compile a simple test program, try running xcode-select --install to install the Xcode command line tools.

source download and source build Commands

The difference between the source download and source build commands is that the source download command downloads a release of llama.cpp and builds it, while the source build command builds the llama.cpp release that's already downloaded.

You can only use the source build command after you've already downloaded a release of llama.cpp with the source download command.

To only download a release of llama.cpp without building it, use the source download command with the --skipBuild option:

shell
npx --no node-llama-cpp source download --skipBuild
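
Once a release has been downloaded this way, you can build it later with the source build command. For example (assuming the download above has already completed):

shell
npx --no node-llama-cpp source build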

Building Inside Your App

The best way to use a customized build is to customize the options passed to getLlama.

If there's no existing binary that matches the provided options (either a local build or a pre-built binary), it'll automatically download a release of llama.cpp (if it's not already downloaded) and build it from source.

You can pass custom cmake options that you want the binary to be compiled with by using the cmakeOptions option:

typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    cmakeOptions: {
        OPTION_NAME: "OPTION_VALUE"
    },

    // force a build if the pre-built binary doesn't
    // match all the provided options, such as the cmakeOptions
    existingPrebuiltBinaryMustMatchBuildOptions: true
});

You can also force it to build a new binary by setting the build option to "forceRebuild":

typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    build: "forceRebuild"
});

Electron Support for Building From Source

When running in Electron, the build option defaults to "never" as we cannot assume that the user has the necessary build tools installed on their machine, and the user won't be able to see the build process to troubleshoot any issues that may arise.

You can manually set it to be "auto" to allow building from source in Electron.

When running from inside an Asar archive in Electron, building from source is not possible, so a build will never be attempted there. To allow building from source in Electron apps, make sure you ship node-llama-cpp as an unpacked module.

To use a build with custom cmake options in your Electron app, build node-llama-cpp with your desired cmake options before building your Electron app, and pass the same cmake options to the getLlama function in your Electron app so it uses the binary you built.
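
As an illustrative sketch (GGML_LTO is used here purely as an example; pass whatever options you prebuilt with), the call inside the Electron app could look like this:

typescript
import {getLlama} from "node-llama-cpp";

// example only: GGML_LTO stands in for whatever cmake options
// node-llama-cpp was prebuilt with before packaging the app
const llama = await getLlama({
    cmakeOptions: {
        GGML_LTO: "ON"
    }
});

Since the binary was built in advance with matching options, the existing binary is used and no build is attempted at runtime.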

Customizing the Build

Metal: To configure Metal support see the Metal support guide.

CUDA: To configure CUDA support see the CUDA support guide.

Vulkan: To configure Vulkan support see the Vulkan support guide.

llama.cpp has CMake build options that can be configured to customize the build.

llama.cpp CMake build options
Option Description Default value
BUILD_SHARED_LIBS ggml: build shared libraries OFF on MinGW, ON otherwise
GGML_BACKEND_DL ggml: build backends as dynamic libraries (requires BUILD_SHARED_LIBS) OFF
GGML_STATIC ggml: static link libraries OFF
GGML_NATIVE ggml: enable -march=native flag ${GGML_NATIVE_DEFAULT}
GGML_LTO ggml: enable link time optimization OFF
GGML_CCACHE ggml: use ccache if available ON
GGML_ALL_WARNINGS ggml: enable all compiler warnings ON
GGML_ALL_WARNINGS_3RD_PARTY ggml: enable all compiler warnings in 3rd party libs OFF
GGML_GPROF ggml: enable gprof OFF
GGML_FATAL_WARNINGS ggml: enable -Werror flag OFF
GGML_SANITIZE_THREAD ggml: enable thread sanitizer OFF
GGML_SANITIZE_ADDRESS ggml: enable address sanitizer OFF
GGML_SANITIZE_UNDEFINED ggml: enable undefined sanitizer OFF
GGML_CPU_HBM ggml: use memkind for CPU HBM OFF
GGML_CPU_AARCH64 ggml: use runtime weight conversion of Q4_0 to Q4_X_X ON
GGML_AVX ggml: enable AVX ${INS_ENB}
GGML_AVX_VNNI ggml: enable AVX-VNNI OFF
GGML_AVX2 ggml: enable AVX2 ${INS_ENB}
GGML_AVX512 ggml: enable AVX512 OFF
GGML_AVX512_VBMI ggml: enable AVX512-VBMI OFF
GGML_AVX512_VNNI ggml: enable AVX512-VNNI OFF
GGML_AVX512_BF16 ggml: enable AVX512-BF16 OFF
GGML_AMX_TILE ggml: enable AMX-TILE OFF
GGML_AMX_INT8 ggml: enable AMX-INT8 OFF
GGML_AMX_BF16 ggml: enable AMX-BF16 OFF
GGML_FMA ggml: enable FMA ${INS_ENB}
GGML_LASX ggml: enable lasx ON
GGML_LSX ggml: enable lsx ON
GGML_RVV ggml: enable rvv ON
GGML_SVE ggml: enable SVE OFF
GGML_CPU ggml: enable CPU backend ON
GGML_ACCELERATE ggml: enable Accelerate framework ON
GGML_BLAS ggml: use BLAS ${GGML_BLAS_DEFAULT}
GGML_CUDA ggml: use CUDA OFF
GGML_MUSA ggml: use MUSA OFF
GGML_CUDA_FORCE_MMQ ggml: use mmq kernels instead of cuBLAS OFF
GGML_CUDA_FORCE_CUBLAS ggml: always use cuBLAS instead of mmq kernels OFF
GGML_CUDA_F16 ggml: use 16 bit floats for some calculations OFF
GGML_CUDA_NO_PEER_COPY ggml: do not use peer to peer copies OFF
GGML_CUDA_NO_VMM ggml: do not try to use CUDA VMM OFF
GGML_CUDA_FA_ALL_QUANTS ggml: compile all quants for FlashAttention OFF
GGML_CUDA_GRAPHS ggml: use CUDA graphs (llama.cpp only) ${GGML_CUDA_GRAPHS_DEFAULT}
GGML_HIP ggml: use HIP OFF
GGML_HIP_UMA ggml: use HIP unified memory architecture OFF
GGML_VULKAN ggml: use Vulkan OFF
GGML_VULKAN_CHECK_RESULTS ggml: run Vulkan op checks OFF
GGML_VULKAN_DEBUG ggml: enable Vulkan debug output OFF
GGML_VULKAN_MEMORY_DEBUG ggml: enable Vulkan memory debug output OFF
GGML_VULKAN_SHADER_DEBUG_INFO ggml: enable Vulkan shader debug info OFF
GGML_VULKAN_PERF ggml: enable Vulkan perf output OFF
GGML_VULKAN_VALIDATE ggml: enable Vulkan validation OFF
GGML_VULKAN_RUN_TESTS ggml: run Vulkan tests OFF
GGML_KOMPUTE ggml: use Kompute OFF
GGML_METAL ggml: use Metal ON on macOS on Apple Silicon, OFF otherwise
GGML_METAL_USE_BF16 ggml: use bfloat if available OFF
GGML_METAL_NDEBUG ggml: disable Metal debugging OFF
GGML_METAL_SHADER_DEBUG ggml: compile Metal with -fno-fast-math OFF
GGML_METAL_EMBED_LIBRARY ggml: embed Metal library ON on macOS, OFF otherwise
GGML_OPENMP ggml: use OpenMP ON
GGML_SYCL ggml: use SYCL OFF
GGML_SYCL_F16 ggml: use 16 bit floats for sycl calculations OFF

Source: CMakeLists

To build node-llama-cpp with any of these options, set an environment variable named after the option, prefixed with NODE_LLAMA_CPP_CMAKE_OPTION_, before running the source download or source build command.
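
For example, on macOS or Linux, enabling the GGML_LTO option from the table above (chosen here purely as an illustration) could look like this:

shell
NODE_LLAMA_CPP_CMAKE_OPTION_GGML_LTO=ON npx --no node-llama-cpp source build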

To use that customized build in your code, you can either use getLlama("lastBuild") to get the last build that was built, or use the code snippet that is printed after the build finishes.
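
A minimal example of loading the last build:

typescript
import {getLlama} from "node-llama-cpp";

// use the binary from the most recent local build of llama.cpp
const llama = await getLlama("lastBuild");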

Downloading a Newer Release

Every new release of node-llama-cpp ships with the latest release of llama.cpp that was available at the time of the release, so relying on the latest version of node-llama-cpp should be enough for most use cases.

However, you may want to download a newer release of llama.cpp (llama.cpp releases) and build it from source to get the latest features and bug fixes before a new version of node-llama-cpp is released.

A new release may contain breaking changes, so it won't necessarily work properly or even compile at all. Do this with caution.

You can do this by specifying the --release option with the release tag you want to download:

shell
npx --no node-llama-cpp source download --release "b1350"

You can find the release tag on the llama.cpp releases page.

You can also opt to download the latest release available:

shell
npx --no node-llama-cpp source download --release latest