Building From Source

node-llama-cpp ships with pre-built binaries for macOS, Linux and Windows.

In case binaries are not available for your platform or fail to load, it'll fallback to download a release of llama.cpp and build it from source with cmake.

Downloading a Release

To download a release of llama.cpp and build it from source you can use the CLI source download command.

shell

npx --no node-llama-cpp source download

NOTE

node-llama-cpp ships with a git bundle of the release of llama.cpp it was built with, so when you run the source download command without specifying a specific release or repo, it will use the bundled git bundle instead of downloading the release from GitHub.

This is useful for building from source on machines that aren't connected to the internet.

INFO

If cmake is not installed on your machine, node-llama-cpp will automatically download cmake to an internal directory and try to use it to build llama.cpp from source.

If the build fails, make sure you have the required dependencies of cmake installed on your machine. More info is available here (you don't have to install cmake or cmake-js, just the dependencies).

Dependencies for macOS

If the build fails on macOS with the error "/usr/bin/cc" is not able to compile a simple test program, try running this command to install the Xcode command line tools:

shell

xcode-select --install

Dependencies for Windows x64

If the build fails on your machine, ensure you have all the necessary build tools installed.

You can install all the dependencies via WinGet using this command:

shell

winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--add Microsoft.VisualStudio.Component.VC.CMake.Project Microsoft.VisualStudio.Component.VC.CoreBuildTools Microsoft.VisualStudio.Component.VC.Tools.x86.x64 Microsoft.VisualStudio.Component.VC.ATL Microsoft.VisualStudio.Component.VC.ATLMFC Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset Microsoft.VisualStudio.Component.VC.Llvm.Clang Microsoft.VisualStudio.Component.VC.Redist.14.Latest Microsoft.Component.VC.Runtime.UCRTSDK Microsoft.VisualStudio.Component.Windows10SDK Microsoft.VisualStudio.Component.Windows10SDK.20348"

WinGet is built-in on Windows 11 and modern Windows 10 versions

You can also install all the dependencies manually using the Visual C++ Build Tools installer:

Workloads tab: select Desktop development with C++
Individual components tab: select the following:
- C++ ATL for latest v143 build tools (x86 & x64)
- C++ MFC for latest v143 build tools (x86 & x64)
- C++ CMake tools for Windows
- C++ Clang Compiler for Windows
- MSBuild support for LLVM (clang-cl) toolset
- Windows Universal CRT SDK

Dependencies for Windows on Arm

On Windows on Arm you need to install additional build tools to build llama.cpp from source.

You can install all the dependencies via WinGet using this command:

shell

winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--add Microsoft.VisualStudio.Component.VC.CMake.Project Microsoft.VisualStudio.Component.VC.CoreBuildTools Microsoft.VisualStudio.Component.VC.Tools.x86.x64 Microsoft.VisualStudio.Component.VC.Tools.ARM64 Microsoft.VisualStudio.Component.VC.ATL Microsoft.VisualStudio.Component.VC.ATL.ARM64 Microsoft.VisualStudio.Component.VC.ATLMFC Microsoft.VisualStudio.Component.VC.MFC.ARM64 Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset Microsoft.VisualStudio.Component.VC.Llvm.Clang Microsoft.VisualStudio.Component.VC.Redist.14.Latest Microsoft.Component.VC.Runtime.UCRTSDK Microsoft.VisualStudio.Component.Windows10SDK Microsoft.VisualStudio.Component.Windows10SDK.20348"

WinGet is built-in on Windows 11 and modern Windows 10 versions

You can also install all the dependencies manually using the Visual C++ Build Tools installer:

Workloads tab: select Desktop development with C++
Individual components tab: select the following:
- MSVC v143 - VS 2022 C++ ARM64 build tools (latest)
- C++ ATL for latest v143 build tools (ARM64/ARM64EC)
- C++ MFC for latest v143 build tools (ARM64/ARM64EC)
- C++ CMake tools for Windows
- C++ Clang Compiler for Windows
- MSBuild support for LLVM (clang-cl) toolset
- Windows Universal CRT SDK

`source download` and `source build` Commands

The difference between the source download and source build commands is that the source download command downloads a release of llama.cpp and builds it, while the source build command builds the llama.cpp release that's already downloaded.

You can only use the source build command after you've already downloaded a release of llama.cpp with the source download command.

To only download a release of llama.cpp without building it, use the source download command with the --skipBuild option:

shell

npx --no node-llama-cpp source download --skipBuild

Building Inside Your App

The best way to use a customized build is by customizing the options passed to the getLlama.

If there's no existing binary that matches the provided options (either a local build or a pre-built binary), it'll automatically download a release of llama.cpp (if it's not already downloaded) and build it from source.

You can pass custom cmake options you want the binary be compiled with by using the cmakeOptions option:

typescript

const llama
 = await getLlama
({
    cmakeOptions
: {
        OPTION_NAME
: "OPTION_VALUE"
    },
    
    // force a build if the pre-built binary doesn't
    // match all the provided options, such as the cmakeOptions
    existingPrebuiltBinaryMustMatchBuildOptions
: true
});

You can also force it to build a new binary by setting the build option to "forceRebuild":

typescript

const llama
 = await getLlama
({
    build
: "forceRebuild"
});

Electron support for building from source

When running in Electron, the build option defaults to "never" as we cannot assume that the user has the necessary build tools installed on their machine, and the user won't be able to see the build process to troubleshoot any issues that may arise.

You can manually set it to be "auto" to allow building from source in Electron.

When running from inside an Asar archive in Electron, building from source is not possible, so it'll never build from source. To allow building from source in Electron apps, make sure you ship node-llama-cpp as an unpacked module.

If you want to use a build with custom cmake options in your Electron app, make sure you build node-llama-cpp with your desired cmake options before building your Electron app, and make sure you pass the same cmake options to the getLlama function in your Electron app so it'll use the binary you built.

Customizing the Build

Meta: To configure Metal support see the Metal support guide.
CUDA: To configure CUDA support see the CUDA support guide.
Vulkan: To configure Vulkan support see the Vulkan support guide.

llama.cpp has CMake build options that can be configured to customize the build.

llama.cpp CMake build options

Option	Description	Default value
`GGML_STATIC`	ggml: static link libraries	`OFF`
`GGML_NATIVE`	ggml: optimize the build for the current system	`OFF` when building for a different architecture, `ON` otherwise
`GGML_LTO`	ggml: enable link time optimization	`OFF`
`GGML_CCACHE`	ggml: use ccache if available	`ON`
`GGML_ALL_WARNINGS`	ggml: enable all compiler warnings	`ON`
`GGML_ALL_WARNINGS_3RD_PARTY`	ggml: enable all compiler warnings in 3rd party libs	`OFF`
`GGML_GPROF`	ggml: enable gprof	`OFF`
`GGML_FATAL_WARNINGS`	ggml: enable -Werror flag	`OFF`
`GGML_SANITIZE_THREAD`	ggml: enable thread sanitizer	`OFF`
`GGML_SANITIZE_ADDRESS`	ggml: enable address sanitizer	`OFF`
`GGML_SANITIZE_UNDEFINED`	ggml: enable undefined sanitizer	`OFF`
`GGML_CPU_HBM`	ggml: use memkind for CPU HBM	`OFF`
`GGML_CPU_HBM`	ggml: use memkind for CPU HBM	`OFF`
`GGML_CPU_AARCH64`	ggml: use runtime weight conversion of Q4_0 to Q4_X_X	`ON`
`GGML_CPU_KLEIDIAI`	ggml: use KleidiAI optimized kernels if applicable	`OFF`
`GGML_SSE42`	ggml: enable SSE 4.2	`${INS_ENB}`
`GGML_AVX`	ggml: enable AVX	`${INS_ENB}`
`GGML_AVX_VNNI`	ggml: enable AVX-VNNI	`OFF`
`GGML_AVX2`	ggml: enable AVX2	`${INS_ENB}`
`GGML_BMI2`	ggml: enable BMI2	`${INS_ENB}`
`GGML_AVX512`	ggml: enable AVX512F	`OFF`
`GGML_AVX512_VBMI`	ggml: enable AVX512-VBMI	`OFF`
`GGML_AVX512_VNNI`	ggml: enable AVX512-VNNI	`OFF`
`GGML_AVX512_BF16`	ggml: enable AVX512-BF16	`OFF`
`GGML_FMA`	ggml: enable FMA	`${INS_ENB}`
`GGML_F16C`	ggml: enable F16C	`${INS_ENB}`
`GGML_AMX_TILE`	ggml: enable AMX-TILE	`OFF`
`GGML_AMX_INT8`	ggml: enable AMX-INT8	`OFF`
`GGML_AMX_BF16`	ggml: enable AMX-BF16	`OFF`
`GGML_LASX`	ggml: enable lasx	`ON`
`GGML_LSX`	ggml: enable lsx	`ON`
`GGML_RVV`	ggml: enable rvv	`ON`
`GGML_RV_ZFH`	ggml: enable riscv zfh	`OFF`
`GGML_VXE`	ggml: enable vxe	`ON`
`GGML_CPU_ALL_VARIANTS`	ggml: build all variants of the CPU backend (requires GGML_BACKEND_DL)	`OFF`
`GGML_CPU_ALL_VARIANTS`	ggml: build all variants of the CPU backend (requires GGML_BACKEND_DL)	`OFF`
`GGML_CPU`	ggml: enable CPU backend	`ON`
`GGML_ACCELERATE`	ggml: enable Accelerate framework	`ON`
`GGML_BLAS`	ggml: use BLAS	`ON` on macOS, `OFF` otherwise
`GGML_CUDA`	ggml: use CUDA	`OFF`
`GGML_CUDA`	ggml: use CUDA	`OFF`
`GGML_MUSA`	ggml: use MUSA	`OFF`
`GGML_CUDA_FORCE_MMQ`	ggml: use mmq kernels instead of cuBLAS	`OFF`
`GGML_CUDA_FORCE_CUBLAS`	ggml: always use cuBLAS instead of mmq kernels	`OFF`
`GGML_CUDA_F16`	ggml: use 16 bit floats for some calculations	`OFF`
`GGML_CUDA_NO_PEER_COPY`	ggml: do not use peer to peer copies	`OFF`
`GGML_CUDA_NO_VMM`	ggml: do not try to use CUDA VMM	`OFF`
`GGML_CUDA_FA`	ggml: compile ggml FlashAttention CUDA kernels	`ON`
`GGML_CUDA_FA_ALL_QUANTS`	ggml: compile all quants for FlashAttention	`OFF`
`GGML_CUDA_GRAPHS`	ggml: use CUDA graphs (llama.cpp only)	`ON`
`GGML_HIP`	ggml: use HIP	`OFF`
`GGML_HIP`	ggml: use HIP	`OFF`
`GGML_HIP_GRAPHS`	ggml: use HIP graph, experimental, slow	`OFF`
`GGML_HIP_NO_VMM`	ggml: do not try to use HIP VMM	`ON`
`GGML_HIP_ROCWMMA_FATTN`	ggml: enable rocWMMA for FlashAttention	`OFF`
`GGML_VULKAN`	ggml: use Vulkan	`OFF`
`GGML_VULKAN_CHECK_RESULTS`	ggml: run Vulkan op checks	`OFF`
`GGML_VULKAN_DEBUG`	ggml: enable Vulkan debug output	`OFF`
`GGML_VULKAN_MEMORY_DEBUG`	ggml: enable Vulkan memory debug output	`OFF`
`GGML_VULKAN_SHADER_DEBUG_INFO`	ggml: enable Vulkan shader debug info	`OFF`
`GGML_VULKAN_PERF`	ggml: enable Vulkan perf output	`OFF`
`GGML_VULKAN_VALIDATE`	ggml: enable Vulkan validation	`OFF`
`GGML_VULKAN_RUN_TESTS`	ggml: run Vulkan tests	`OFF`
`GGML_KOMPUTE`	ggml: use Kompute	`OFF`
`GGML_METAL`	ggml: use Metal	`ON` on macOS on Apple Silicon, `OFF` otherwise
`GGML_METAL_USE_BF16`	ggml: use bfloat if available	`OFF`
`GGML_METAL_NDEBUG`	ggml: disable Metal debugging	`OFF`
`GGML_METAL_SHADER_DEBUG`	ggml: compile Metal with -fno-fast-math	`OFF`
`GGML_METAL_EMBED_LIBRARY`	ggml: embed Metal library	`ON` on macOS, `OFF` otherwise
`GGML_OPENMP`	ggml: use OpenMP	`ON`
`GGML_SYCL`	ggml: use SYCL	`OFF`
`GGML_SYCL_F16`	ggml: use 16 bit floats for sycl calculations	`OFF`
`GGML_SYCL_GRAPH`	ggml: enable graphs in the SYCL backend	`ON`
`GGML_SYCL_DNN`	ggml: enable oneDNN in the SYCL backend	`ON`
`GGML_OPENCL`	ggml: use OpenCL	`OFF`
`GGML_OPENCL`	ggml: use OpenCL	`OFF`
`GGML_OPENCL_PROFILING`	ggml: use OpenCL profiling (increases overhead)	`OFF`
`GGML_OPENCL_EMBED_KERNELS`	ggml: embed kernels	`ON`
`GGML_OPENCL_USE_ADRENO_KERNELS`	ggml: use optimized kernels for Adreno	`ON`

Source: CMakeLists

To build node-llama-cpp with any of these options, set an environment variable of an option prefixed with NODE_LLAMA_CPP_CMAKE_OPTION_ before running the source download or source build commands.

To use that customized build in your code, you can either use getLlama("lastBuild") to get the last build that was built, or pass the code snippet that is printed after the build finishes.

Downloading a Newer Release

Every new release of node-llama-cpp ships with the latest release of llama.cpp that was available at the time of the release, so relying on the latest version of node-llama-cpp should be enough for most use cases.

However, you may want to download a newer release of llama.cpp (llama.cpp releases) and build it from source to get the latest features and bug fixes before a new version of node-llama-cpp is released.

A new release may contain breaking changes, so it won't necessarily work properly or even compile at all, so do this with caution.

You can do this by specifying the --release option with the release tag you want to download:

shell

npx --no node-llama-cpp source download --release "b1350"

You can find the release tag on the llama.cpp releases page:

You can also opt to download the latest release available:

shell

npx --no node-llama-cpp source download --release latest

Last edited 3 months agoView full history

Building From Source ​

Downloading a Release ​

source download and source build Commands ​

Building Inside Your App ​

Customizing the Build ​

Downloading a Newer Release ​

Building From Source

Downloading a Release

`source download` and `source build` Commands

Building Inside Your App

Customizing the Build

Downloading a Newer Release