
Using node-llama-cpp in Docker

When running node-llama-cpp in a container image with Docker or Podman, you will most likely want to use it together with a GPU for fast inference.

For that, you'll have to:

  1. Configure support for your GPU on the host machine
  2. Build an image with the necessary GPU libraries
  3. Enable GPU support when running the container

Configuring the Host Machine

Metal: Using Metal from inside a Docker container is not supported.

CUDA: You need to install the NVIDIA Container Toolkit on the host machine to use NVIDIA GPUs.
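On Debian-based hosts, installing and wiring up the toolkit typically looks like the following. This is a sketch that assumes NVIDIA's apt repository is already configured; check the NVIDIA Container Toolkit documentation for your distribution's exact steps:

```shell
# Install the toolkit and register the nvidia runtime with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```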

Vulkan: You need to install the relevant GPU drivers on the host machine and configure Docker or Podman to use them.
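Before involving containers, you can verify that the host's Vulkan drivers work at all using the `vulkaninfo` tool (provided by the `vulkan-tools` package on most distributions):

```shell
# Print a short summary of the Vulkan-capable devices the host drivers expose
vulkaninfo --summary
```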

No GPU (CPU only): No special configuration is needed.

Building an Image

WARNING

Do not attempt to use alpine as the base image as it doesn't work well with many GPU drivers.

The potential image size savings of using alpine images are not worth the hassle, especially considering that the model files you use will likely be much larger than the image itself anyway.

Dockerfile (CUDA)
FROM node:22

# Replace `x86_64` with `sbsa` for ARM64
ENV NVARCH=x86_64
ENV INSTALL_CUDA_VERSION=12.6

SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
    apt-get install -y --no-install-recommends gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${NVARCH}/3bf863cc.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get purge --autoremove -y curl && \
    rm -rf /var/lib/apt/lists/*

RUN apt-get update && apt-get install -y --no-install-recommends \
    "cuda-cudart-${INSTALL_CUDA_VERSION//./-}" \
    "cuda-compat-${INSTALL_CUDA_VERSION//./-}" \
    "cuda-libraries-${INSTALL_CUDA_VERSION//./-}" \
    "libnpp-${INSTALL_CUDA_VERSION//./-}" \
    "cuda-nvtx-${INSTALL_CUDA_VERSION//./-}" \
    "libcusparse-${INSTALL_CUDA_VERSION//./-}" \
    "libcublas-${INSTALL_CUDA_VERSION//./-}" \
    git cmake clang libgomp1 \
    && rm -rf /var/lib/apt/lists/*

RUN apt-mark hold "libcublas-${INSTALL_CUDA_VERSION//./-}"

RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
    && echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all


RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app

RUN npm ci

CMD npm start
Dockerfile (Vulkan)
FROM node:22

SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
    apt-get install -y --no-install-recommends mesa-vulkan-drivers libegl1 git cmake clang libgomp1 && \
    rm -rf /var/lib/apt/lists/*

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all


RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app

RUN npm ci

CMD npm start
Dockerfile (no GPU, CPU only)
FROM node:22

SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
    apt-get install -y --no-install-recommends git cmake clang libgomp1 && \
    rm -rf /var/lib/apt/lists/*


RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app

RUN npm ci

CMD npm start
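Whichever of the Dockerfiles above you use, build the image from your project directory. `my-image:tag` here is a placeholder name that matches the run commands in the next section:

```shell
# Build the image from the Dockerfile in the current directory
docker build -t my-image:tag .
```

With Podman, replace `docker` with `podman`; the CLI is compatible for this command.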

Running the Container

To run the container with GPU support, use the following:

shell (Docker)
docker run --rm -it --gpus=all my-image:tag
shell (Podman)
podman run --rm -it --gpus=all my-image:tag
yaml (docker-compose)
services:
  my-service:
    image: my-image:tag
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              count: all

When using the CLI, you can test the GPU support by running this command:

shell (Docker)
docker run --rm -it --gpus=all my-image:tag npx -y node-llama-cpp inspect gpu
shell (Podman)
podman run --rm -it --gpus=all my-image:tag npx -y node-llama-cpp inspect gpu
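You can also check which GPU was picked up from inside your app's code. A minimal sketch using node-llama-cpp's `getLlama()`, whose `gpu` property reports the active compute layer:

```typescript
import {getLlama} from "node-llama-cpp";

// Resolves the best available compute layer for this machine
const llama = await getLlama();

// Prints "cuda", "vulkan", "metal", or false when running on CPU only
console.log("GPU type:", llama.gpu);
```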

Troubleshooting

NVIDIA GPU Is Not Recognized by the Vulkan Driver Inside the Container

Make sure your Docker/Podman configuration has an nvidia runtime:

json (Docker daemon.json)
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
shell (Podman)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list

And then run the container with the nvidia runtime:

shell (Docker)
docker run --rm -it --runtime=nvidia --gpus=all my-image:tag
shell (Podman)
podman run --rm -it --device nvidia.com/gpu=all --security-opt=label=disable --gpus=all my-image:tag