Using node-llama-cpp in Docker
When running node-llama-cpp inside a Docker or Podman container, you will most likely want to use it together with a GPU for fast inference.
For that, you'll have to:
- Configure support for your GPU on the host machine
- Build an image with the necessary GPU libraries
- Enable GPU support when running the container
Configuring the Host Machine
Metal: Using Metal from inside a Docker container is not supported.
CUDA: You need to install the NVIDIA Container Toolkit on the host machine to use NVIDIA GPUs; see the installation sketch after this list.
Vulkan: You need to install the relevant GPU drivers on the host machine, and configure Docker or Podman to use them.
No GPU (CPU only): No special configuration is needed.
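For example, on a Debian-based host, installing the NVIDIA Container Toolkit and wiring it up to Docker looks roughly like this (a sketch based on NVIDIA's official instructions; check their documentation for the current steps and for other distributions):

```shell
# Add NVIDIA's package repository for the container toolkit (Debian/Ubuntu)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and register the nvidia runtime with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```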
Building an Image
WARNING
Do not attempt to use alpine as the base image as it doesn't work well with many GPU drivers.
The potential image size savings of using alpine images are not worth the hassle, especially considering that the model files you use will likely be much larger than the image itself anyway.
A Dockerfile with CUDA support:

```Dockerfile
FROM node:22
# Replace `x86_64` with `sbsa` for ARM64
ENV NVARCH=x86_64
ENV INSTALL_CUDA_VERSION=12.5
SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
    apt-get install -y --no-install-recommends gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${NVARCH}/3bf863cc.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get purge --autoremove -y curl && \
    rm -rf /var/lib/apt/lists/*
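# Install the CUDA runtime libraries, along with the tools node-llama-cpp may need to build llama.cpp from source (git, cmake, clang)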
RUN apt-get update && apt-get install -y --no-install-recommends \
    "cuda-cudart-${INSTALL_CUDA_VERSION//./-}" \
    "cuda-compat-${INSTALL_CUDA_VERSION//./-}" \
    "cuda-libraries-${INSTALL_CUDA_VERSION//./-}" \
    "libnpp-${INSTALL_CUDA_VERSION//./-}" \
    "cuda-nvtx-${INSTALL_CUDA_VERSION//./-}" \
    "libcusparse-${INSTALL_CUDA_VERSION//./-}" \
    "libcublas-${INSTALL_CUDA_VERSION//./-}" \
    git cmake clang libgomp1 \
    && rm -rf /var/lib/apt/lists/*
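# Hold the cuBLAS package so apt upgrades don't install a version that doesn't match the installed CUDA version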
RUN apt-mark hold "libcublas-${INSTALL_CUDA_VERSION//./-}"
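# Make the NVIDIA driver libraries that the container runtime mounts in visible to the dynamic linker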
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
&& echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
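# Copy the app into the image and install its dependencies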
RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app
RUN npm ci
CMD npm start
```

A Dockerfile with Vulkan support:

```Dockerfile
FROM node:22
SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
    apt-get install -y --no-install-recommends mesa-vulkan-drivers libegl1 git cmake clang libgomp1 && \
    rm -rf /var/lib/apt/lists/*
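# Expose NVIDIA GPUs to the container runtime, so they can also be used through Vulkan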
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app
RUN npm ci
CMD npm start
```

A Dockerfile with no GPU support (CPU only):

```Dockerfile
FROM node:22
SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
    apt-get install -y --no-install-recommends git cmake clang libgomp1 && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app
RUN npm ci
CMD npm start
```

Running the Container
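First, build your image; the tag my-image:tag below is just a placeholder name that the examples in this section assume:

```shell
docker build -t my-image:tag .
```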
To run the container with GPU support, use the following:
With Docker:

```shell
docker run --rm -it --gpus=all my-image:tag
```

With Podman:

```shell
podman run --rm -it --gpus=all my-image:tag
```

With Docker Compose, add a GPU reservation to the service:

```yaml
services:
  my-service:
    image: my-image:tag
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              count: all
```

When using the CLI, you can test the GPU support by running this command:
With Docker:

```shell
docker run --rm -it --gpus=all my-image:tag npx -y node-llama-cpp inspect gpu
```

With Podman:

```shell
podman run --rm -it --gpus=all my-image:tag npx -y node-llama-cpp inspect gpu
```

Troubleshooting
NVIDIA GPU Is Not Recognized by the Vulkan Driver Inside the Container
With Docker, make sure your daemon configuration (typically `/etc/docker/daemon.json`) has an nvidia runtime:
```json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```

With Podman, generate a CDI specification for your NVIDIA GPUs, and check that they are listed:

```shell
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
```

And then run the container with the nvidia runtime:
With Docker:

```shell
docker run --rm -it --runtime=nvidia --gpus=all my-image:tag
```

With Podman:

```shell
podman run --rm -it --device nvidia.com/gpu=all --security-opt=label=disable --gpus=all my-image:tag
```

Getting a `system has unsupported display driver / cuda driver combination` Error
Ensure that the `INSTALL_CUDA_VERSION` in the Dockerfile matches or is older than the CUDA version installed on the host machine; for example, if the host has CUDA 12.4, use `12.4` or older in the Dockerfile.

You can check the installed CUDA version using:

```shell
nvidia-smi --version
```
