Using node-llama-cpp in Docker
When running node-llama-cpp in a container image with Docker or Podman, you will most likely want to use a GPU for fast inference.
For that, you'll have to:
- Configure support for your GPU on the host machine
- Build an image with the necessary GPU libraries
- Enable GPU support when running the container
Configuring the Host Machine
Metal: Using Metal inside a Docker container is not supported.
CUDA: You need to install the NVIDIA Container Toolkit on the host machine to use NVIDIA GPUs.
Vulkan: You need to install the relevant GPU drivers on the host machine, and configure Docker or Podman to use them.
No GPU (CPU only): No special configuration is needed.
Building an Image
WARNING
Do not attempt to use alpine as the base image, as it doesn't work well with many GPU drivers.
The potential image size savings of alpine images are not worth the hassle, especially considering that the model files you use will likely be much larger than the image itself anyway.
# Dockerfile: CUDA
FROM node:22
# Replace `x86_64` with `sbsa` for ARM64
ENV NVARCH=x86_64
ENV INSTALL_CUDA_VERSION=12.5
SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
apt-get install -y --no-install-recommends gnupg2 curl ca-certificates && \
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${NVARCH}/3bf863cc.pub | apt-key add - && \
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
apt-get purge --autoremove -y curl && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --no-install-recommends \
"cuda-cudart-${INSTALL_CUDA_VERSION//./-}" \
"cuda-compat-${INSTALL_CUDA_VERSION//./-}" \
"cuda-libraries-${INSTALL_CUDA_VERSION//./-}" \
"libnpp-${INSTALL_CUDA_VERSION//./-}" \
"cuda-nvtx-${INSTALL_CUDA_VERSION//./-}" \
"libcusparse-${INSTALL_CUDA_VERSION//./-}" \
"libcublas-${INSTALL_CUDA_VERSION//./-}" \
git cmake clang libgomp1 \
&& rm -rf /var/lib/apt/lists/*
RUN apt-mark hold "libcublas-${INSTALL_CUDA_VERSION//./-}"
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
&& echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app
RUN npm ci
CMD npm start
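The package names in the CUDA Dockerfile rely on bash pattern substitution, which is why the image sets SHELL to /bin/bash. As a small illustration, meant to be run in a regular bash shell rather than in the image, the sketch below shows how the expansion turns a version such as 12.5 into the dash-separated form used in the package names. The cuda_pkg helper is hypothetical and exists only for this example.

```shell
#!/usr/bin/env bash
# ${VAR//./-} replaces every "." in VAR with "-".
# This is a bash feature and is not available in plain POSIX sh.
INSTALL_CUDA_VERSION=12.5

# Hypothetical helper: build a versioned package name from a package prefix.
cuda_pkg() {
    local prefix="$1"
    printf '%s-%s\n' "$prefix" "${INSTALL_CUDA_VERSION//./-}"
}

cuda_pkg cuda-cudart   # prints cuda-cudart-12-5
cuda_pkg libcublas     # prints libcublas-12-5
```

Changing INSTALL_CUDA_VERSION in one place therefore updates every versioned package name in the install step.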
# Dockerfile: Vulkan
FROM node:22
SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
apt-get install -y --no-install-recommends mesa-vulkan-drivers libegl1 git cmake clang libgomp1 && \
rm -rf /var/lib/apt/lists/*
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app
RUN npm ci
CMD npm start
# Dockerfile: No GPU (CPU only)
FROM node:22
SHELL ["/bin/bash", "-c"]
RUN apt-get update && \
apt-get install -y --no-install-recommends git cmake clang libgomp1 && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /opt/app
WORKDIR /opt/app
COPY . /opt/app
RUN npm ci
CMD npm start
Running the Container
To run the container with GPU support, use the following:
docker run --rm -it --gpus=all my-image:tag
podman run --rm -it --gpus=all my-image:tag
# Docker Compose
services:
  my-service:
    image: my-image:tag
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              count: all
When using the CLI, you can test GPU support by running this command:
docker run --rm -it --gpus=all my-image:tag npx -y node-llama-cpp inspect gpu
podman run --rm -it --gpus=all my-image:tag npx -y node-llama-cpp inspect gpu
Troubleshooting
NVIDIA GPU Is Not Recognized by the Vulkan Driver Inside the Container
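Make sure your Docker/Podman configuration has an nvidia runtime.
For Docker, the daemon configuration (typically /etc/docker/daemon.json) should include:
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
For Podman, generate the CDI specification and list the available devices:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list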
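To check the Docker side of this from a script, a rough sketch like the following can help. It assumes the daemon configuration lives at /etc/docker/daemon.json (the usual default, but your setup may differ), and the has_nvidia_runtime helper is hypothetical. It only looks for the runtime binary name in the file and does not parse the JSON.

```shell
#!/usr/bin/env bash
# Rough check: succeeds when the given Docker daemon config mentions
# nvidia-container-runtime. This is a text match, not a JSON parse.
has_nvidia_runtime() {
    local config="${1:-/etc/docker/daemon.json}"   # assumed default location
    [ -r "$config" ] && grep -q 'nvidia-container-runtime' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime entry found"
else
    echo "nvidia runtime entry not found"
fi
```

Pass a different path as the first argument if your daemon configuration is stored elsewhere.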
Then run the container with the nvidia runtime:
docker run --rm -it --runtime=nvidia --gpus=all my-image:tag
podman run --rm -it --device nvidia.com/gpu=all --security-opt=label=disable --gpus=all my-image:tag
Getting a "system has unsupported display driver / cuda driver combination" Error
Ensure that the INSTALL_CUDA_VERSION in the Dockerfile matches or is older than the CUDA version installed on the host machine.
You can check the installed CUDA version by running nvidia-smi --version.
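The "matches or is older than" rule can be sketched as a version comparison. The example below is a minimal illustration with hard-coded, hypothetical version values; in practice you would fill them in from your Dockerfile and from what nvidia-smi reports on the host. The version_lte helper is not part of any tool and relies on sort -V from GNU coreutils.

```shell
#!/usr/bin/env bash
# Returns success (0) when version $1 is less than or equal to version $2.
# Uses `sort -V` (version sort) so that 12.10 is treated as newer than 12.5.
version_lte() {
    [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" ]
}

# Hypothetical values: INSTALL_CUDA_VERSION from the Dockerfile and the
# CUDA version found on the host machine.
image_cuda="12.5"
host_cuda="12.4"

if version_lte "$image_cuda" "$host_cuda"; then
    echo "ok: image CUDA $image_cuda is supported by host CUDA $host_cuda"
else
    echo "mismatch: image CUDA $image_cuda is newer than host CUDA $host_cuda"
fi
```

A plain string comparison would get cases such as 12.10 versus 12.5 wrong, which is why the sketch uses a version-aware sort.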