Docker Installation

You can use the official Docker image to run GPUStack in a container. Docker-based installation is supported on the following platforms and devices:

Supported Platforms

  • Linux

Supported Devices

  • NVIDIA GPUs (Compute Capability 6.0 and above)
  • AMD GPUs
  • Ascend NPUs
  • Moore Threads GPUs
  • Hygon DCUs
  • CPUs (AVX2 for x86 or NEON for ARM)
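
A quick pre-flight check can help confirm that a host meets the device requirements listed above. The following is a minimal sketch, not part of GPUStack: the helper names `cap_ok` and `cpu_simd_ok` are hypothetical, and the `compute_cap` query field assumes a reasonably recent NVIDIA driver. On 64-bit ARM, Linux typically reports NEON as `asimd` in `/proc/cpuinfo`, so the check accepts either name.

```shell
#!/bin/sh
# Hypothetical helpers for checking the requirements above.

# cap_ok <compute_capability>: succeeds if the capability (e.g. "8.6") is 6.0 or higher.
cap_ok() {
    major=${1%%.*}
    [ "$major" -ge 6 ] 2>/dev/null
}

# cpu_simd_ok [cpuinfo_file]: succeeds if the file lists AVX2 (x86) or NEON/ASIMD (ARM).
cpu_simd_ok() {
    grep -Eqw 'avx2|neon|asimd' "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Report each NVIDIA GPU, if nvidia-smi is available on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
        cap=$(echo "$cap" | tr -d ' ')
        if cap_ok "$cap"; then
            echo "OK: $name ($cap)"
        else
            echo "Below 6.0: $name ($cap)"
        fi
    done
fi

if cpu_simd_ok; then
    echo "CPU: AVX2/NEON found"
else
    echo "CPU: AVX2/NEON not found"
fi
```

The check only compares the major version, which is enough for a 6.0 threshold.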

Prerequisites

Run GPUStack with Docker

!!! note

1. **Heterogeneous clusters are supported.**

2. You can set additional flags for the `gpustack start` command by appending them to the docker run command.
For configuration details, please refer to the [CLI Reference](../cli-reference/start.md).

3. You can use either the `--ipc=host` flag, which shares the host's IPC namespace (including shared memory) with the container, or the `--shm-size` flag, which enlarges the container's own shared memory. vLLM and PyTorch use shared memory to exchange data between processes, particularly for tensor-parallel inference.

4. The `-p 40064-40131:40064-40131` flag ensures connectivity for distributed inference across workers. For more details, refer to the [port requirements](./installation-requirements.md#port-requirements). You can omit this flag if you don't need distributed inference across workers.
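
To illustrate notes 3 and 4, the sketch below assembles a `docker run` command for the NVIDIA image that uses `--shm-size` instead of `--ipc=host` and publishes the worker ports only when distributed inference is requested. The `8g` size, the `WITH_DISTRIBUTED` switch, and the `build_cmd` helper are assumptions made for illustration. The script only prints the command so that it can be reviewed before running.

```shell
#!/bin/sh
# Sketch: build a docker run command with --shm-size instead of --ipc=host.
SHM_SIZE="${SHM_SIZE:-8g}"                  # assumed value; size it for your models
WITH_DISTRIBUTED="${WITH_DISTRIBUTED:-0}"   # hypothetical switch: 1 publishes worker ports

build_cmd() {
    cmd="docker run -d --name gpustack --restart=unless-stopped --gpus all"
    cmd="$cmd -p 80:80 --shm-size=$SHM_SIZE"
    extra=""
    if [ "$WITH_DISTRIBUTED" = "1" ]; then
        # Ports used for distributed inference across workers (see note 4).
        cmd="$cmd -p 10150:10150 -p 40064-40131:40064-40131"
        extra=" --worker-ip your_host_ip"
    fi
    echo "$cmd -v gpustack-data:/var/lib/gpustack gpustack/gpustack$extra"
}

# Print the command for review; it is not executed here.
build_cmd
```

Setting `WITH_DISTRIBUTED=1` before running the script adds the port mappings and the `--worker-ip` placeholder, which must be replaced with the host's address.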

NVIDIA CUDA

Prerequisites

!!! note

When systemd manages the container's cgroups and is triggered to reload any unit files that reference NVIDIA GPUs (for example, by `systemctl daemon-reload`), containerized GPU workloads may suddenly lose access to their GPUs.

In GPUStack, GPUs may disappear from the Resources menu, and running `nvidia-smi` inside the GPUStack container may fail with the error: `Failed to initialize NVML: Unknown Error`

To prevent [this issue](https://github.com/NVIDIA/nvidia-container-toolkit/issues/48), disable systemd cgroup management in Docker.

Set the parameter `"exec-opts": ["native.cgroupdriver=cgroupfs"]` in the `/etc/docker/daemon.json` file and restart Docker, for example:

sudo vim /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  },
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
sudo systemctl daemon-reload && sudo systemctl restart docker
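
After the restart, it may be worth confirming that Docker actually picked up the new driver. The sketch below queries the daemon with `docker info` and compares the result; the helper name `is_cgroupfs` is hypothetical, and the script assumes the current user can reach the Docker daemon.

```shell
#!/bin/sh
# Sketch: confirm that Docker reports the cgroupfs driver after the restart.

# is_cgroupfs <driver>: succeeds only when the reported driver is "cgroupfs".
is_cgroupfs() {
    [ "$1" = "cgroupfs" ]
}

# Ask the Docker daemon which cgroup driver it uses (empty if Docker is unreachable).
driver=$(docker info --format '{{.CgroupDriver}}' 2>/dev/null || true)

if is_cgroupfs "$driver"; then
    echo "cgroup driver: cgroupfs"
else
    echo "cgroup driver is '${driver:-unknown}'; check /etc/docker/daemon.json"
fi
```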

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    --gpus all \
    --network=host \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack

Alternatively, publish the individual ports instead of using host networking:

docker run -d --name gpustack \
    --restart=unless-stopped \
    --gpus all \
    -p 80:80 \
    -p 10150:10150 \
    -p 40064-40131:40064-40131 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack --worker-ip your_host_ip

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password
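
The password file may not exist immediately after the container starts (an assumption; startup time varies by host). A small retry helper, sketched below, can poll for it before reading. The `wait_for` function is hypothetical and generic: it re-runs any command until it succeeds or the attempts run out.

```shell
#!/bin/sh
# wait_for <attempts> <delay_seconds> <command...>
# Re-runs the command until it succeeds or the attempts are exhausted.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage on a host where the gpustack container is running (uncomment to try):
# wait_for 30 2 docker exec gpustack test -f /var/lib/gpustack/initial_admin_password &&
#     docker exec gpustack cat /var/lib/gpustack/initial_admin_password
```

The usage lines drop `-it` because a script usually has no terminal attached.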

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    --gpus all \
    -p 10150:10150 \
    -p 40064-40131:40064-40131 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip
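
When registering several workers, it can be convenient to assemble the registration arguments from variables and fail early if one is missing. The sketch below shows one way to do that; `worker_args` is a hypothetical helper, not a GPUStack command, and the flags it emits are the ones used in the command above.

```shell
#!/bin/sh
# worker_args <server_url> <token> <worker_ip>
# Prints the registration arguments, or fails if any value is empty.
worker_args() {
    url=$1
    token=$2
    ip=$3
    if [ -z "$url" ] || [ -z "$token" ] || [ -z "$ip" ]; then
        echo "usage: worker_args <server_url> <token> <worker_ip>" >&2
        return 1
    fi
    printf '%s\n' "--server-url $url --token $token --worker-ip $ip"
}

# Demonstration with the placeholders from the command above.
worker_args http://your_gpustack_url your_gpustack_token your_worker_host_ip

# Example when the server container runs on the same host (uncomment to try):
#   TOKEN=$(docker exec gpustack-server cat /var/lib/gpustack/token)
#   ARGS=$(worker_args http://your_gpustack_url "$TOKEN" your_worker_host_ip) || exit 1
#   docker run -d --name gpustack-worker --restart=unless-stopped --gpus all \
#       -p 10150:10150 -p 40064-40131:40064-40131 --ipc=host \
#       -v gpustack-worker-data:/var/lib/gpustack gpustack/gpustack $ARGS
```

`$ARGS` is left unquoted on purpose so that the shell splits it into separate arguments.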

AMD ROCm

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm
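
The command above passes `/dev/kfd` and `/dev/dri` into the container, so both must exist on the host. A minimal pre-flight sketch follows; the `devices_present` helper is hypothetical and simply checks that each path exists.

```shell
#!/bin/sh
# devices_present <path...>: succeeds only if every path exists; lists missing ones on stderr.
devices_present() {
    missing=0
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "missing: $dev" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

if devices_present /dev/kfd /dev/dri; then
    echo "ROCm device nodes found"
else
    echo "ROCm device nodes not found; check the driver installation"
fi
```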

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40064-40131:40064-40131 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-rocm \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip

Ascend CANN

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker (set `ASCEND_VISIBLE_DEVICES` to the required NPU indices):

docker run -d --name gpustack \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -e ASCEND_VISIBLE_DEVICES=0 \
    -p 10150:10150 \
    -p 40064-40131:40064-40131 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-npu \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip

Moore Threads MUSA

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40064-40131:40064-40131 \
    --ipc=host \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-musa \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip

Hygon DTK

Prerequisites

Refer to this Tutorial.

Run GPUStack

Run the following command to start the GPUStack server and built-in worker:

docker run -d --name gpustack \
    --restart=unless-stopped \
    -p 80:80 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device=/dev/kfd \
    --device=/dev/dri \
    -v /opt/hyhal:/opt/hyhal:ro \
    -v gpustack-data:/var/lib/gpustack \
    gpustack/gpustack:latest-dcu

To retrieve the default admin password, run the following command:

docker exec -it gpustack cat /var/lib/gpustack/initial_admin_password

(Optional) Run the following command to start the GPUStack server without the built-in worker:

docker run -d --name gpustack-server \
    --restart=unless-stopped \
    -p 80:80 \
    -v gpustack-server-data:/var/lib/gpustack \
    gpustack/gpustack:latest-cpu \
    --disable-worker

(Optional) Add Worker

To retrieve the token, run the following command on the GPUStack server host:

docker exec -it gpustack-server cat /var/lib/gpustack/token

To start a GPUStack worker and register it with the GPUStack server, run the following command on the current host or another host. Replace the URL, token, and IP address placeholders with your own values:

docker run -d --name gpustack-worker \
    --restart=unless-stopped \
    -p 10150:10150 \
    -p 40064-40131:40064-40131 \
    --ipc=host \
    --group-add=video \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v /opt/hyhal:/opt/hyhal:ro \
    -v gpustack-worker-data:/var/lib/gpustack \
    gpustack/gpustack:latest-dcu \
    --server-url http://your_gpustack_url --token your_gpustack_token --worker-ip your_worker_host_ip

Build Your Own Docker Image

The official GPUStack NVIDIA CUDA image is built with CUDA 12.4. If you want to use a different CUDA version, you can build your own Docker image, for example:

# Example Dockerfile
ARG CUDA_VERSION=12.4.1

FROM nvidia/cuda:$CUDA_VERSION-cudnn-runtime-ubuntu22.04

ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    tzdata \
    iproute2 \
    python3 \
    python3-pip \
    python3-venv \
    && rm -rf /var/lib/apt/lists/*

COPY . /workspace/gpustack
RUN cd /workspace/gpustack && \
    make build

RUN if [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
    # Install vllm dependencies for x86_64
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[all]"; \
    else  \
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]"; \
    fi && \
    pip install pipx && \
    pip install $WHEEL_PACKAGE && \
    pip cache purge && \
    rm -rf /workspace/gpustack

RUN gpustack download-tools

ENTRYPOINT [ "gpustack", "start" ]

Run the following command to build the Docker image:

docker build -t my/gpustack --build-arg CUDA_VERSION=12.0.0 .
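
Because the example Dockerfile derives its base image tag from `CUDA_VERSION`, a build fails if no matching `nvidia/cuda` tag has been published. The sketch below prints the tag the Dockerfile would resolve to so it can be checked first; the `base_image` helper is hypothetical, and the commented `docker manifest inspect` step assumes Docker and registry access are available.

```shell
#!/bin/sh
# base_image <cuda_version>: prints the base image the example Dockerfile resolves to.
base_image() {
    echo "nvidia/cuda:$1-cudnn-runtime-ubuntu22.04"
}

CUDA_VERSION="${CUDA_VERSION:-12.4.1}"
echo "Base image: $(base_image "$CUDA_VERSION")"

# Optional check before building (uncomment to try):
# docker manifest inspect "$(base_image "$CUDA_VERSION")" >/dev/null &&
#     docker build -t my/gpustack --build-arg CUDA_VERSION="$CUDA_VERSION" .
```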

For other accelerators, refer to the corresponding Dockerfile in the GPUStack repository.