# GPUStack


GPUStack is an open-source GPU cluster manager for running large language models (LLMs).

## Key Features

- Supports a Wide Variety of Hardware: Run with different brands of GPUs in Apple MacBooks, Windows PCs, and Linux servers.
- Scales with Your GPU Inventory: Easily add more GPUs or nodes to scale up your operations.
- Distributed Inference: Supports both single-node multi-GPU and multi-node inference and serving.
- Multiple Inference Backends: Supports llama-box (llama.cpp) and vLLM as inference backends.
- Lightweight Python Package: Minimal dependencies and operational overhead.
- OpenAI-Compatible APIs: Serves APIs that are compatible with OpenAI standards.
- User and API Key Management: Simplified management of users and API keys.
- GPU Metrics Monitoring: Monitor GPU performance and utilization in real time.
- Token Usage and Rate Metrics: Track token usage and manage rate limits effectively.

## Supported Platforms

- macOS
- Windows
- Linux

The following Linux distributions are verified to work with GPUStack:

| Distributions | Versions       |
| ------------- | -------------- |
| Ubuntu        | >= 20.04       |
| Debian        | >= 11          |
| RHEL          | >= 8           |
| Rocky         | >= 8           |
| Fedora        | >= 36          |
| OpenSUSE      | >= 15.3 (leap) |
| OpenEuler     | >= 22.03       |

!!! note

    Installing the GPUStack worker on a Linux system requires GLIBC 2.29 or higher.
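
As a quick way to verify this requirement before installing a worker, the snippet below checks the local GLIBC version using Python's standard library. It is an illustrative sketch only and is not part of GPUStack itself.

```python
# Illustrative check of the local GLIBC version before installing a GPUStack worker.
import platform

libc_name, libc_version = platform.libc_ver()
print(f"Detected libc: {libc_name} {libc_version}")

# GPUStack workers on Linux require GLIBC >= 2.29.
if libc_name == "glibc":
    major, minor = (int(part) for part in libc_version.split(".")[:2])
    if (major, minor) < (2, 29):
        print("GLIBC is older than 2.29; this system cannot run the GPUStack worker.")
    else:
        print("GLIBC version is sufficient for the GPUStack worker.")
```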

## Supported Accelerators

We plan to support the following accelerators in future releases.

- AMD ROCm
- Intel oneAPI
- MTHREADS MUSA
- Qualcomm AI Engine

## Supported Models

GPUStack uses llama.cpp and vLLM as the backends and supports a wide range of models. Models from the following sources are supported:

1. Hugging Face
2. ModelScope
3. Ollama Library

Example language models:

Example multimodal models:

For the full list of supported models, please refer to the supported models section in the inference backends documentation.

## OpenAI-Compatible APIs

GPUStack serves OpenAI-compatible APIs. For details, please refer to OpenAI-Compatible APIs.
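
Because the APIs follow the OpenAI standard, existing OpenAI client libraries can be pointed at a GPUStack server. The sketch below uses the official `openai` Python package; the base URL, API key, and model name are placeholders, so substitute the endpoint path and credentials from your own deployment.

```python
# Minimal sketch: querying a GPUStack deployment through the standard OpenAI Python client.
# The base_url, api_key, and model values are placeholders for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-gpustack-server/v1",  # placeholder: your GPUStack OpenAI-compatible endpoint
    api_key="your-gpustack-api-key",            # placeholder: an API key created in GPUStack
)

response = client.chat.completions.create(
    model="your-deployed-model",  # placeholder: the name of a model deployed in GPUStack
    messages=[{"role": "user", "content": "Hello from GPUStack!"}],
)
print(response.choices[0].message.content)
```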