From 24eb231d4e31f50761bc16d33cc671b05b5a96bb Mon Sep 17 00:00:00 2001
From: gitlawr
Date: Wed, 4 Dec 2024 14:49:03 +0800
Subject: [PATCH] docs: update inference backend

---
 docs/user-guide/inference-backends.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/docs/user-guide/inference-backends.md b/docs/user-guide/inference-backends.md
index 4285007..3c72e25 100644
--- a/docs/user-guide/inference-backends.md
+++ b/docs/user-guide/inference-backends.md
@@ -4,23 +4,27 @@ GPUStack supports the following inference backends:
 
 - llama-box
 - vLLM
+- vox-box
 
 When users deploy a model, the backend is selected automatically based on the following criteria:
 
-- If the model is a [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) model, llama-box is used.
-- Otherwise, vLLM is used.
+- If the model is a [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) model, `llama-box` is used.
+- If the model is a known `text-to-speech` or `speech-to-text` model, `vox-box` is used.
+- Otherwise, `vLLM` is used.
 
 ## llama-box
 
-[llama-box](https://github.com/gpustack/llama-box) is a LLM inference server based on [llama.cpp](https://github.com/ggerganov/llama.cpp).
+[llama-box](https://github.com/gpustack/llama-box) is an LM inference server based on [llama.cpp](https://github.com/ggerganov/llama.cpp) and [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp).
 
 ### Supported Platforms
 
-The llama-box backend works on a wide range of platforms, including MacOS, Linux and Windows(with CPU offloading only on Windows ARM architecture).
+The llama-box backend works on a wide range of platforms, including MacOS, Linux and Windows (with CPU offloading only on Windows ARM architecture).
 
 ### Supported Models
 
-Please refer to the list of supported models in [README](https://github.com/ggerganov/llama.cpp#description) of llama.cpp project.
+- LLMs: For supported LLMs, refer to the llama.cpp [README](https://github.com/ggerganov/llama.cpp#description).
+- Diffusion Models: Supported models are listed in this [Hugging Face collection](https://huggingface.co/collections/gpustack/image-672dafeb2fa0d02dbe2539a9).
+- Reranker Models: Supported models can be found in this [Hugging Face collection](https://huggingface.co/collections/gpustack/reranker-6721a234527f6fcd90deedc4).
 
 ### Supported Features
 
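
Reviewer note: the selection criteria introduced by this patch can be read as an ordered chain of checks. The sketch below is illustrative only and is not GPUStack's implementation. The function name `select_backend`, the `.gguf` file-suffix test, and the caller-supplied `known_audio_models` set are all assumptions made for the example, as is the idea that the checks run in the order the bullets are listed.

```python
# Illustrative sketch only; every name here is hypothetical, not GPUStack code.

def select_backend(model_file: str, model_name: str,
                   known_audio_models: frozenset) -> str:
    """Return a backend name following the three documented criteria."""
    # Criterion 1: GGUF models go to llama-box.
    # Assumption: a GGUF model is recognised by its file suffix.
    if model_file.lower().endswith(".gguf"):
        return "llama-box"
    # Criterion 2: known text-to-speech / speech-to-text models go to vox-box.
    # Assumption: "known" means membership in a caller-supplied set.
    if model_name in known_audio_models:
        return "vox-box"
    # Criterion 3: everything else falls back to vLLM.
    return "vLLM"


# Placeholder model names, invented for this example.
audio = frozenset({"example/tts-model", "example/stt-model"})
print(select_backend("weights.gguf", "example/llm", audio))
print(select_backend("model.safetensors", "example/tts-model", audio))
print(select_backend("model.safetensors", "example/llm", audio))
```

Under these assumptions, a GGUF file of a known audio model would still resolve to `llama-box`, because the GGUF check comes first. The patch text does not say how such an overlap is handled.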