Compare commits
293 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 8fe557148d | 4 months ago |
| | dfdafd036d | 4 months ago |
| | 26e81dc700 | 4 months ago |
| | b24d08bbf1 | 4 months ago |
| | f7636a5f63 | 4 months ago |
| | 80509b5900 | 4 months ago |
| | 4535fb182d | 4 months ago |
| | e91da52145 | 4 months ago |
| | 67b93d156e | 4 months ago |
| | 30533f7275 | 4 months ago |
| | e627a4c79e | 4 months ago |
| | f7fdcdb9d0 | 4 months ago |
| | 0d77f69e99 | 4 months ago |
| | 54d75c7a45 | 5 months ago |
| | ea6e5ca9dc | 5 months ago |
| | df31850efc | 5 months ago |
| | 67df35a364 | 5 months ago |
| | 10eaab6f90 | 5 months ago |
| | fac6ed8d25 | 5 months ago |
| | 7e08098981 | 5 months ago |
| | bf6287c069 | 5 months ago |
| | e54e269f58 | 5 months ago |
| | 5dd58c72cb | 5 months ago |
| | a6441ff92e | 5 months ago |
| | e2c0a7ccdc | 5 months ago |
| | cb14580408 | 5 months ago |
| | 359dc3c6e2 | 5 months ago |
| | 8ff76ebfee | 5 months ago |
| | b5b2272e12 | 5 months ago |
| | 53d255cba6 | 5 months ago |
| | 036c40753f | 5 months ago |
| | 7331059605 | 5 months ago |
| | 5e22fe99ca | 5 months ago |
| | 2232e4baf6 | 5 months ago |
| | 527f7b3644 | 5 months ago |
| | 35b570459b | 5 months ago |
| | a436eff8d3 | 5 months ago |
| | 6168713804 | 5 months ago |
| | 309c27a9ef | 5 months ago |
| | 307dadbbd5 | 5 months ago |
| | 74c937acc4 | 5 months ago |
| | b0a638711b | 5 months ago |
| | 14a60fd640 | 5 months ago |
| | 34c21e055f | 5 months ago |
| | 775fc18d5d | 5 months ago |
| | eef43b3426 | 5 months ago |
| | 976b4ccf1f | 5 months ago |
| | b24cb7466d | 5 months ago |
| | a035f94425 | 5 months ago |
| | 0d2a42b932 | 5 months ago |
| | be05857963 | 5 months ago |
| | f4fc88f114 | 5 months ago |
| | fed3bff688 | 5 months ago |
| | ffeeb4830d | 5 months ago |
| | b193ae59fc | 5 months ago |
| | 18dd045598 | 5 months ago |
| | 13f96d7e6f | 5 months ago |
| | c97693ec3f | 5 months ago |
| | 1703ed9af4 | 5 months ago |
| | aebae09a10 | 5 months ago |
| | 17ff2c4b56 | 5 months ago |
| | eec744793b | 5 months ago |
| | 0d4bf13f4c | 5 months ago |
| | 8118f25941 | 5 months ago |
| | 9c89d04008 | 5 months ago |
| | 4f2ae77577 | 5 months ago |
| | f5b70329dc | 5 months ago |
| | 0a28b2a4e0 | 5 months ago |
| | f017065122 | 5 months ago |
| | bf33757623 | 5 months ago |
| | 8b8b859fb6 | 5 months ago |
| | 6ea4988572 | 5 months ago |
| | e45a4e9961 | 5 months ago |
| | 6208c7c3a5 | 5 months ago |
| | 9b3850fe7c | 5 months ago |
| | 6c7728d4f3 | 5 months ago |
| | feba7ecb51 | 5 months ago |
| | c3a34f3f09 | 5 months ago |
| | 733f2c1482 | 5 months ago |
| | a888661621 | 5 months ago |
| | 65591e10ea | 5 months ago |
| | 4d30c23d8f | 5 months ago |
| | 9f9e0217d6 | 5 months ago |
| | a16e78dc32 | 5 months ago |
| | 0aae3fa0a6 | 5 months ago |
| | 0749e40d1b | 5 months ago |
| | b1015f353a | 5 months ago |
| | 12a4a4ef9f | 5 months ago |
| | 8872d0687b | 5 months ago |
| | 17a7c4b6e1 | 5 months ago |
| | fc4aa6b6c2 | 5 months ago |
| | 16b5333fcb | 5 months ago |
| | 3836fd34bb | 5 months ago |
| | d79343b324 | 5 months ago |
| | f8cc92ab0b | 5 months ago |
| | 0c760c34b5 | 5 months ago |
| | fef108171c | 5 months ago |
| | b0457b7adb | 5 months ago |
| | e9f17063ca | 5 months ago |
| | 57b55a8207 | 5 months ago |
| | c2b4b95824 | 5 months ago |
| | ba411603ac | 5 months ago |
| | 1aeb35352a | 5 months ago |
| | ba5a49edd3 | 5 months ago |
| | 2c16096055 | 5 months ago |
| | a57b9c1768 | 5 months ago |
| | 53be75c23c | 5 months ago |
| | 9787b28a44 | 5 months ago |
| | 345024182b | 5 months ago |
| | b7bfda3b1d | 5 months ago |
| | 78eda5b83a | 5 months ago |
| | 70b1f8486d | 5 months ago |
| | 76bd6838e5 | 5 months ago |
| | 554f5c0180 | 5 months ago |
| | 7ddb9edb86 | 5 months ago |
| | 3fa5bf01fe | 5 months ago |
| | 1b3a192d07 | 5 months ago |
| | 152e57f8a7 | 5 months ago |
| | 089b7fa813 | 5 months ago |
| | c7ee3661ac | 5 months ago |
| | c52b0c1c44 | 5 months ago |
| | 4b30a6b743 | 5 months ago |
| | e254d7bfd8 | 5 months ago |
| | b0c85bb603 | 5 months ago |
| | 5ffdefdbae | 5 months ago |
| | 10c2ae6fbc | 5 months ago |
| | 23c0b77850 | 5 months ago |
| | 7eca3cabe1 | 5 months ago |
| | eef0700bdc | 5 months ago |
| | 5d65dcc0a7 | 5 months ago |
| | a4cdf37379 | 5 months ago |
| | a54ff956a1 | 5 months ago |
| | 91dbbd31af | 5 months ago |
| | 58af72bf08 | 5 months ago |
| | 8cb1a38013 | 5 months ago |
| | 4aea7bcdf9 | 6 months ago |
| | b8910817b6 | 6 months ago |
| | eae62e02fc | 6 months ago |
| | d46932ae80 | 6 months ago |
| | a4da5fae63 | 6 months ago |
| | 83146853ac | 6 months ago |
| | d094c2e008 | 6 months ago |
| | f16718c776 | 6 months ago |
| | a368ac1000 | 6 months ago |
| | 8e8d41b24c | 6 months ago |
| | d95edf3e29 | 6 months ago |
| | e1a4a3fca2 | 6 months ago |
| | 8d5c70a2ac | 6 months ago |
| | 3b8ef1c3e7 | 6 months ago |
| | 8b0c521458 | 6 months ago |
| | 330694ea9a | 6 months ago |
| | 8d04eb7797 | 6 months ago |
| | 4939b4a1f7 | 6 months ago |
| | a0e4f0c889 | 6 months ago |
| | c34cd72c36 | 6 months ago |
| | 2f50cc60ae | 6 months ago |
| | da3e8060c6 | 6 months ago |
| | f196b844ae | 6 months ago |
| | bca385f869 | 6 months ago |
| | 013fc3d510 | 6 months ago |
| | a3cb2445c5 | 6 months ago |
| | ccf4b6d660 | 6 months ago |
| | 5a816afe2b | 6 months ago |
| | c371216700 | 6 months ago |
| | 8e5bfca828 | 6 months ago |
| | 9647cd4237 | 6 months ago |
| | 6812d39148 | 6 months ago |
| | 6c27bb8945 | 6 months ago |
| | 96f673b7f6 | 6 months ago |
| | 3d3dde0db8 | 6 months ago |
| | e6f4dd03f7 | 6 months ago |
| | 2083ce0778 | 6 months ago |
| | cee7492913 | 6 months ago |
| | 501231471e | 6 months ago |
| | 0b90cf50c0 | 6 months ago |
| | b54c72a7c1 | 6 months ago |
| | f01a4bb491 | 6 months ago |
| | c27a504a8a | 6 months ago |
| | b692aeb0e5 | 6 months ago |
| | 15b796288b | 6 months ago |
| | ac42c6b21d | 6 months ago |
| | 9c9caa848e | 6 months ago |
| | 890f15faed | 6 months ago |
| | ce868a7a79 | 6 months ago |
| | 250723f9a1 | 6 months ago |
| | 5bbb027dbe | 6 months ago |
| | c2b7456269 | 6 months ago |
| | 2916a306d9 | 6 months ago |
| | 5bc81bbb9e | 6 months ago |
| | e7f41af249 | 6 months ago |
| | 2a02ee689a | 6 months ago |
| | 9fba121898 | 6 months ago |
| | b1543211a8 | 6 months ago |
| | 6f72944222 | 6 months ago |
| | 1b6760169a | 6 months ago |
| | 498f143944 | 6 months ago |
| | 1d7bb41776 | 6 months ago |
| | 2bd6648236 | 6 months ago |
| | 3f8d45ded1 | 6 months ago |
| | 6714afa2ca | 6 months ago |
| | 48f10bcf1b | 6 months ago |
| | acf4ad2917 | 6 months ago |
| | 4e1d6d900c | 6 months ago |
| | c9f44f33aa | 6 months ago |
| | cc3fc57bc1 | 6 months ago |
| | 777fd3d0cb | 6 months ago |
| | 3f34bdd216 | 6 months ago |
| | b054dc00b5 | 6 months ago |
| | 94e515c844 | 6 months ago |
| | cc0a88f88f | 6 months ago |
| | 6d7d666f1e | 6 months ago |
| | 1eda8a7810 | 6 months ago |
| | d12f1e9fee | 6 months ago |
| | afc1023922 | 6 months ago |
| | 3c5c9feb6c | 6 months ago |
| | a9fb44a43d | 6 months ago |
| | ed8e4ba51f | 6 months ago |
| | feaa894000 | 6 months ago |
| | 62f78c10cc | 6 months ago |
| | 85e5f717ba | 6 months ago |
| | bec76bac22 | 6 months ago |
| | 323d76df4c | 6 months ago |
| | 45b8018f82 | 6 months ago |
| | 38d2e31f9f | 6 months ago |
| | 0dfdafcd16 | 6 months ago |
| | 9272d5a9c7 | 6 months ago |
| | 340b4d8d90 | 6 months ago |
| | fab2813d56 | 6 months ago |
| | d38a120a3d | 6 months ago |
| | 56bf30f199 | 6 months ago |
| | 7804c89f1e | 6 months ago |
| | 383baac2e0 | 6 months ago |
| | 884c94f63f | 6 months ago |
| | 7b648ce634 | 6 months ago |
| | 6f3fab1051 | 6 months ago |
| | 64a7d18cf8 | 6 months ago |
| | 7c5927db99 | 6 months ago |
| | 0b8f24f993 | 6 months ago |
| | 9addaad168 | 6 months ago |
| | f268725d8f | 6 months ago |
| | 68a3915e4e | 6 months ago |
| | 5927e924cb | 6 months ago |
| | 7c26f6f012 | 6 months ago |
| | 3bf76a5a6e | 7 months ago |
| | 39d8e7ab0f | 7 months ago |
| | f4f84e0931 | 7 months ago |
| | d006860a38 | 7 months ago |
| | 37f23c2549 | 7 months ago |
| | b69a785f41 | 7 months ago |
| | a06319d76c | 7 months ago |
| | c7d77e6e95 | 7 months ago |
| | 296d6b5d32 | 7 months ago |
| | 7f247a6400 | 7 months ago |
| | d9996538f9 | 7 months ago |
| | ec9e646105 | 7 months ago |
| | 8e44612eb8 | 7 months ago |
| | 0810e2319d | 7 months ago |
| | 00e858f40f | 7 months ago |
| | 7a2187aaa3 | 7 months ago |
| | d9b67ff2eb | 7 months ago |
| | cc2e0bbe78 | 7 months ago |
| | 88253b59c3 | 7 months ago |
| | 7b5078856c | 7 months ago |
| | 555d0c81ba | 7 months ago |
| | 03cd217189 | 7 months ago |
| | bef3b521ef | 7 months ago |
| | b99bd3f6de | 7 months ago |
| | 157ad58ee3 | 7 months ago |
| | a95650f3e1 | 7 months ago |
| | d3c908ea07 | 7 months ago |
| | 0e34f487a7 | 7 months ago |
| | 211132d172 | 7 months ago |
| | 45812787a0 | 7 months ago |
| | e39a103b79 | 7 months ago |
| | 27ee8566cb | 7 months ago |
| | c060a51f76 | 7 months ago |
| | a964ad0816 | 7 months ago |
| | bee9da9c65 | 7 months ago |
| | 457e418cd5 | 7 months ago |
| | bd1313f0ef | 7 months ago |
| | 443466e740 | 7 months ago |
| | fc92106506 | 7 months ago |
| | 2f42df90ad | 7 months ago |
| | ea34a10f26 | 7 months ago |
| | b6f847024f | 7 months ago |
| | b2844560ff | 7 months ago |
| | 549750b739 | 7 months ago |
| | 7222a07049 | 7 months ago |
| | faec4babe3 | 7 months ago |
| | 7819ce076f | 7 months ago |
| | 2a6af6062e | 7 months ago |
| | 14b70d472d | 7 months ago |
| | a7ac3f52a4 | 7 months ago |
@@ -1,3 +1,3 @@
```
install.ps1.sha256sum text eol=lf
* text=auto eol=lf
*.tar.gz filter=lfs diff=lfs merge=lfs -text
```
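These attribute rules can be sanity-checked locally. A minimal sketch, assuming only that `git` is installed, which recreates the three rules in a throwaway repository (not this repo) and asks git which attributes apply to two representative paths:

```shell
# Hypothetical throwaway repo; the rules are copied from the hunk above.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q .
cat > .gitattributes <<'ATTRS'
install.ps1.sha256sum text eol=lf
* text=auto eol=lf
*.tar.gz filter=lfs diff=lfs merge=lfs -text
ATTRS
# Later rules override earlier ones per attribute, so a tarball keeps
# eol=lf from the `*` rule but gains filter=lfs and loses `text`.
git check-attr text eol filter -- install.ps1.sha256sum model.tar.gz
```

Note that `-text` in the last rule unsets `text` for tarballs without touching `eol`, which is why LFS rules usually pair `-text` with the filter settings.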
@@ -1,48 +0,0 @@
```dockerfile
ARG CUDA_VERSION=12.4.1
ARG CUDA_TAG_SUFFIX=-cudnn-runtime-ubuntu22.04

FROM nvidia/cuda:${CUDA_VERSION}${CUDA_TAG_SUFFIX}

ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    tzdata \
    iproute2 \
    python3 \
    python3-pip \
    python3-venv \
    tini \
    && rm -rf /var/lib/apt/lists/*

COPY . /workspace/gpustack
RUN cd /workspace/gpustack && \
    make build

ARG VLLM_VERSION=0.8.5.post1
RUN <<EOF
if [ "$TARGETPLATFORM" = "linux/amd64" ]; then
    # Install vllm dependencies for x86_64
    if [ "$(echo "${CUDA_VERSION}" | cut -d. -f1,2)" = "11.8" ]; then
        pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp38-abi3-manylinux1_x86_64.whl \
            --extra-index-url https://download.pytorch.org/whl/cu118;
    fi;
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[all]";
else
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]";
fi
pip install pipx
pip install $WHEEL_PACKAGE
pip cache purge
rm -rf /workspace/gpustack
EOF

RUN gpustack download-tools

# Download dac weights used by audio models like Dia
RUN python3 -m dac download

ENTRYPOINT [ "tini", "--", "gpustack", "start" ]
```
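The vLLM install step above branches on the `major.minor` part of `CUDA_VERSION`; the `cut` expression it relies on can be exercised on its own:

```shell
# Extract major.minor from a dotted version string, as the Dockerfile does.
CUDA_VERSION=12.4.1
echo "${CUDA_VERSION}" | cut -d. -f1,2    # prints 12.4

# A two-component version passes through unchanged, so the "11.8" comparison
# works whether or not a patch version is supplied.
CUDA_VERSION=11.8
echo "${CUDA_VERSION}" | cut -d. -f1,2    # prints 11.8
```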
@@ -1,32 +0,0 @@
```dockerfile
FROM ubuntu:22.04

ARG TARGETPLATFORM
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git \
    curl \
    wget \
    tzdata \
    iproute2 \
    python3 \
    python3-pip \
    python3-venv \
    tini \
    && rm -rf /var/lib/apt/lists/*

COPY . /workspace/gpustack
RUN cd /workspace/gpustack && \
    make build && \
    WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)[audio]" && \
    pip install pipx && \
    pip install $WHEEL_PACKAGE && \
    pip cache purge && \
    rm -rf /workspace/gpustack

RUN gpustack download-tools

# Download dac weights used by audio models like Dia
RUN python3 -m dac download

ENTRYPOINT [ "tini", "--", "gpustack", "start" ]
```
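Both images install the built wheel with a pip extras suffix (`[all]` or `[audio]`) appended to the wheel path. A small sketch of how that string is assembled; the wheel filename below is a placeholder, not the real build artifact:

```shell
# Placeholder wheel standing in for the output of `make build`.
mkdir -p dist
touch dist/gpustack-0.0.0-py3-none-any.whl

# `ls` expands to the single wheel path; the extras suffix is appended
# verbatim, so pip later installs the wheel plus its [audio] extras.
WHEEL_PACKAGE="$(ls dist/*.whl)[audio]"
echo "$WHEEL_PACKAGE"    # prints dist/gpustack-0.0.0-py3-none-any.whl[audio]
```

This pattern only works because the `dist/` directory contains exactly one wheel; with several wheels the `ls` expansion would produce a broken requirement string.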
@@ -1,454 +0,0 @@
```dockerfile
# Packaging logic:
# 1. base target:
#    - Install tools, including Python, GCC, CMake, Make, SCCache and dependencies.
#    - Install specific version Ascend CANN according to the chip, including Toolkit and Kernels.
# 2. mindie-install target:
#    - Install specific version Ascend CANN NNAL.
#    - Copy and install the ATB models from a fixed image.
#    - Install required dependencies.
#    - Install specific version MindIE.
# 3. gpustack target (final):
#    - Install GPUStack, and override the required dependencies after installation.
#    - Set up the environment for CANN, NNAL and ATB models.
#    - Set up the entrypoint to start GPUStack.

# Arguments description:
# - CANN_VERSION is the version of Ascend CANN,
#   which is used to install the Ascend CANN Toolkit, Kernels and NNAL.
# - CANN_CHIP is the chip version of Ascend CANN,
#   which is used to install the Ascend CANN Kernels.
# - MINDIE_VERSION is the version of Ascend MindIE,
#   which is used to install the Ascend MindIE;
#   please check https://www.hiascend.com/developer/download/community/result?module=ie%2Bpt%2Bcann for details.
# - PYTHON_VERSION is the version of Python,
#   which should be properly set; it must be 3.x.

ARG CANN_VERSION=8.1.rc1.beta1
ARG CANN_CHIP=910b
ARG MINDIE_VERSION=2.0.rc1
ARG PYTHON_VERSION=3.11

#
# Stage Base
#
# Example build command:
#   docker build --tag=gpustack/gpustack:npu-base --file=Dockerfile.npu --target base --progress=plain .
#

FROM ubuntu:20.04 AS base
SHELL ["/bin/bash", "-eo", "pipefail", "-c"]

ARG TARGETPLATFORM
ARG TARGETOS
ARG TARGETARCH

## Install tools

ARG PYTHON_VERSION

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHON_VERSION=${PYTHON_VERSION}

RUN <<EOF
# Refresh
apt-get update -y && apt-get install -y --no-install-recommends \
    software-properties-common apt-transport-https \
    && add-apt-repository -y ppa:ubuntu-toolchain-r/test \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update -y

# Install
apt-get install -y --no-install-recommends \
    ca-certificates build-essential binutils bash openssl \
    curl wget aria2 \
    git git-lfs \
    unzip xz-utils \
    tzdata locales \
    iproute2 iputils-ping ifstat net-tools dnsutils pciutils ipmitool \
    procps sysstat htop \
    tini vim jq bc tree

# Update python
PYTHON="python${PYTHON_VERSION}"
apt-get install -y --no-install-recommends \
    ${PYTHON} ${PYTHON}-dev ${PYTHON}-distutils ${PYTHON}-venv ${PYTHON}-lib2to3
if [ -f /etc/alternatives/python ]; then update-alternatives --remove-all python; fi; update-alternatives --install /usr/bin/python python /usr/bin/${PYTHON} 10
if [ -f /etc/alternatives/python3 ]; then update-alternatives --remove-all python3; fi; update-alternatives --install /usr/bin/python3 python3 /usr/bin/${PYTHON} 10
curl -sS https://bootstrap.pypa.io/get-pip.py | ${PYTHON}

# Update locale
localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8

# Cleanup
rm -rf /var/tmp/* \
    && rm -rf /tmp/* \
    && rm -rf /var/cache/apt \
    && pip cache purge
EOF

ENV LANG='en_US.UTF-8' \
    LANGUAGE='en_US:en' \
    LC_ALL='en_US.UTF-8'

## Install GCC

RUN <<EOF
# GCC

# Install
apt-get install -y --no-install-recommends \
    gcc-11 g++-11 gfortran-11 gfortran

# Update alternatives
if [ -f /etc/alternatives/gcov-dump ]; then update-alternatives --remove-all gcov-dump; fi; update-alternatives --install /usr/bin/gcov-dump gcov-dump /usr/bin/gcov-dump-11 10
if [ -f /etc/alternatives/lto-dump ]; then update-alternatives --remove-all lto-dump; fi; update-alternatives --install /usr/bin/lto-dump lto-dump /usr/bin/lto-dump-11 10
if [ -f /etc/alternatives/gcov ]; then update-alternatives --remove-all gcov; fi; update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-11 10
if [ -f /etc/alternatives/gcc ]; then update-alternatives --remove-all gcc; fi; update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 10
if [ -f /etc/alternatives/gcc-nm ]; then update-alternatives --remove-all gcc-nm; fi; update-alternatives --install /usr/bin/gcc-nm gcc-nm /usr/bin/gcc-nm-11 10
if [ -f /etc/alternatives/cpp ]; then update-alternatives --remove-all cpp; fi; update-alternatives --install /usr/bin/cpp cpp /usr/bin/cpp-11 10
if [ -f /etc/alternatives/g++ ]; then update-alternatives --remove-all g++; fi; update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 10
if [ -f /etc/alternatives/gcc-ar ]; then update-alternatives --remove-all gcc-ar; fi; update-alternatives --install /usr/bin/gcc-ar gcc-ar /usr/bin/gcc-ar-11 10
if [ -f /etc/alternatives/gcov-tool ]; then update-alternatives --remove-all gcov-tool; fi; update-alternatives --install /usr/bin/gcov-tool gcov-tool /usr/bin/gcov-tool-11 10
if [ -f /etc/alternatives/gcc-ranlib ]; then update-alternatives --remove-all gcc-ranlib; fi; update-alternatives --install /usr/bin/gcc-ranlib gcc-ranlib /usr/bin/gcc-ranlib-11 10
if [ -f /etc/alternatives/gfortran ]; then update-alternatives --remove-all gfortran; fi; update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-11 10

# Cleanup
rm -rf /var/tmp/* \
    && rm -rf /tmp/* \
    && rm -rf /var/cache/apt
EOF

## Install CMake/Make/SCCache

RUN <<EOF
# CMake/Make/SCCache

# Install
apt-get install -y --no-install-recommends \
    pkg-config make
curl -sL "https://github.com/Kitware/CMake/releases/download/v3.22.1/cmake-3.22.1-linux-$(uname -m).tar.gz" | tar -zx -C /usr --strip-components 1
curl -sL "https://github.com/mozilla/sccache/releases/download/v0.10.0/sccache-v0.10.0-$(uname -m)-unknown-linux-musl.tar.gz" | tar -zx -C /usr/bin --strip-components 1

# Cleanup
rm -rf /var/tmp/* \
    && rm -rf /tmp/* \
    && rm -rf /var/cache/apt
EOF

## Install Dependencies

RUN <<EOF
# Dependencies

# Install
apt-get install -y --no-install-recommends \
    zlib1g zlib1g-dev libbz2-dev liblzma-dev libffi-dev openssl libssl-dev libsqlite3-dev \
    libblas-dev liblapack-dev libopenblas-dev libblas3 liblapack3 gfortran libhdf5-dev \
    libxml2 libxslt1-dev libgl1-mesa-glx libgmpxx4ldbl

# Cleanup
rm -rf /var/tmp/* \
    && rm -rf /tmp/* \
    && rm -rf /var/cache/apt
EOF

ARG CANN_VERSION
ARG CANN_CHIP

ENV CANN_VERSION=${CANN_VERSION} \
    CANN_CHIP=${CANN_CHIP} \
    CANN_HOME="/usr/local/Ascend"

## Install CANN Toolkit

RUN <<EOF
# CANN Toolkit

OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m)"
DOWNLOAD_VERSION="$(echo ${CANN_VERSION%\.beta1} | tr '[:lower:]' '[:upper:]')"
URL_PREFIX="https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%20${DOWNLOAD_VERSION}"
URL_SUFFIX="response-content-type=application/octet-stream"

# Install dependencies
python3 -m pip install --no-cache-dir --root-user-action ignore --upgrade pip
pip install --no-cache-dir --root-user-action ignore \
    attrs cython numpy==1.26.4 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py

# Install toolkit
TOOLKIT_FILE="Ascend-cann-toolkit_${DOWNLOAD_VERSION}_${OS}-${ARCH}.run"
TOOLKIT_PATH="/tmp/${TOOLKIT_FILE}"
TOOLKIT_URL="${URL_PREFIX}/${TOOLKIT_FILE}?${URL_SUFFIX}"
curl -H 'Referer: https://www.hiascend.com/' --retry 3 --retry-connrefused -fL -o "${TOOLKIT_PATH}" "${TOOLKIT_URL}"
chmod a+x "${TOOLKIT_PATH}"
printf "Y\n" | "${TOOLKIT_PATH}" --install --install-for-all --install-path="${CANN_HOME}"

# Cleanup
rm -f "${TOOLKIT_PATH}" \
    && rm -rf /var/log/ascend \
    && rm -rf /var/log/ascend_seclog \
    && pip cache purge
EOF

## Install CANN Kernels

RUN <<EOF
# CANN Kernels

OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m)"
DOWNLOAD_VERSION="$(echo ${CANN_VERSION%\.beta1} | tr '[:lower:]' '[:upper:]')"
URL_PREFIX="https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%20${DOWNLOAD_VERSION}"
URL_SUFFIX="response-content-type=application/octet-stream"

# Prepare environment
source ${CANN_HOME}/ascend-toolkit/set_env.sh

# Install kernels
KERNELS_FILE="Ascend-cann-kernels-${CANN_CHIP}_${DOWNLOAD_VERSION}_${OS}-${ARCH}.run"
if ! curl -H 'Referer: https://www.hiascend.com/' --retry 3 --retry-connrefused -fsSIL "${URL_PREFIX}/${KERNELS_FILE}?${URL_SUFFIX}" >/dev/null 2>&1; then
    # Fallback to generic kernels
    KERNELS_FILE="Ascend-cann-kernels-${CANN_CHIP}_${DOWNLOAD_VERSION}_${OS}.run"
fi
KERNELS_PATH="/tmp/${KERNELS_FILE}"
KERNELS_URL="${URL_PREFIX}/${KERNELS_FILE}?${URL_SUFFIX}"
curl -H 'Referer: https://www.hiascend.com/' --retry 3 --retry-connrefused -fL -o "${KERNELS_PATH}" "${KERNELS_URL}"
chmod a+x "${KERNELS_PATH}"
printf "Y\n" | "${KERNELS_PATH}" --install --install-for-all --install-path="${CANN_HOME}"

# Cleanup
rm -f "${KERNELS_PATH}" \
    && rm -rf /var/log/ascend \
    && rm -rf /var/log/ascend_seclog \
    && pip cache purge
EOF

#
# Stage MindIE Install
#
# Example build command:
#   docker build --tag=gpustack/gpustack:npu-mindie-install --file=Dockerfile.npu --target mindie-install --progress=plain .
#

FROM base AS mindie-install

## Install NNAL

RUN <<EOF
# CANN NNAL

OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m)"
DOWNLOAD_VERSION="$(echo ${CANN_VERSION%\.beta1} | tr '[:lower:]' '[:upper:]')"
URL_PREFIX="https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%20${DOWNLOAD_VERSION}"
URL_SUFFIX="response-content-type=application/octet-stream"

# Prepare environment
source ${CANN_HOME}/ascend-toolkit/set_env.sh

# Install NNAL
NNAL_FILE="Ascend-cann-nnal_${DOWNLOAD_VERSION}_${OS}-${ARCH}.run"
NNAL_PATH="/tmp/${NNAL_FILE}"
NNAL_URL="${URL_PREFIX}/${NNAL_FILE}?${URL_SUFFIX}"
curl -H 'Referer: https://www.hiascend.com/' --retry 3 --retry-connrefused -fL -o "${NNAL_PATH}" "${NNAL_URL}"
chmod a+x "${NNAL_PATH}"
printf "Y\n" | "${NNAL_PATH}" --install --install-path="${CANN_HOME}"

# Cleanup
rm -f "${NNAL_PATH}" \
    && rm -rf /var/log/ascend_seclog \
    && rm -rf /var/log/cann_atb_log \
    && pip cache purge
EOF

COPY --from=thxcode/mindie:2.0.T17-800I-A2-py311-openeuler24.03-lts --chown=root:root ${CANN_HOME}/atb-models ${CANN_HOME}/atb-models
RUN <<EOF
# ATB Models

# Install
pip install --no-cache-dir --root-user-action ignore ${CANN_HOME}/atb-models/*.whl

# Cleanup
rm -rf /var/log/ascend_seclog \
    && rm -rf /var/log/cann_atb_log \
    && pip cache purge
EOF

## Install MindIE

ARG MINDIE_VERSION

ENV MINDIE_VERSION=${MINDIE_VERSION}

RUN <<EOF
# MindIE

OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m)"
DOWNLOAD_VERSION="$(echo ${MINDIE_VERSION%\.beta1} | tr '[:lower:]' '[:upper:]')"
URL_PREFIX="https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/MindIE/MindIE%20${DOWNLOAD_VERSION}"
URL_SUFFIX="response-content-type=application/octet-stream"

# Prepare environment
source ${CANN_HOME}/ascend-toolkit/set_env.sh
source ${CANN_HOME}/nnal/atb/set_env.sh

# Install dependencies,
# which, according to the Ascend Extension installation guide, must match CANN_VERSION;
# please check https://www.hiascend.com/document/detail/zh/Pytorch/700/configandinstg/instg/insg_0004.html for details.
if [ ${ARCH} == "x86_64" ]; then
    pip install --no-cache-dir --root-user-action ignore torch==2.1.0+cpu --index-url https://download.pytorch.org/whl/cpu
else
    pip install --no-cache-dir --root-user-action ignore torch==2.1.0
fi
pip install --no-cache-dir --root-user-action ignore torch-npu==2.1.0.post12 torchvision==0.16.0
cat <<EOT >/tmp/requirements.txt
av==14.3.0
absl-py==2.2.2
attrs==24.3.0
certifi==2024.8.30
cloudpickle==3.0.0
einops==0.8.1
easydict==1.13
frozenlist==1.6.0
gevent==24.2.1
geventhttpclient==2.3.1
greenlet==3.2.1
grpcio==1.71.0
icetk==0.0.4
idna==2.8
jsonlines==4.0.0
jsonschema==4.23.0
jsonschema-specifications==2025.4.1
Jinja2==3.1.6
loguru==0.7.2
matplotlib==3.9.2
ml_dtypes==0.5.0
multidict==6.4.3
nltk==3.9.1
numba==0.61.2
numpy==1.26.4
pandas==2.2.3
pillow==11.2.1
prettytable==3.11.0
pyarrow==19.0.1
pydantic==2.9.2
pydantic_core==2.23.4
python-rapidjson==1.20
requests==2.32.3
sacrebleu==2.4.3
tornado==6.4.2
transformers==4.46.3
tiktoken==0.7.0
typing_extensions==4.13.2
tzdata==2024.2
tqdm==4.67.1
thefuzz==0.22.1
urllib3==2.4.0
zope.event==5.0
zope.interface==7.0.3
EOT
pip install --no-cache-dir --root-user-action ignore -r /tmp/requirements.txt

# Install MindIE
MINDIE_FILE="Ascend-mindie_${DOWNLOAD_VERSION}_${OS}-${ARCH}.run"
MINDIE_PATH="/tmp/${MINDIE_FILE}"
MINDIE_URL="${URL_PREFIX}/${MINDIE_FILE}?${URL_SUFFIX}"
curl -H 'Referer: https://www.hiascend.com/' --retry 3 --retry-connrefused -fL -o "${MINDIE_PATH}" "${MINDIE_URL}"
chmod a+x "${MINDIE_PATH}"
printf "Y\n" | "${MINDIE_PATH}" --install --install-path="${CANN_HOME}"

# Post process
chmod +w "${CANN_HOME}/mindie/latest/mindie-service/conf"

# Review
pip freeze \
    && python -m site

# Cleanup
rm -f "${MINDIE_PATH}" \
    && rm -rf /var/log/mindie_log \
    && rm -rf ~/log \
    && rm -rf /tmp/* \
    && pip cache purge
EOF

#
# Stage GPUStack
#
# Example build command:
#   docker build --tag=gpustack/gpustack:npu --file=Dockerfile.npu --progress=plain .
#

FROM mindie-install AS gpustack

## Install GPUStack

RUN --mount=type=bind,target=/workspace/gpustack,rw <<EOF
# Build
cd /workspace/gpustack \
    && make build

# Install,
# vox-box relies on PyTorch 2.7, which is not compatible with MindIE.
WHEEL_PACKAGE="$(ls /workspace/gpustack/dist/*.whl)"
pip install --no-cache-dir --root-user-action ignore $WHEEL_PACKAGE

# Download tools
gpustack download-tools --device npu

# Post-process,
# override the required dependencies after installation.
cat <<EOT >/tmp/requirements.txt
pipx==1.7.1
EOT
pip install --no-cache-dir --root-user-action ignore -r /tmp/requirements.txt

# Set up environment
mkdir -p /var/lib/gpustack \
    && chmod -R 0755 /var/lib/gpustack

# Review
pip freeze \
    && python -m site

# Cleanup
rm -rf /workspace/gpustack/dist \
    && rm -rf /tmp/* \
    && pip cache purge
EOF

## Setup environment

RUN <<EOF
# Export CANN driver lib
EXPORT_DRIVER_LIB="export LD_LIBRARY_PATH=${CANN_HOME}/driver/lib64/common:${CANN_HOME}/driver/lib64/driver:\${LD_LIBRARY_PATH}"
echo "${EXPORT_DRIVER_LIB}" >> /etc/profile
echo "${EXPORT_DRIVER_LIB}" >> ~/.bashrc

# Source CANN Toolkit environment
SOURCE_TOOLKIT_ENV="source ${CANN_HOME}/ascend-toolkit/set_env.sh"
echo "${SOURCE_TOOLKIT_ENV}" >> /etc/profile
echo "${SOURCE_TOOLKIT_ENV}" >> ~/.bashrc

# Source CANN NNAL environment
SOURCE_NNAL_ENV="source ${CANN_HOME}/nnal/atb/set_env.sh"
echo "${SOURCE_NNAL_ENV}" >> /etc/profile
echo "${SOURCE_NNAL_ENV}" >> ~/.bashrc

# Source ATB model environment
SOURCE_ATB_MODEL_ENV="source ${CANN_HOME}/atb-models/set_env.sh"
echo "${SOURCE_ATB_MODEL_ENV}" >> /etc/profile
echo "${SOURCE_ATB_MODEL_ENV}" >> ~/.bashrc

# Export Driver Tools
EXPORT_DRIVER_TOOLS="export PATH=${CANN_HOME}/driver/tools:\${PATH}"
echo "${EXPORT_DRIVER_TOOLS}" >> /etc/profile
echo "${EXPORT_DRIVER_TOOLS}" >> ~/.bashrc

# NB(thxCode): To support specific MindIE versions,
# we need to finish setting up the environment during GPUStack deployment.
EOF

ENTRYPOINT [ "tini", "--", "/usr/bin/bash", "-c", "source /etc/profile && exec gpustack start \"$@\"", "--" ]
```
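Several download steps above derive `DOWNLOAD_VERSION` from `CANN_VERSION` (or `MINDIE_VERSION`) by stripping a trailing `.beta1` suffix and upper-casing the rest; that transformation in isolation:

```shell
# Suffix-strip with ${var%pattern}, then upper-case, as the Dockerfile does.
CANN_VERSION=8.1.rc1.beta1
DOWNLOAD_VERSION="$(echo ${CANN_VERSION%\.beta1} | tr '[:lower:]' '[:upper:]')"
echo "$DOWNLOAD_VERSION"    # prints 8.1.RC1

# A version without the suffix passes through the %-expansion unchanged.
MINDIE_VERSION=2.0.rc1
echo "$(echo ${MINDIE_VERSION%\.beta1} | tr '[:lower:]' '[:upper:]')"    # prints 2.0.RC1
```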
@@ -1,346 +0,0 @@
```python
import asyncio
import time
import httpx
import numpy
import logging
import argparse
import json
import random
from openai import AsyncOpenAI

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)


# Avoid client side connection error: https://github.com/encode/httpx/discussions/3084
http_client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=10000, max_keepalive_connections=10000, keepalive_expiry=30
    )
)

SAMPLE_PROMPTS = [
    "Explain how blockchain technology works, and provide a real-world example of its application outside of cryptocurrency.",
    "Compare and contrast the philosophies of Nietzsche and Kant, including their views on morality and human nature.",
    "Imagine you're a travel blogger. Write a detailed post describing a week-long adventure through rural Japan.",
    "Write a fictional letter from Albert Einstein to a modern-day physicist, discussing the current state of quantum mechanics.",
    "Provide a comprehensive explanation of how transformers work in machine learning, including attention mechanisms and positional encoding.",
    "Draft a business proposal for launching a new AI-powered productivity app, including target audience, key features, and a monetization strategy.",
    "Simulate a panel discussion between Elon Musk, Marie Curie, and Sun Tzu on the topic of 'Leadership in Times of Crisis'.",
    "Describe the process of photosynthesis in depth, and explain its importance in the global carbon cycle.",
    "Analyze the impact of social media on political polarization, citing relevant studies or historical examples.",
    "Write a short science fiction story where humans discover a parallel universe that operates under different physical laws.",
    "Explain the role of the Federal Reserve in the U.S. economy and how it manages inflation and unemployment.",
    "Describe the architecture of a modern web application, from frontend to backend, including databases, APIs, and deployment.",
    "Write an essay discussing whether artificial general intelligence (AGI) poses an existential threat to humanity.",
    "Summarize the key events and consequences of the Cuban Missile Crisis, and reflect on lessons for modern diplomacy.",
    "Create a guide for beginners on how to train a custom LLM using open-source tools and publicly available datasets.",
]


async def process_stream(stream):
    first_token_time = None
    total_tokens = 0
    async for chunk in stream:
        if first_token_time is None:
            first_token_time = time.time()
        if chunk.choices[0].delta.content:
            total_tokens += 1
        if chunk.choices[0].finish_reason is not None:
            break
    return first_token_time, total_tokens


async def make_request(
    client: AsyncOpenAI, model, max_completion_tokens, request_timeout
):
    start_time = time.time()
    content = random.choice(SAMPLE_PROMPTS)

    try:
        stream = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": content}],
            max_completion_tokens=max_completion_tokens,
            stream=True,
        )
        first_token_time, total_tokens = await asyncio.wait_for(
            process_stream(stream), timeout=request_timeout
        )

        end_time = time.time()
        elapsed_time = end_time - start_time
        ttft = first_token_time - start_time if first_token_time else None
        tokens_per_second = total_tokens / elapsed_time if elapsed_time > 0 else 0
        return total_tokens, elapsed_time, tokens_per_second, ttft

    except asyncio.TimeoutError:
        logging.warning(f"Request timed out after {request_timeout} seconds")
        return None
    except Exception as e:
        logging.error(f"Error during request: {str(e)}")
        return None


async def worker(
    client,
    model,
    semaphore,
    queue,
```
results,
|
||||
max_completion_tokens,
|
||||
request_timeout,
|
||||
):
|
||||
while True:
|
||||
async with semaphore:
|
||||
task_id = await queue.get()
|
||||
if task_id is None:
|
||||
queue.task_done()
|
||||
break
|
||||
logging.info(f"Starting request {task_id}")
|
||||
result = await make_request(
|
||||
client, model, max_completion_tokens, request_timeout
|
||||
)
|
||||
if result:
|
||||
results.append(result)
|
||||
else:
|
||||
logging.warning(f"Request {task_id} failed")
|
||||
queue.task_done()
|
||||
logging.info(f"Finished request {task_id}")
|
||||
|
||||
|
||||
def calculate_percentile(values, percentile, reverse=False):
|
||||
if not values:
|
||||
return None
|
||||
if reverse:
|
||||
return numpy.percentile(values, 100 - percentile)
|
||||
return numpy.percentile(values, percentile)
|
||||
|
||||
|
||||
async def preflight_check(client, model) -> bool:
|
||||
result = await make_request(client, model, 16, 60)
|
||||
return result is not None
|
||||
|
||||
|
||||
async def main(
|
||||
model,
|
||||
num_requests,
|
||||
concurrency,
|
||||
request_timeout,
|
||||
max_completion_tokens,
|
||||
server_url,
|
||||
api_key,
|
||||
):
|
||||
client = AsyncOpenAI(
|
||||
base_url=f"{server_url}/v1",
|
||||
api_key=api_key,
|
||||
http_client=http_client,
|
||||
max_retries=0,
|
||||
)
|
||||
|
||||
if not await preflight_check(client, model):
|
||||
logging.error(
|
||||
"Preflight check failed. Please check configuration and the service status."
|
||||
)
|
||||
return
|
||||
|
||||
semaphore = asyncio.Semaphore(concurrency)
|
||||
queue = asyncio.Queue()
|
||||
results = []
|
||||
|
||||
# Add tasks to the queue
|
||||
for i in range(num_requests):
|
||||
await queue.put(i)
|
||||
|
||||
# Add sentinel values to stop workers
|
||||
for _ in range(concurrency):
|
||||
await queue.put(None)
|
||||
|
||||
# Create worker tasks
|
||||
workers = [
|
||||
asyncio.create_task(
|
||||
worker(
|
||||
client,
|
||||
model,
|
||||
semaphore,
|
||||
queue,
|
||||
results,
|
||||
max_completion_tokens,
|
||||
request_timeout,
|
||||
)
|
||||
)
|
||||
for _ in range(concurrency)
|
||||
]
|
||||
|
||||
start_time = time.time()
|
||||
|
||||
# Wait for all tasks to complete
|
||||
await queue.join()
|
||||
await asyncio.gather(*workers)
|
||||
|
||||
end_time = time.time()
|
||||
|
||||
# Calculate metrics
|
||||
total_elapsed_time = end_time - start_time
|
||||
total_tokens = sum(tokens for tokens, _, _, _ in results if tokens is not None)
|
||||
latencies = [
|
||||
elapsed_time for _, elapsed_time, _, _ in results if elapsed_time is not None
|
||||
]
|
||||
tokens_per_second_list = [tps for _, _, tps, _ in results if tps is not None]
|
||||
ttft_list = [ttft for _, _, _, ttft in results if ttft is not None]
|
||||
|
||||
successful_requests = len(results)
|
||||
success_rate = successful_requests / num_requests if num_requests > 0 else 0
|
||||
requests_per_second = (
|
||||
successful_requests / total_elapsed_time if total_elapsed_time > 0 else 0
|
||||
)
|
||||
avg_latency = sum(latencies) / len(latencies) if latencies else 0
|
||||
avg_tokens_per_second = (
|
||||
sum(tokens_per_second_list) / len(tokens_per_second_list)
|
||||
if tokens_per_second_list
|
||||
else 0
|
||||
)
|
||||
overall_tokens_per_second = (
|
||||
total_tokens / total_elapsed_time if total_elapsed_time > 0 else 0
|
||||
)
|
||||
avg_ttft = sum(ttft_list) / len(ttft_list) if ttft_list else 0
|
||||
|
||||
# Calculate percentiles
|
||||
percentiles = [50, 95, 99]
|
||||
latency_percentiles = [calculate_percentile(latencies, p) for p in percentiles]
|
||||
tps_percentiles = [
|
||||
calculate_percentile(tokens_per_second_list, p, reverse=True)
|
||||
for p in percentiles
|
||||
]
|
||||
ttft_percentiles = [calculate_percentile(ttft_list, p) for p in percentiles]
|
||||
|
||||
return {
|
||||
"model": model,
|
||||
"total_requests": num_requests,
|
||||
"successful_requests": successful_requests,
|
||||
"success_rate": success_rate,
|
||||
"concurrency": concurrency,
|
||||
"request_timeout": request_timeout,
|
||||
"max_completion_tokens": max_completion_tokens,
|
||||
"total_time": total_elapsed_time,
|
||||
"requests_per_second": requests_per_second,
|
||||
"total_completion_tokens": total_tokens,
|
||||
"latency": {
|
||||
"average": avg_latency,
|
||||
"p50": latency_percentiles[0],
|
||||
"p95": latency_percentiles[1],
|
||||
"p99": latency_percentiles[2],
|
||||
},
|
||||
"tokens_per_second": {
|
||||
"overall": overall_tokens_per_second,
|
||||
"average": avg_tokens_per_second,
|
||||
"p50": tps_percentiles[0],
|
||||
"p95": tps_percentiles[1],
|
||||
"p99": tps_percentiles[2],
|
||||
},
|
||||
"time_to_first_token": {
|
||||
"average": avg_ttft,
|
||||
"p50": ttft_percentiles[0],
|
||||
"p95": ttft_percentiles[1],
|
||||
"p99": ttft_percentiles[2],
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def output_results(results, result_file=None):
|
||||
# Round all floats in results to two decimal places for output
|
||||
def _round_floats(obj, ndigits=2):
|
||||
if isinstance(obj, dict):
|
||||
return {k: _round_floats(v, ndigits) for k, v in obj.items()}
|
||||
if isinstance(obj, list):
|
||||
return [_round_floats(v, ndigits) for v in obj]
|
||||
if isinstance(obj, float):
|
||||
return round(obj, ndigits)
|
||||
return obj
|
||||
|
||||
formatted_results = _round_floats(results, 2)
|
||||
if result_file:
|
||||
with open(result_file, "w") as f:
|
||||
json.dump(formatted_results, f, indent=2)
|
||||
logging.info(f"Results saved to {result_file}")
|
||||
else:
|
||||
print(json.dumps(formatted_results, indent=2))
|
||||
|
||||
|
||||
def set_http_client(args):
|
||||
if args.headers:
|
||||
for header in args.headers:
|
||||
if ":" not in header:
|
||||
parser.error(f"Invalid header format: {header}. Expected Key:Value")
|
||||
key, value = header.split(":", 1)
|
||||
http_client.headers[key.strip()] = value.strip()
|
||||
|
||||
http_client.timeout = args.request_timeout
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Benchmark Chat Completions API")
|
||||
parser.add_argument(
|
||||
"-m", "--model", type=str, required=True, help="Name of the model"
|
||||
)
|
||||
parser.add_argument(
|
||||
"-n",
|
||||
"--num-requests",
|
||||
type=int,
|
||||
default=100,
|
||||
help="Number of requests to make (default: 100)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-c",
|
||||
"--concurrency",
|
||||
type=int,
|
||||
default=10,
|
||||
help="Number of concurrent requests (default: 10)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--request-timeout",
|
||||
type=int,
|
||||
default=300,
|
||||
help="Timeout for each request in seconds (default: 300)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--max-completion-tokens",
|
||||
type=int,
|
||||
default=1024,
|
||||
help="Maximum number of tokens in the completion (default: 1024)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--server-url",
|
||||
type=str,
|
||||
default="http://127.0.0.1",
|
||||
help="URL of the GPUStack server",
|
||||
)
|
||||
parser.add_argument("--api-key", type=str, default="fake", help="GPUStack API key")
|
||||
parser.add_argument(
|
||||
"--result-file",
|
||||
type=str,
|
||||
help="Result file path to save benchmark json results",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-H",
|
||||
"--header",
|
||||
action="append",
|
||||
dest="headers",
|
||||
help="Custom HTTP header in Key:Value format. May be specified multiple times.",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
set_http_client(args)
|
||||
|
||||
results = asyncio.run(
|
||||
main(
|
||||
args.model,
|
||||
args.num_requests,
|
||||
args.concurrency,
|
||||
args.request_timeout,
|
||||
args.max_completion_tokens,
|
||||
args.server_url,
|
||||
args.api_key,
|
||||
)
|
||||
)
|
||||
output_results(results, args.result_file)
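The concurrency machinery above is a sentinel-shutdown worker pool: workers pull task IDs from an `asyncio.Queue` and exit on `None`, so `main` only needs to enqueue one sentinel per worker before `queue.join()`. A minimal sketch of that pattern in isolation (the names and the doubling "work" are illustrative, not from the script):

```python
import asyncio


async def worker(queue: asyncio.Queue, results: list) -> None:
    while True:
        task_id = await queue.get()
        if task_id is None:  # sentinel: no more work for this worker
            queue.task_done()
            break
        results.append(task_id * 2)  # stand-in for a real request
        queue.task_done()


async def run(num_tasks: int, concurrency: int) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    for i in range(num_tasks):
        await queue.put(i)
    for _ in range(concurrency):
        await queue.put(None)  # exactly one sentinel per worker
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(concurrency)
    ]
    await queue.join()  # every item, including sentinels, acknowledged
    await asyncio.gather(*workers)
    return results


print(sorted(asyncio.run(run(5, 2))))  # → [0, 2, 4, 6, 8]
```

Because each sentinel is consumed by exactly one worker, `gather` returns only after every worker has observed end-of-input, which is why the scripts can safely read `results` afterwards.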
@@ -0,0 +1,654 @@
import asyncio
from dataclasses import asdict, dataclass, is_dataclass
import time
from typing import List, Optional
import aiohttp
import numpy
import logging
import argparse
import json
import random
from openai import APIConnectionError, AsyncOpenAI
from aiohttp import ClientSession
from httpx_aiohttp import AiohttpTransport
from openai import DefaultAsyncHttpxClient
from openai.types.chat import (
    ChatCompletionStreamOptionsParam,
)
from tqdm import tqdm

logging.basicConfig(
    level=logging.WARNING, format="%(asctime)s - %(levelname)s - %(message)s"
)


SAMPLE_PROMPTS = [
    "Explain how blockchain technology works, and provide a real-world example of its application outside of cryptocurrency.",
    "Compare and contrast the philosophies of Nietzsche and Kant, including their views on morality and human nature.",
    "Imagine you're a travel blogger. Write a detailed post describing a week-long adventure through rural Japan.",
    "Write a fictional letter from Albert Einstein to a modern-day physicist, discussing the current state of quantum mechanics.",
    "Provide a comprehensive explanation of how transformers work in machine learning, including attention mechanisms and positional encoding.",
    "Draft a business proposal for launching a new AI-powered productivity app, including target audience, key features, and a monetization strategy.",
    "Simulate a panel discussion between Elon Musk, Marie Curie, and Sun Tzu on the topic of 'Leadership in Times of Crisis'.",
    "Describe the process of photosynthesis in depth, and explain its importance in the global carbon cycle.",
    "Analyze the impact of social media on political polarization, citing relevant studies or historical examples.",
    "Write a short science fiction story where humans discover a parallel universe that operates under different physical laws.",
    "Explain the role of the Federal Reserve in the U.S. economy and how it manages inflation and unemployment.",
    "Describe the architecture of a modern web application, from frontend to backend, including databases, APIs, and deployment.",
    "Write an essay discussing whether artificial general intelligence (AGI) poses an existential threat to humanity.",
    "Summarize the key events and consequences of the Cuban Missile Crisis, and reflect on lessons for modern diplomacy.",
    "Create a guide for beginners on how to train a custom LLM using open-source tools and publicly available datasets.",
]


@dataclass
class PercentileResults:
    average: float
    p50: float
    p95: float
    p99: float


@dataclass
class BenchmarkResults:
    model: str
    total_requests: int
    successful_requests: int
    success_rate: float
    concurrency: int
    request_timeout: int
    max_completion_tokens: int
    total_time: float
    requests_per_second: float
    total_tokens: int
    total_prompt_tokens: int
    total_completion_tokens: int
    total_tokens_per_second: float
    total_prompt_tokens_per_second: float
    total_completion_tokens_per_second: float
    latency: PercentileResults
    completion_tokens_per_second: PercentileResults
    time_to_first_token: PercentileResults


async def process_stream(stream):
    first_token_time = None
    async for chunk in stream:
        if first_token_time is None:
            first_token_time = time.time()
        if chunk.usage:
            return first_token_time, chunk.usage
    return first_token_time, None


def get_random_prompt(prompt_multiplier):
    """
    Returns a random prompt from the SAMPLE_PROMPTS list, repeated prompt_multiplier times.
    """
    # Add a random prefix to avoid prefix cache hits
    random_prefix = str(random.randint(100000, 999999))
    return (
        random_prefix + " " + (random.choice(SAMPLE_PROMPTS) + " ") * prompt_multiplier
    )


async def make_chat_completion_request(
    client: AsyncOpenAI,
    model,
    max_completion_tokens,
    ignore_eos,
    request_timeout,
    prompt_multiplier,
):
    start_time = time.time()
    content = get_random_prompt(prompt_multiplier)
    try:
        stream = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": content}],
            max_completion_tokens=max_completion_tokens,
            stream=True,
            stream_options=ChatCompletionStreamOptionsParam(include_usage=True),
            extra_body={"ignore_eos": ignore_eos} if ignore_eos else None,
        )
        first_token_time, usage = await asyncio.wait_for(
            process_stream(stream), timeout=request_timeout
        )

        end_time = time.time()
        elapsed_time = end_time - start_time
        ttft = (first_token_time - start_time) * 1000 if first_token_time else None
        return usage, elapsed_time, ttft
    except asyncio.TimeoutError:
        logging.warning(f"Request timed out after {request_timeout} seconds")
        return None
    except APIConnectionError as e:
        logging.error(f"API connection error: {str(e)}")
        return None
    except Exception as e:
        logging.error(f"Error during request: {str(e)}")
        return None


async def make_embedding_request(
    client: AsyncOpenAI,
    model,
    request_timeout,
    prompt_multiplier=1,
):
    start_time = time.time()
    content = get_random_prompt(prompt_multiplier)
    try:
        response = await asyncio.wait_for(
            client.embeddings.create(model=model, input=content),
            timeout=request_timeout,
        )
        end_time = time.time()
        elapsed_time = end_time - start_time
        ttft = None  # Embeddings do not have a time to first token in the same way as chat completions

        return response.usage, elapsed_time, ttft
    except asyncio.TimeoutError:
        logging.warning(f"Embedding request timed out after {request_timeout} seconds")
        return None
    except Exception as e:
        logging.error(f"Error during embedding request: {str(e)}")
        return None


async def worker(
    client,
    model,
    semaphore,
    queue,
    results,
    max_completion_tokens,
    ignore_eos,
    request_timeout,
    embeddings=False,
    prompt_multiplier=1,
    pbar=None,
):
    while True:
        async with semaphore:
            task_id = await queue.get()
            if task_id is None:
                queue.task_done()
                break
            logging.debug(f"Starting request {task_id}")
            if embeddings:
                result = await make_embedding_request(
                    client, model, request_timeout, prompt_multiplier
                )
            else:
                result = await make_chat_completion_request(
                    client,
                    model,
                    max_completion_tokens,
                    ignore_eos,
                    request_timeout,
                    prompt_multiplier,
                )
            if result:
                results.append(result)
            else:
                logging.warning(f"Request {task_id} failed")
            queue.task_done()
            if pbar:
                pbar.update(1)
            logging.debug(f"Finished request {task_id}")


def calculate_percentile(values, percentile, reverse=False):
    if not values:
        return None
    if reverse:
        return numpy.percentile(values, 100 - percentile)
    return numpy.percentile(values, percentile)


async def preflight_check(client, model, embeddings=False) -> bool:
    if embeddings:
        result = await make_embedding_request(client, model, 16)
    else:
        result = await make_chat_completion_request(client, model, 16, False, 60, 1)
    return result is not None


def set_headers(aiohttp_session: ClientSession, headers: Optional[List[str]]):
    if headers:
        for header in headers:
            if ":" not in header:
                raise ValueError(f"Invalid header format: {header}. Expected Key:Value")
            key, value = header.split(":", 1)
            aiohttp_session.headers[key.strip()] = value.strip()


async def main(
    model,
    num_requests,
    concurrency,
    request_timeout,
    max_completion_tokens,
    ignore_eos,
    server_url,
    api_key,
    headers=None,
    embeddings=False,
    prompt_multiplier=1,
) -> Optional[BenchmarkResults]:
    connector = aiohttp.TCPConnector(
        limit=2000,
        force_close=True,
    )
    async with ClientSession(connector=connector, trust_env=True) as aiohttp_session:
        if headers:
            set_headers(aiohttp_session, headers)
        transport = AiohttpTransport(client=aiohttp_session)
        httpx_client = DefaultAsyncHttpxClient(
            transport=transport, timeout=request_timeout
        )
        client = AsyncOpenAI(
            base_url=f"{server_url}/v1",
            api_key=api_key,
            http_client=httpx_client,
            max_retries=0,
        )

        if not await preflight_check(client, model, embeddings=embeddings):
            raise Exception(
                "Preflight check failed. Please check configuration and the service status."
            )

        semaphore = asyncio.Semaphore(concurrency)
        queue = asyncio.Queue()
        results = []

        # Add tasks to the queue
        for i in range(num_requests):
            await queue.put(i)

        # Add sentinel values to stop workers
        for _ in range(concurrency):
            await queue.put(None)

        pbar = tqdm(
            total=num_requests,
            desc="Running Benchmark requests",
            unit="request",
            dynamic_ncols=True,
        )

        # Create worker tasks
        workers = [
            asyncio.create_task(
                worker(
                    client,
                    model,
                    semaphore,
                    queue,
                    results,
                    max_completion_tokens,
                    ignore_eos,
                    request_timeout,
                    embeddings,
                    prompt_multiplier,
                    pbar=pbar,
                )
            )
            for _ in range(concurrency)
        ]

        start_time = time.time()

        # Wait for all tasks to complete
        await queue.join()
        await asyncio.gather(*workers)
        pbar.close()

        end_time = time.time()
        total_elapsed_time = end_time - start_time
        return calculate_results(
            model,
            concurrency,
            request_timeout,
            max_completion_tokens,
            total_elapsed_time,
            num_requests,
            results,
        )


def calculate_results(
    model,
    concurrency,
    request_timeout,
    max_completion_tokens,
    total_elapsed_time,
    num_requests,
    results,
):
    # Calculate metrics
    total_tokens = 0
    prompt_tokens = 0
    completion_tokens = 0
    tokens_per_second_list = []
    prompt_tokens_per_second_list = []
    completion_tokens_per_second_list = []
    for usage, elapsed_time, _ in results:
        if usage is not None:
            # Embedding usage objects carry no completion_tokens; default to 0.
            usage_completion_tokens = getattr(usage, "completion_tokens", 0)
            total_tokens += usage.total_tokens
            prompt_tokens += usage.prompt_tokens
            completion_tokens += usage_completion_tokens
            prompt_tokens_per_second = (
                usage.prompt_tokens / elapsed_time if elapsed_time > 0 else 0
            )
            completion_tokens_per_second = (
                usage_completion_tokens / elapsed_time if elapsed_time > 0 else 0
            )
            tokens_per_second = (
                usage.total_tokens / elapsed_time if elapsed_time > 0 else 0
            )
            tokens_per_second_list.append(tokens_per_second)
            prompt_tokens_per_second_list.append(prompt_tokens_per_second)
            completion_tokens_per_second_list.append(completion_tokens_per_second)

    latencies = [
        elapsed_time for _, elapsed_time, _ in results if elapsed_time is not None
    ]
    ttft_list = [ttft for _, _, ttft in results if ttft is not None]

    successful_requests = len(results)
    success_rate = successful_requests / num_requests if num_requests > 0 else 0
    requests_per_second = (
        successful_requests / total_elapsed_time if total_elapsed_time > 0 else 0
    )
    avg_latency = sum(latencies) / len(latencies) if latencies else 0
    avg_completion_tokens_per_second = (
        sum(completion_tokens_per_second_list) / len(completion_tokens_per_second_list)
        if completion_tokens_per_second_list
        else 0
    )
    total_tokens_per_second = (
        total_tokens / total_elapsed_time if total_elapsed_time > 0 else 0
    )
    total_prompt_tokens_per_second = (
        prompt_tokens / total_elapsed_time if total_elapsed_time > 0 else 0
    )
    total_completion_tokens_per_second = (
        completion_tokens / total_elapsed_time if total_elapsed_time > 0 else 0
    )
    avg_ttft = sum(ttft_list) / len(ttft_list) if ttft_list else 0

    # Calculate percentiles
    percentiles = [50, 95, 99]
    latency_percentiles = [calculate_percentile(latencies, p) for p in percentiles]
    completion_tps_percentiles = [
        calculate_percentile(completion_tokens_per_second_list, p, reverse=True)
        for p in percentiles
    ]
    ttft_percentiles = [calculate_percentile(ttft_list, p) for p in percentiles]

    return BenchmarkResults(
        model=model,
        total_requests=num_requests,
        successful_requests=successful_requests,
        success_rate=success_rate,
        concurrency=concurrency,
        request_timeout=request_timeout,
        max_completion_tokens=max_completion_tokens,
        total_time=total_elapsed_time,
        requests_per_second=requests_per_second,
        total_tokens=total_tokens,
        total_prompt_tokens=prompt_tokens,
        total_completion_tokens=completion_tokens,
        total_tokens_per_second=total_tokens_per_second,
        total_prompt_tokens_per_second=total_prompt_tokens_per_second,
        total_completion_tokens_per_second=total_completion_tokens_per_second,
        latency=PercentileResults(
            average=avg_latency,
            p50=latency_percentiles[0],
            p95=latency_percentiles[1],
            p99=latency_percentiles[2],
        ),
        completion_tokens_per_second=PercentileResults(
            average=avg_completion_tokens_per_second,
            p50=completion_tps_percentiles[0],
            p95=completion_tps_percentiles[1],
            p99=completion_tps_percentiles[2],
        ),
        time_to_first_token=PercentileResults(
            average=avg_ttft,
            p50=ttft_percentiles[0],
            p95=ttft_percentiles[1],
            p99=ttft_percentiles[2],
        ),
    )


def fmt_line(label, *values, width=40):
    label_part = f"{label:<{width}}"
    value_part = " ".join(str(v) for v in values)
    return f"{label_part}{value_part}"


def fmt_float(v, suffix=""):
    return f"{v:.2f}{suffix}"


def output_benchmark_results_pretty(
    results: BenchmarkResults, file: Optional[str] = None, embeddings: bool = False
):
    lines = []
    lines.append("============== Serving Benchmark Result ===============")
    lines.append(fmt_line("Model:", results.model))
    lines.append(
        fmt_line(
            "Total requests:",
            f"{results.successful_requests}/{results.total_requests}({results.success_rate:.2%})",
        )
    )
    lines.append(fmt_line("Concurrency:", results.concurrency))
    lines.append(fmt_line("Benchmark duration (s):", fmt_float(results.total_time)))
    lines.append(
        fmt_line("Request throughput (req/s):", fmt_float(results.requests_per_second))
    )
    lines.append(fmt_line("Total input tokens:", results.total_prompt_tokens))
    if not embeddings:
        lines.append(fmt_line("Total output tokens:", results.total_completion_tokens))

    output_tok_per_sec = (
        results.total_completion_tokens / results.total_time
        if results.total_time > 0
        else 0
    )
    total_tok_per_sec = (
        results.total_tokens / results.total_time if results.total_time > 0 else 0
    )
    if not embeddings:
        lines.append(
            fmt_line("Output token throughput (tok/s):", fmt_float(output_tok_per_sec))
        )
    lines.append(
        fmt_line("Total token throughput (tok/s):", fmt_float(total_tok_per_sec))
    )
    lines.append("------------------- Request Latency -------------------")
    lines.append(fmt_line("Average latency (s):", fmt_float(results.latency.average)))
    lines.append(fmt_line("P50 latency (s):", fmt_float(results.latency.p50)))
    lines.append(fmt_line("P95 latency (s):", fmt_float(results.latency.p95)))
    lines.append(fmt_line("P99 latency (s):", fmt_float(results.latency.p99)))
    if not embeddings:
        lines.append("--------------- Output Token Per Second ---------------")
        lines.append(
            fmt_line(
                "Average TPS (tok/s):",
                fmt_float(results.completion_tokens_per_second.average),
            )
        )
        lines.append(
            fmt_line(
                "P50 TPS (tok/s):", fmt_float(results.completion_tokens_per_second.p50)
            )
        )
        lines.append(
            fmt_line(
                "P95 TPS (tok/s):", fmt_float(results.completion_tokens_per_second.p95)
            )
        )
        lines.append(
            fmt_line(
                "P99 TPS (tok/s):", fmt_float(results.completion_tokens_per_second.p99)
            )
        )

        # Embeddings have no TTFT, so the percentiles would be None; skip the section.
        lines.append("----------------- Time to First Token -----------------")
        lines.append(
            fmt_line(
                "Average TTFT (ms):", fmt_float(results.time_to_first_token.average)
            )
        )
        lines.append(
            fmt_line("P50 TTFT (ms):", fmt_float(results.time_to_first_token.p50))
        )
        lines.append(
            fmt_line("P95 TTFT (ms):", fmt_float(results.time_to_first_token.p95))
        )
        lines.append(
            fmt_line("P99 TTFT (ms):", fmt_float(results.time_to_first_token.p99))
        )
    lines.append("=" * 55)

    output = "\n".join(lines)

    if file:
        with open(file, "w") as f:
            f.write(output + "\n")
        logging.info(f"Pretty benchmark results saved to {file}")
    else:
        print(output)


def output_benchmark_results_json(
    results: BenchmarkResults, result_file=None, embeddings: bool = False
):
    # Round all floats in results to two decimal places for output
    def _round_floats(obj, ndigits=2):
        if is_dataclass(obj):
            obj = asdict(obj)
        if isinstance(obj, dict):
            return {k: _round_floats(v, ndigits) for k, v in obj.items()}
        if isinstance(obj, list):
            return [_round_floats(v, ndigits) for v in obj]
        if isinstance(obj, float):
            return round(obj, ndigits)
        return obj

    formatted_results = _round_floats(results, 2)
    if result_file:
        with open(result_file, "w") as f:
            json.dump(formatted_results, f, indent=2)
        logging.info(f"Results saved to {result_file}")
    else:
        print(json.dumps(formatted_results, indent=2))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Benchmark Chat Completions API")
    parser.add_argument(
        "-m", "--model", type=str, required=True, help="Name of the model"
    )
    parser.add_argument(
        "-n",
        "--num-requests",
        type=int,
        default=100,
        help="Number of requests to make (default: 100)",
    )
    parser.add_argument(
        "-c",
        "--concurrency",
        type=int,
        default=10,
        help="Number of concurrent requests (default: 10)",
    )
    parser.add_argument(
        "--request-timeout",
        type=int,
        default=300,
        help="Timeout for each request in seconds (default: 300)",
    )
    parser.add_argument(
        "--max-completion-tokens",
        type=int,
        default=1024,
        help="Maximum number of tokens in the completion (default: 1024)",
    )
    parser.add_argument(
        "--prompt-multiplier",
        type=int,
        default=1,
        help="Repeat the randomly selected prompt N times to create longer inputs",
    )
    parser.add_argument(
        "--ignore-eos",
        action="store_true",
        help="Set ignore_eos flag when sending the benchmark request. This will not stop the stream when the model generates an EOS token.",
    )
    parser.add_argument(
        "--server-url",
        type=str,
        default="http://127.0.0.1",
        help="URL of the GPUStack server",
    )
    parser.add_argument("--api-key", type=str, default="fake", help="GPUStack API key")
    parser.add_argument(
        "--result-file",
        type=str,
        help="Result file path to save benchmark json results",
    )
    parser.add_argument(
        "-H",
        "--header",
        action="append",
        dest="headers",
        help="Custom HTTP header in Key:Value format. May be specified multiple times.",
    )
    parser.add_argument(
        "--embeddings",
        action="store_true",
        help="Run embedding benchmark instead of chat completions",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="Output results in JSON format instead of pretty format",
    )
    args = parser.parse_args()

    try:
        results = asyncio.run(
            main(
                args.model,
                args.num_requests,
                args.concurrency,
                args.request_timeout,
                args.max_completion_tokens,
                args.ignore_eos,
                args.server_url,
                args.api_key,
                args.headers,
                args.embeddings,
                args.prompt_multiplier,
            )
        )
        if args.json:
            output_benchmark_results_json(
                results, args.result_file, embeddings=args.embeddings
            )
        else:
            output_benchmark_results_pretty(
                results, args.result_file, embeddings=args.embeddings
            )
    except Exception as e:
        logging.error(f"Benchmarking failed: {str(e)}")
        exit(1)
|
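Because `-H/--header` uses `action="append"`, repeated flags accumulate into a list of raw `Key:Value` strings that the script must split itself. A minimal sketch of that parsing step; `parse_headers` is a hypothetical helper for illustration, not part of the source script:

```python
import argparse


def parse_headers(pairs):
    # Hypothetical helper: turn ["Key: Value", ...] into a dict.
    headers = {}
    for item in pairs or []:  # argparse leaves dest as None if -H is never given
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"Invalid header (expected Key:Value): {item!r}")
        headers[key.strip()] = value.strip()
    return headers


parser = argparse.ArgumentParser()
parser.add_argument("-H", "--header", action="append", dest="headers")
# Each -H occurrence appends another raw string to args.headers.
args = parser.parse_args(["-H", "Authorization: Bearer token", "-H", "X-Trace: 1"])
headers = parse_headers(args.headers)
```

`partition(":")` splits only on the first colon, so values containing colons (e.g. URLs) survive intact.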
@@ -0,0 +1,26 @@
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.9.0
attrs==25.3.0
certifi==2025.6.15
distro==1.9.0
frozenlist==1.7.0
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
httpx-aiohttp==0.1.6
idna==3.10
jiter==0.10.0
multidict==6.5.1
numpy==2.3.1
openai==1.92.2
propcache==0.3.2
pydantic==2.11.7
pydantic_core==2.33.2
sniffio==1.3.1
tqdm==4.67.1
typing-inspection==0.4.1
typing_extensions==4.14.0
yarl==1.20.1
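The pinned list above is a standard pip requirements file; a minimal setup sketch, assuming it is saved as `requirements.txt` next to the benchmark script:

```shell
# Create an isolated environment and install the exact pinned versions.
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```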
@@ -0,0 +1,21 @@
import shutil
import tempfile

import pytest

from gpustack.config.config import Config, set_global_config


@pytest.fixture(scope="module", autouse=True)
def temp_dir():
    tmp_dir = tempfile.mkdtemp()
    print(f"Created temporary directory: {tmp_dir}")
    yield tmp_dir
    shutil.rmtree(tmp_dir)


@pytest.fixture(scope="module", autouse=True)
def config(temp_dir):
    cfg = Config(
        token="test", jwt_secret_key="test", data_dir=temp_dir, enable_ray=True
    )
    set_global_config(cfg)
    return cfg
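The module-scoped `temp_dir` fixture above creates the directory once per test module; everything after `yield` runs as teardown after the module's last test. The same setup/teardown lifecycle can be sketched without pytest as a plain context manager:

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temp_dir():
    # Setup: create the directory (runs once, like the fixture body before yield).
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        # Teardown: remove it when the block exits (like the code after yield).
        shutil.rmtree(path)


with temp_dir() as d:
    exists_inside = os.path.isdir(d)  # True while "tests" run
exists_after = os.path.isdir(d)  # False once teardown has run
```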
(Binary image changes: numerous documentation screenshots were added or updated in this comparison; the diff records only before/after file sizes, not filenames.)