# Using Audio Models

GPUStack supports running both Speech-to-Text and Text-to-Speech models. Speech-to-Text models convert audio input in various languages into written text, while Text-to-Speech models turn written text into natural, expressive speech.

In this guide, we walk you through deploying and using Speech-to-Text and Text-to-Speech models in GPUStack.

## Prerequisites

Before you begin, ensure that you have the following:

- A Linux system with AMD64 architecture, or macOS.
- Access to Hugging Face for downloading the model files.
- GPUStack is installed and running. If not, refer to the [Quickstart Guide](../quickstart.md).

## Running Speech-to-Text Model

### Step 1: Deploy Speech-to-Text Model

Follow these steps to deploy the model from Hugging Face:

1. Navigate to the `Deployments` page in the GPUStack UI.
2. Click the `Deploy Model` button.
3. In the dropdown, select `Hugging Face` as the source for your model.
4. Use the search bar in the top left to search for the model name `Systran/faster-whisper-medium`.
5. Leave all settings at their defaults and click the `Save` button to deploy the model.

After deployment, you can monitor the model deployment's status on the `Deployments` page.
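
If you prefer to check from a script, the following Python sketch looks for the model in the model list that GPUStack exposes through its OpenAI-compatible API. It assumes that this API is served under `/v1-openai` on your GPUStack server and that requests are authenticated with an API key created in the UI. The server URL, the API key, and the model name `faster-whisper-medium` are hypothetical placeholders; use the name shown on the `Deployments` page. Treat this as a rough check; the `Deployments` page remains the authoritative view of the deployment status.

```python
import json
import urllib.request

# Assumptions (not confirmed by this guide): GPUStack serves an
# OpenAI-compatible API under /v1-openai, and requests are authenticated
# with a bearer API key created in the GPUStack UI.


def build_list_models_request(server_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible model list."""
    return urllib.request.Request(
        server_url.rstrip("/") + "/v1-openai/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(response_body: bytes) -> list[str]:
    """Extract model IDs from an OpenAI-style {"data": [{"id": ...}]} payload."""
    payload = json.loads(response_body)
    return [entry["id"] for entry in payload.get("data", [])]


def is_model_listed(server_url: str, api_key: str, model: str) -> bool:
    """Return True if `model` appears in the server's model list."""
    request = build_list_models_request(server_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return model in model_ids(response.read())


# Usage (hypothetical server URL, API key, and model name):
# is_model_listed("http://your-gpustack-server", "YOUR_API_KEY", "faster-whisper-medium")
```

The same check applies to the Text-to-Speech deployment later in this guide; only the model name changes.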

### Step 2: Interact with Speech-to-Text Model

1. Navigate to the `Playground` > `Audio` page in the GPUStack UI.
2. Select the `Speech to Text` tab.
3. Select the deployed model from the dropdown in the top-right corner.
4. Click the `Upload` button to upload an audio file, or click the `Microphone` button to record audio.
5. Click the `Generate Text Content` button to generate the transcription.
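
The same transcription can also be requested over HTTP instead of through the Playground. This is a minimal sketch using only the Python standard library. It assumes that GPUStack exposes the OpenAI-compatible endpoint `/v1-openai/audio/transcriptions`, that this endpoint accepts a multipart upload with `model` and `file` fields, and that it replies with JSON containing a `text` field. The server URL, API key, audio file, and model name are hypothetical placeholders.

```python
import json
import os
import uuid
import urllib.request

# Assumptions (not confirmed by this guide): GPUStack exposes the
# OpenAI-compatible endpoint /v1-openai/audio/transcriptions, accepts a
# bearer API key, and returns JSON of the form {"text": "..."}.


def build_transcription_request(
    server_url: str,
    api_key: str,
    model: str,
    audio: bytes,
    filename: str,
    content_type: str = "audio/wav",  # assumed default; set it to match your file
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    tail = f"\r\n--{boundary}--\r\n"
    return urllib.request.Request(
        server_url.rstrip("/") + "/v1-openai/audio/transcriptions",
        data=head.encode("utf-8") + audio + tail.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(server_url: str, api_key: str, model: str, path: str) -> str:
    """Upload a local audio file and return the recognized text."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_transcription_request(
        server_url, api_key, model, audio, os.path.basename(path)
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["text"]


# Usage (hypothetical values):
# print(transcribe("http://your-gpustack-server", "YOUR_API_KEY",
#                  "faster-whisper-medium", "sample.wav"))
```

Building the request separately from sending it keeps the multipart encoding easy to inspect. A third-party HTTP library such as `requests` can generate the multipart body for you instead.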

## Running Text-to-Speech Model

### Step 1: Deploy Text-to-Speech Model

Follow these steps to deploy the model from Hugging Face:

1. Navigate to the `Deployments` page in the GPUStack UI.
2. Click the `Deploy Model` button.
3. In the dropdown, select `Hugging Face` as the source for your model.
4. Use the search bar in the top left to search for the model name `gpustack/CosyVoice-300M`.
5. Leave all settings at their defaults and click the `Save` button to deploy the model.

After deployment, you can monitor the model deployment's status on the `Deployments` page.

### Step 2: Interact with Text-to-Speech Model

1. Navigate to the `Playground` > `Audio` page in the GPUStack UI.
2. Select the `Text to Speech` tab.
3. Select the deployed model from the dropdown in the top-right corner, then configure the voice and output audio format.
4. Enter the text to convert into speech.
5. Click the `Submit` button to generate the speech.
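
The Playground settings above (model, voice, output format, and input text) map naturally onto an HTTP request. This minimal standard-library sketch assumes that GPUStack exposes the OpenAI-compatible endpoint `/v1-openai/audio/speech`, that it accepts OpenAI-style JSON fields (`model`, `input`, `voice`, `response_format`), and that it returns the audio bytes in the response body. The server URL, API key, model name, and voice are hypothetical placeholders; pick a voice and format offered in the Playground dropdowns.

```python
import json
import urllib.request

# Assumptions (not confirmed by this guide): GPUStack exposes the
# OpenAI-compatible endpoint /v1-openai/audio/speech, accepts OpenAI-style
# request fields, authenticates with a bearer API key, and returns raw audio.


def build_speech_request(
    server_url: str,
    api_key: str,
    model: str,
    text: str,
    voice: str,
    response_format: str = "mp3",
) -> urllib.request.Request:
    """Build a JSON POST asking the server to synthesize `text`."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        server_url.rstrip("/") + "/v1-openai/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(
    server_url: str,
    api_key: str,
    model: str,
    text: str,
    voice: str,
    path: str,
    response_format: str = "mp3",
) -> None:
    """Synthesize speech and write the returned audio bytes to `path`."""
    request = build_speech_request(server_url, api_key, model, text, voice, response_format)
    with urllib.request.urlopen(request, timeout=300) as response, open(path, "wb") as out:
        out.write(response.read())


# Usage (hypothetical values; use a voice listed in the Playground):
# synthesize_to_file("http://your-gpustack-server", "YOUR_API_KEY",
#                    "CosyVoice-300M", "Hello from GPUStack.",
#                    "YOUR_VOICE", "hello.mp3")
```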
