GPT-SoVITS4

GPT-SoVITS4 is a leading multilingual speech generation tool that supports multiple languages and dialects, including Mandarin, English, and Japanese. It offers high-quality audio, fast responses, rich expressiveness, and easy deployment.

Main Interface

If you see the output below, it means the service has started successfully.

json
{ "code": 0, "message": "install success, only api call" }

Launch Methods

  • Launch via URL
http://127.0.0.1:8000/launcher?project=gpt_sovits
  • Launch via Command
bash
uv run cli.py install -n gpt_sovits -p 8015 --start
  • Native Launch (API Only)
bash
uv run api_v2_.py --bind_addr 0.0.0.0 --port 8015
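
The "Launch via URL" method can also be driven from the command line with curl; if the JSON shown under Main Interface comes back, the service started successfully. This is only a quick sketch, under the assumption that the launcher endpoint responds with that status JSON rather than an HTML page.

bash
# Request the launcher endpoint for the gpt_sovits project.
# A successful start should return something like:
#   { "code": 0, "message": "install success, only api call" }
curl "http://127.0.0.1:8000/launcher?project=gpt_sovits"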

Output Logs

log
Python 3.11.11
Using extensions/gpt_sovits/.venv/scripts/python.exe
Checked 179 packages in 6ms
All installed packages are compatible
Audited 42 packages in 14ms
...
extensions/gpt_sovits/.venv/scripts/python.exe api_v2_.py --bind_addr 0.0.0.0 --port 8015
2025-12-24 00:09:50.152 | INFO  cbinstaller.py - ✅ GPT-SoVITS started (PID: 11300)
2025-12-24 00:09:50.153 | WARNING cbinstaller.py - No UI, API calls only
...
--------------------------------------------- TTS Config ---------------------------------------------
device              : cpu
is_half             : False
version             : v2Pro
t2s_weights_path    : GPT_SoVITS/pretrained_models/s1v3.ckpt
vits_weights_path   : GPT_SoVITS/pretrained_models/v2Pro/s2Gv2Pro.pth
bert_base_path      : GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large
cnhuhbert_base_path : GPT_SoVITS/pretrained_models/chinese-hubert-base
----------------------------------------------------------------------------------------------------

Loading Text2Semantic weights from GPT_SoVITS/pretrained_models/s1v3.ckpt
Loading VITS weights from GPT_SoVITS/pretrained_models/v2Pro/s2Gv2Pro.pth (all keys matched successfully)
Loading BERT weights from GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large
Loading CNHuBERT weights from GPT_SoVITS/pretrained_models/chinese-hubert-base
INFO:     Started server process [12392]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8015 (Press CTRL+C to quit)

Configuration Parameters

webapp/extensions/gpt_sovits/data/config.json
json
{
  "request": {
    "top_k": 5,
    "top_p": 1.0,
    "temperature": 1.0,
    "text_split_method": "cut5",
    "batch_size": 1,
    "batch_threshold": 0.75,
    "split_bucket": true,
    "fragment_interval": 0.3,
    "media_type": "wav",
    "streaming_mode": false,
    "parallel_infer": true,
    "repetition_penalty": 1.35,
    "sample_steps": 32,
    "super_sampling": false,
    "overlap_length": 2,
    "min_chunk_length": 16
  },
  "models": {
    "test": {
      "gpt": "webapp/extensions/gpt_sovits/train/weights/GPT_weights_v2Pro/test-e4.ckpt",
      "sovits": "webapp/extensions/gpt_sovits/train/weights/SoVITS_weights_v2Pro/test_e4_s80.pth"
    }
  }
}
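
The values under request correspond to the synthesis parameters of the GPT-SoVITS v2 inference API, and models maps a model name to its GPT and SoVITS weight files. For illustration, the call below shows how these parameters could be sent directly to the running service. It is only a sketch: it assumes api_v2_.py exposes the same /tts endpoint and field names as upstream GPT-SoVITS api_v2.py, and the reference audio path, prompt text, and input text are placeholders.

bash
# Hypothetical direct call to the TTS endpoint on port 8015.
# Replace ref_audio_path / prompt_text / text with your own data.
curl -X POST "http://127.0.0.1:8015/tts" \
  -H "Content-Type: application/json" \
  -d '{
        "text": "Hello, this is a test sentence.",
        "text_lang": "en",
        "ref_audio_path": "path/to/reference.wav",
        "prompt_text": "Transcript of the reference audio.",
        "prompt_lang": "en",
        "top_k": 5,
        "top_p": 1.0,
        "temperature": 1.0,
        "text_split_method": "cut5",
        "batch_size": 1,
        "media_type": "wav",
        "streaming_mode": false
      }' \
  --output output.wav

In normal use, the dubbing module presumably builds such requests from config.json; a manual call like this is mainly useful for checking that the API responds.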

Model Training

See the "Model Training" section in the documentation.

Notes

  • request contains the inference request parameters, while models defines the custom model configuration

  • In the dubbing module, you can fill in local or remote addresses to enable batch speech synthesis, for example:

    • Local access: http://127.0.0.1:8015
    • Remote access: https://xxx.gradio.live, https://xxx.ngrok-free.app, or https://xxx.loca.lt
    • Your own public IP or domain