Skip to content

Digital Human Generation

Word count
1470 words
Reading time
10 minutes

The video and audio of the digital human are highly synchronized to achieve natural and smooth lip-syncing.

Preview

Offline Streaming
gradio_app_dh gradio_app_dh_streaming

Example

Offline Mode

Video, audio driver: Support cross-language support, lip-syncing, watermarking, super-resolution, and loop playback.

Original 1080P Chinese English Japanese Korean
Original 480P Chinese English Japanese Korean

Streaming Mode

Camera, microphone driver: Supports cross-language communication, lip-syncing, real-time interruption, real-time insertion, and real-time switching.

gradio_app_dh_streaming

Extension

AI Digital Human Assisted Live Streaming provides real-time interaction through digital human live streaming messages, offering live stream settings, real-time message replies, voice template cloning, and other functions to help streamers easily create high-quality live stream content.

Live Streaming Settings Real-time Messaging Message Reply Voice Templates Real-time integration new
gradio_app_dh_liveio1 gradio_app_dh_liveio2 gradio_app_dh_liveio3 gradio_app_dh_liveio4 gradio_app_dh_liveio5

📅 Planned Support

💡 Preview Modes new

The streaming mode supports four preview methods, which can be selected according to actual usage scenarios:

Mode Description Video Source Limitation
obs Push directly to OBS Virtual Camera, no display window Video only
cv2 Local window preview (Default) Camera / Video
ffplay Window preview using ffplay (FFmpeg required) Camera / Video

flow

⚡ Performance Reference new

The following are 30 frames of real-time digital human inference log snippets for reference only. Actual performance may vary due to factors such as device configuration.

log
[DH] -> Enter audio path or (!path|!m|b|c|q): Audio input: D:\Projects\creator\creator-box\webapp\upload\video_product.wav [Interrupt]
[DH] 2026-05-11 14:17:39.189 | INFO  1364  interface_streaming_v2.py:579 - Feeding audio #0: source=AudioSource.FILE, path=D:\Projects\creator\creator-box\webapp\upload\video_product.wav, interrupt=True
[DH] 2026-05-11 14:17:39.189 | INFO  1364  interface_streaming_v2.py:594 - Audio task #0 set as priority (interrupt mode), pending queue untouched
[DH] -> Enter audio path or (!path|!m|b|c|q): 2026-05-11 14:17:39.365 | INFO  1332  interface_streaming_v2.py:635 - Processing audio task #0 (interrupt=True)
[DH] 2026-05-11 14:17:39.365 | INFO  1332  interface_streaming_v2.py:672 - Audio #0: Loading audio file: D:\Projects\creator\creator-box\webapp\upload\video_product.wav
[DH] 2026-05-11 14:17:40.782 | INFO  1332  interface_streaming_v2.py:678 - Audio #0: Loaded: samples=489291, duration=30.581s, sr=16000
[DH] 2026-05-11 14:17:40.782 | INFO  1332  interface_streaming_v2.py:687 - Audio #0: Precomputing audio features (fps=30.0) ...
[DH] 2026-05-11 14:17:41.012 | INFO  1332  features.py:44 - Wenet features cached: key=wenet_feat:e31898fd38810..., shape=(784, 256)
[DH] 2026-05-11 14:17:41.014 | INFO  1332  features.py:62 - Audio features precomputation completed: total_frames=917, feat_shape=(20, 256), audio_duration=30.581s, fps=30.0
[DH] 2026-05-11 14:17:41.014 | INFO  1332  interface_streaming_v2.py:696 - Audio #0: Audio feature precomputation completed: 917 frames
[DH] 2026-05-11 14:17:41.015 | INFO  1332  interface_streaming_v2.py:713 - Audio #0: Sync thread started
[DH] 2026-05-11 14:17:41.015 | INFO  1660  interface_streaming_v2.py:837 - Audio #0: sync_worker_file started: fps=30.0, total_features=917
[DH] 2026-05-11 14:17:41.018 | INFO  1332  interface_streaming_v2.py:722 - Audio #0: Audio playback thread started
[DH] 2026-05-11 14:17:42.029 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=29.74ms, av_gap=-75.33ms, display=0.15ms, skipped=0
[DH] 2026-05-11 14:17:43.053 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.69ms, av_gap=12.22ms, display=0.25ms, skipped=1
[DH] 2026-05-11 14:17:44.080 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.76ms, av_gap=7.11ms, display=0.03ms, skipped=1
[DH] 2026-05-11 14:17:45.101 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.94ms, av_gap=8.67ms, display=0.08ms, skipped=0
[DH] 2026-05-11 14:17:46.132 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.81ms, av_gap=12.22ms, display=0.13ms, skipped=1
[DH] 2026-05-11 14:17:47.156 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.02ms, av_gap=11.33ms, display=0.10ms, skipped=1
[DH] 2026-05-11 14:17:48.177 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.23ms, av_gap=7.33ms, display=0.03ms, skipped=1
[DH] 2026-05-11 14:17:49.197 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.00ms, av_gap=8.67ms, display=0.02ms, skipped=0
[DH] 2026-05-11 14:17:50.229 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.90ms, av_gap=10.22ms, display=0.28ms, skipped=1
[DH] 2026-05-11 14:17:51.251 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.60ms, av_gap=11.56ms, display=0.02ms, skipped=1
[DH] 2026-05-11 14:17:52.283 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.82ms, av_gap=9.11ms, display=0.57ms, skipped=1
[DH] 2026-05-11 14:17:53.306 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.36ms, av_gap=7.33ms, display=0.49ms, skipped=0
[DH] 2026-05-11 14:17:54.327 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.02ms, av_gap=10.89ms, display=0.52ms, skipped=1
[DH] 2026-05-11 14:17:55.353 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.79ms, av_gap=9.33ms, display=0.05ms, skipped=1
[DH] 2026-05-11 14:17:56.384 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.94ms, av_gap=9.33ms, display=0.49ms, skipped=1
[DH] 2026-05-11 14:17:57.409 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=22.16ms, av_gap=4.44ms, display=0.49ms, skipped=1
[DH] 2026-05-11 14:17:58.430 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.05ms, av_gap=14.00ms, display=0.62ms, skipped=0
[DH] 2026-05-11 14:17:59.464 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.89ms, av_gap=11.33ms, display=0.56ms, skipped=1
[DH] 2026-05-11 14:18:00.492 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.17ms, av_gap=6.89ms, display=0.03ms, skipped=1
[DH] 2026-05-11 14:18:01.517 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.91ms, av_gap=9.78ms, display=0.57ms, skipped=1
[DH] 2026-05-11 14:18:02.544 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.99ms, av_gap=7.11ms, display=0.53ms, skipped=1
[DH] 2026-05-11 14:18:03.576 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.38ms, av_gap=11.33ms, display=0.64ms, skipped=1
[DH] 2026-05-11 14:18:04.607 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.24ms, av_gap=7.78ms, display=0.72ms, skipped=1
[DH] 2026-05-11 14:18:05.634 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.55ms, av_gap=12.67ms, display=0.62ms, skipped=0
[DH] 2026-05-11 14:18:06.666 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.82ms, av_gap=10.00ms, display=0.57ms, skipped=1
[DH] 2026-05-11 14:18:07.695 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=19.97ms, av_gap=12.22ms, display=0.07ms, skipped=1
[DH] 2026-05-11 14:18:08.720 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.54ms, av_gap=10.00ms, display=0.45ms, skipped=1
[DH] 2026-05-11 14:18:09.752 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=21.15ms, av_gap=9.78ms, display=0.57ms, skipped=1
[DH] 2026-05-11 14:18:10.775 | INFO  2164  interface_streaming_v2.py:414 - [STATS] phase=steady, n=30, infer=20.40ms, av_gap=7.78ms, display=0.62ms, skipped=1
[DH] 2026-05-11 14:18:11.899 | INFO  7332  interface_streaming_v2.py:814 - Audio #0: Playback finished: played_sec=30.581, total_sec=30.581, interrupted=False
[DH] 2026-05-11 14:18:11.899 | INFO  1332  interface_streaming_v2.py:728 - Audio #0: Playback finished
[DH] [OK] Audio #0 playback completed
[DH] [COMPLETE] id=0 source=AudioSource.FILE path=D:\Projects\creator\creator-box\webapp\upload\video_product.wav interrupt=True

Metrics

  • phase indicates the current phase (startup phase or stable phase)
  • n indicates the number of frames processed per second (fixed at 30 frames)
  • infer indicates the inference time per frame (the smaller the better)
  • av_gap indicates the audio-video synchronization gap; positive values indicate the video frame time leads the audio (video "ahead"), negative values indicate the video lags the audio (video "behind")
  • skipped indicates the number of frames skipped (if inference time is too long to output on time, some frames will be skipped to maintain synchronization)

🎥 Video tutorial

Note

  • liveio requires preprocessing of audio and video for the director and does not currently support real-time text generation. However, the SocketIO API can be used for secondary development and extension.
  • When using real-time mode, it is recommended to disable unused components to reduce inference time and improve performance.
  • To use the Microsoft Edge browser, you need to enable the edge://settings/content/mediaAutoplay setting.
  • Do not use this feature to process illegal or non-compliant content, and it is prohibited to infringe on others' privacy, copyright and other legitimate rights and interests.
  • The content above is for demonstration purposes only. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.