SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Paper: arXiv:2110.07205
The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre-nets and post-nets, allowing it to address a wide range of spoken language processing tasks, including automatic speech recognition, text-to-speech, and voice conversion.
- Text-to-speech version of SpeechT5
- Voice-conversion version of SpeechT5
- Automatic-speech-recognition version of SpeechT5
- Vocoder: SpeechT5 produces a spectrogram; this model converts it to a waveform