Pronunciation correction is crucial for Text-to-Speech (TTS) systems in production. Traditional methods, which rely on phoneme sequence manipulation, are often cumbersome and error-prone. To address this, we propose Prompt-to-Correct, an editing-based methodology for pronunciation correction in TTS systems using voice prompts. Our approach enables accurate, granular corrections at test time without additional training or fine-tuning. Unlike existing speech editing methods, it needs no external alignment to determine edit boundaries. Given a correctly pronounced reading of a word in any voice or accent, our system corrects the mispronunciation while maintaining continuity with the surrounding speech. Experimental results demonstrate that our method outperforms traditional baselines and state-of-the-art speech editing techniques.
reference | FastSpeech2_std | FastSpeech2_alt | ParrotTTS_std | ParrotTTS_alt | FluentSpeech_std | FluentSpeech_alt | Prompt-to-Correct(Ours) | |
---|---|---|---|---|---|---|---|---|
wav |