Prompt-to-Correct: Automated Test-Time Pronunciation Correction with Voice Prompts

Abstract:

Pronunciation correction is crucial for Text-to-Speech (TTS) systems in production. Traditional methods, which rely on manipulating phoneme sequences, are often cumbersome and error-prone. To address this, we propose Prompt-to-Correct, an editing-based method for correcting pronunciation in TTS systems using voice prompts. Our approach enables accurate, granular corrections at test time without any additional training or fine-tuning. Unlike existing speech editing methods, it does not require external alignment to determine edit boundaries. Given only a correctly pronounced reading of a word, in any voice or accent, our system corrects the mispronunciation while preserving continuity with the surrounding speech. Experimental results show that our method outperforms traditional baselines and state-of-the-art speech editing techniques.
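To make "phoneme sequence manipulation" concrete, here is a minimal sketch of the kind of per-word lexicon override that traditional correction relies on. It is an illustrative assumption, not part of Prompt-to-Correct: the helper names `apply_overrides` and `naive_g2p` are hypothetical, `naive_g2p` is a toy stand-in for a real grapheme-to-phoneme model, and the ARPAbet spelling given for "Raghu" is only an approximation.

```python
# Hypothetical illustration of a phoneme-override correction step.
# This is NOT the Prompt-to-Correct method; it sketches the traditional approach.
from typing import Callable, Dict, List

Phonemes = List[str]


def apply_overrides(
    words: List[str],
    g2p: Callable[[str], Phonemes],
    overrides: Dict[str, Phonemes],
) -> List[Phonemes]:
    """Return one phoneme list per word, preferring hand-written overrides."""
    result: List[Phonemes] = []
    for word in words:
        key = word.lower()
        if key in overrides:
            # Copy so callers cannot mutate the shared lexicon entry.
            result.append(list(overrides[key]))
        else:
            # Fall back to the automatic grapheme-to-phoneme model.
            result.append(g2p(word))
    return result


def naive_g2p(word: str) -> Phonemes:
    # Toy stand-in for a real G2P model: one uppercase symbol per letter.
    return [c.upper() for c in word if c.isalpha()]


# Hypothetical hand-written entry (approximate ARPAbet).
overrides = {"raghu": ["R", "AA1", "G", "UW0"]}
print(apply_overrides(["My", "friend", "Raghu"], naive_g2p, overrides))
```

Even in this toy form, every correction has to be written by hand in a fixed phoneme inventory; ARPAbet, for instance, has no symbol for the aspirated "gh" in "Raghu", so the entry can only approximate it. Prompt-to-Correct instead takes the desired pronunciation directly as a spoken reading of the word.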

Audio Samples

  • She had a certain je ne sais quoi to her personality
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • I am waiting for the latest Comme des Garçons collection
  • reference FastSpeech2_std ParrotTTS_std FastSpeech2_alt ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • Nietzsche explored existential themes in his writings
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • Schadenfreude is a common human emotion
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • Erythropoiesis is the process of red blood cell production in the bone marrow
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • Alkaptonuria is a rare genetic disorder affecting metabolism
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • His name is Raghu, Raghu is my friend
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • Kanyakumari is my birthplace
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • He lived near Ayodhya
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)
  • My name is Shankar
  • reference FastSpeech2_std FastSpeech2_alt ParrotTTS_std ParrotTTS_alt FluentSpeech_std FluentSpeech_alt Prompt-to-Correct(Ours)