Prompt-to-Correct: Automated Test-Time Pronunciation Correction with Voice Prompts

Abstract:

Pronunciation correction is crucial for Text-to-Speech (TTS) systems in production. Traditional methods, which rely on phoneme sequence manipulation, are often cumbersome and error-prone. To address this, we propose Prompt-to-Correct, an editing-based methodology for pronunciation correction in TTS systems using voice prompts. Our approach enables accurate, granular corrections at test time without the need for additional training or fine-tuning. Unlike existing speech editing methods, we eliminate the need for external alignment to determine edit boundaries. By simply providing a correctly pronounced reading of a word in any voice or accent, our system successfully corrects mispronunciations while maintaining continuity. Experimental results demonstrate that our method outperforms traditional baselines and state-of-the-art speech editing techniques.

Audio Samples

She had a certain je ne sais quoi to her personality

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

I am waiting for the latest Comme des Garçons collection

	reference	FastSpeech2_std	ParrotTTS_std	FastSpeech2_alt	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

Nietzsche explored existential themes in his writings

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

Schadenfreude is a common human emotion

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

Erythropoiesis is the process of red blood cell production in the bone marrow

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

Alkaptonuria is a rare genetic disorder affecting metabolism

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

His name is Raghu, Raghu is my friend

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

Kanyakumari is my birthplace

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

He lived near Ayodhya

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav

My name is Shankar

	reference	FastSpeech2_std	FastSpeech2_alt	ParrotTTS_std	ParrotTTS_alt	FluentSpeech_std	FluentSpeech_alt	Prompt-to-Correct(Ours)
wav