What happened?
I am using the Microsoft Cognitive Services Speech SDK (JavaScript) for text-to-speech synthesis. With the voice nl-BE-DenaNeural, I have recently started observing context-dependent pronunciation errors: plosive consonants (/b/, /p/) are dropped in sentence-internal positions. The same words are pronounced correctly when spoken in isolation or in sentence-final position. This happens with plain text input (no SSML).

Minimal reproducible examples

Incorrect pronunciation (sentence-internal):
Vrijdag 6 februari. Dit is een afspraakje.

Observed output:
“februari” → “feruari” (missing /b/)
“afspraakje” → “afsraakje” (missing /p/)
Using the same inputs with nl-BE-ArnaudNeural produces correct pronunciation in all cases.
Environment:
Azure AI Speech – Text to Speech
Voice: nl-BE-DenaNeural
Locale: nl-BE
SDK: JavaScript Speech SDK
Input: plain text (no SSML)
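
For reference, here is a minimal sketch of how the comparison can be reproduced side by side. It assumes Node.js (so the audio can be written to WAV files and attached) and that the `microsoft-cognitiveservices-speech-sdk` module is passed in as `sdk`. The helper names (`buildReproCases`, `synthesizeCase`, `runRepro`), the output file names, and the two isolated-word inputs are illustrative assumptions, not part of the SDK or of the production code.

```javascript
// Voices compared in this report.
const VOICES = ["nl-BE-DenaNeural", "nl-BE-ArnaudNeural"];

// Plain-text inputs: the reported text, plus the two affected words on their
// own (assumed here to be what "in isolation" looks like).
const INPUTS = [
  "Vrijdag 6 februari. Dit is een afspraakje.",
  "februari",
  "afspraakje",
];

// Build one test case per (voice, input) pair. File names are hypothetical.
function buildReproCases(voices = VOICES, inputs = INPUTS) {
  const cases = [];
  for (const voice of voices) {
    inputs.forEach((text, i) => {
      cases.push({ voice, text, file: `${voice}_${i}.wav` });
    });
  }
  return cases;
}

// Synthesize one case to a WAV file using plain text (no SSML).
// `sdk` is the Speech SDK module; file output assumes Node.js.
function synthesizeCase(sdk, key, region, testCase) {
  return new Promise((resolve, reject) => {
    const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
    speechConfig.speechSynthesisVoiceName = testCase.voice;
    const audioConfig = sdk.AudioConfig.fromAudioFileOutput(testCase.file);
    const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

    synthesizer.speakTextAsync(
      testCase.text,
      (result) => {
        synthesizer.close();
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
          resolve(testCase.file);
        } else {
          reject(new Error(result.errorDetails || "synthesis did not complete"));
        }
      },
      (err) => {
        synthesizer.close();
        reject(new Error(String(err)));
      }
    );
  });
}

// Run all cases sequentially and return the list of written files.
async function runRepro(sdk, key, region) {
  const files = [];
  for (const testCase of buildReproCases()) {
    files.push(await synthesizeCase(sdk, key, region, testCase));
  }
  return files;
}
```

Calling `runRepro(require("microsoft-cognitiveservices-speech-sdk"), key, region)` with a valid key and region should produce six WAV files, three per voice, which can then be compared by ear.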
Questions:
Is this a known model regression or limitation in nl-BE-DenaNeural related to tokenization or prosodic context?
Is there any official acknowledgement or workaround other than switching voices?
Is this something that can only be fixed server-side by Microsoft?

I am explicitly not looking for SSML hacks (breaks, phoneme overrides), as this is a production accessibility system where such workarounds are not scalable.
Version
1.36.0 (Latest)
What browser/platform are you seeing the problem on?
Chrome
Relevant log output
No runtime errors or warnings are produced by the SDK.
Speech synthesis requests complete successfully.
The issue is purely auditory and reproducible in the synthesized audio output: plosive consonants (/b/, /p/) are dropped in sentence-internal context when using nl-BE-DenaNeural.
This is not accompanied by any client-side logs or exceptions and appears to be a server-side voice model issue. Audio samples can be provided if required.