Bug Description
Using AudioFormatTag.OGG_OPUS (or any other compressed format: WEBM_OPUS, MP3, FLAC, etc.) with AudioInputStream.createPushStream() causes an immediate crash with AzureClientCancelled, reason=Error, code=RuntimeError before any audio is sent.
Root Cause
In the AudioStreamFormatImpl constructor (src/sdk/Audio/AudioStreamFormat.ts), the switch statement treats only PCM, ALaw, and MuLaw as WAV formats. Every other format, including OGG_OPUS, falls into default: isWavFormat = false.
The privHeader field is only assigned inside if (isWavFormat), so for OGG_OPUS it is never set and remains undefined at runtime despite the TypeScript type declaration protected privHeader: ArrayBuffer (non-optional) promising otherwise.
switch (format) {
    case AudioFormatTag.PCM: this.formatTag = 1; break;
    case AudioFormatTag.ALaw: this.formatTag = 6; break;
    case AudioFormatTag.MuLaw: this.formatTag = 7; break;
    default:
        isWavFormat = false; // ← OGG_OPUS, WEBM_OPUS, MP3, FLAC, etc. all land here
}
// ...
if (isWavFormat) {
    this.privHeader = new ArrayBuffer(44); // ← never reached for OGG_OPUS
}
// privHeader === undefined for all compressed formats
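The same effect can be reproduced without the SDK. The following is a simplified stand-in, not the SDK source: `Tag` and `FormatModel` are hypothetical names, and the definite-assignment `!` is an assumption made only so the snippet compiles under strict compiler settings.

```typescript
// Simplified stand-in for the constructor logic described above.
// `Tag` and `FormatModel` are hypothetical names, not SDK types.
type Tag = "PCM" | "ALaw" | "MuLaw" | "OGG_OPUS" | "MP3";

class FormatModel {
    // The `!` satisfies the compiler but assigns nothing at runtime.
    protected privHeader!: ArrayBuffer;

    constructor(tag: Tag) {
        let isWavFormat = true;
        switch (tag) {
            case "PCM":
            case "ALaw":
            case "MuLaw":
                break;
            default:
                isWavFormat = false; // every compressed format lands here
        }
        if (isWavFormat) {
            this.privHeader = new ArrayBuffer(44);
        }
    }

    get header(): ArrayBuffer {
        return this.privHeader;
    }
}

const pcmHeader = new FormatModel("PCM").header;
const opusHeader = new FormatModel("OGG_OPUS").header;

console.log(pcmHeader.byteLength);     // 44
console.log(opusHeader === undefined); // true
```

The getter's declared return type is `ArrayBuffer`, so callers get no compile-time hint that the value can be missing.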
When startContinuousRecognitionAsync() is called, ServiceRecognizerBase.sendWaveHeader() reads format.header (which returns privHeader, here undefined) and tries to build a SpeechConnectionMessage with it. Internally that message reads payload.byteLength, which throws:
TypeError: Cannot read properties of undefined (reading 'byteLength')
→ promise rejects
→ SDK fires canceled event: AzureClientCancelled, reason=Error, code=RuntimeError
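The chain above can be sketched in isolation. `MessageModel` and `sendHeaderModel` are hypothetical stand-ins for the message constructor and the async send step; they only illustrate how a property read on `undefined` turns into a rejected promise, and are not the SDK's actual code.

```typescript
// Hypothetical stand-in for a message type whose constructor reads
// payload.byteLength, as described above.
class MessageModel {
    readonly size: number;

    constructor(payload: ArrayBuffer) {
        this.size = payload.byteLength; // throws if payload is undefined
    }
}

// Models an async send step: a throw inside becomes a rejected promise.
async function sendHeaderModel(header: ArrayBuffer): Promise<number> {
    return new MessageModel(header).size;
}

// What format.header yields for a compressed format, typed as the SDK types it.
const missingHeader = undefined as unknown as ArrayBuffer;

const outcome: Promise<string> = sendHeaderModel(missingHeader).then(
    (size) => `sent ${size} bytes`,
    (err: unknown) => (err instanceof TypeError ? "TypeError" : "other error"),
);

outcome.then((label) => console.log(label)); // TypeError
```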
Reproduction
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
import { AudioFormatTag, AudioStreamFormatImpl } from 'microsoft-cognitiveservices-speech-sdk/distrib/lib/src/sdk/Audio/AudioStreamFormat';
const format = new AudioStreamFormatImpl(16000, 16, 1, AudioFormatTag.OGG_OPUS);
console.log(format.header); // undefined — should be ArrayBuffer
const pushStream = sdk.AudioInputStream.createPushStream(format);
const config = sdk.SpeechConfig.fromSubscription('KEY', 'REGION');
const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const recognizer = new sdk.SpeechRecognizer(config, audioConfig);
recognizer.canceled = (r, e) => console.log('CANCELLED:', e.errorDetails);
recognizer.startContinuousRecognitionAsync();
// → canceled fires with "RuntimeError" before any audio is sent
Expected Behaviour
AudioFormatTag.OGG_OPUS is defined in the enum and the constructor accepts it as a parameter. It should not crash. For compressed formats where no WAV header is needed, privHeader should be set to an empty ArrayBuffer(0) so the SDK can safely send a zero-byte header message (which Azure ignores) and then stream the compressed audio data directly.
Proposed Fix
In the constructor, change:
if (isWavFormat) {
    this.privHeader = new ArrayBuffer(44);
    // ... build WAV header
}
To:
if (isWavFormat) {
    this.privHeader = new ArrayBuffer(44);
    // ... build WAV header
} else {
    // Compressed formats (OGG_OPUS, WEBM_OPUS, MP3, FLAC, etc.) don't send a WAV header.
    // Set to empty ArrayBuffer so sendWaveHeader() doesn't crash on payload.byteLength.
    this.privHeader = new ArrayBuffer(0);
}
The change adds a single assignment. Alternatively, the privHeader field declaration could be changed to protected privHeader: ArrayBuffer = new ArrayBuffer(0) so that it has a safe default.
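A regression check for the fix could take roughly this shape. It is a sketch against a self-contained model of the fixed constructor: `FormatTag`, `WAV_TAGS`, and `FixedFormatModel` are hypothetical names, and a real test would construct `AudioStreamFormatImpl` with each `AudioFormatTag` value instead.

```typescript
// Hypothetical model of the fixed constructor; not the SDK source.
type FormatTag = "PCM" | "ALaw" | "MuLaw" | "OGG_OPUS" | "WEBM_OPUS" | "MP3" | "FLAC";

const WAV_TAGS: ReadonlySet<FormatTag> = new Set<FormatTag>(["PCM", "ALaw", "MuLaw"]);

class FixedFormatModel {
    protected privHeader: ArrayBuffer;

    constructor(tag: FormatTag) {
        if (WAV_TAGS.has(tag)) {
            this.privHeader = new ArrayBuffer(44); // WAV header would be built here
        } else {
            this.privHeader = new ArrayBuffer(0); // compressed: empty header
        }
    }

    get header(): ArrayBuffer {
        return this.privHeader;
    }
}

const ALL_TAGS: FormatTag[] = ["PCM", "ALaw", "MuLaw", "OGG_OPUS", "WEBM_OPUS", "MP3", "FLAC"];

// Every format must expose a defined header with a readable byteLength.
const headerSizes = ALL_TAGS.map((tag) => new FixedFormatModel(tag).header.byteLength);

console.log(headerSizes.join(",")); // 44,44,44,0,0,0,0
```

Because both branches now assign the field, the non-optional type declaration holds at runtime, and the compiler's definite-assignment analysis can verify it without a `!` assertion.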
Verified Workaround
We have verified that setting privHeader to an empty ArrayBuffer(0) after construction fixes the crash and Azure successfully receives and decodes OGG/Opus audio:
const format = new AudioStreamFormatImpl(48000, 16, 2, AudioFormatTag.OGG_OPUS);
(format as any).privHeader = new ArrayBuffer(0); // workaround for SDK bug
const pushStream = sdk.AudioInputStream.createPushStream(format);
// Works — Azure GStreamer auto-detects OGG container from stream data
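To keep the workaround from overwriting a valid header once the SDK is fixed, the patch can be guarded. The helper below is a sketch: `HeaderCarrier` and `ensureHeader` are hypothetical names, and plain objects stand in for SDK format instances so the snippet runs on its own.

```typescript
// Structural view of the one field the workaround touches.
// `HeaderCarrier` and `ensureHeader` are hypothetical names, not SDK APIs.
interface HeaderCarrier {
    privHeader?: ArrayBuffer;
}

// Assigns an empty header only when none is present, so WAV formats
// (and future SDK versions that set the field themselves) are untouched.
function ensureHeader<T extends object>(format: T): T {
    const carrier = format as unknown as HeaderCarrier;
    if (carrier.privHeader === undefined) {
        carrier.privHeader = new ArrayBuffer(0);
    }
    return format;
}

// Plain objects standing in for a compressed and a WAV format instance.
const compressedFormat: HeaderCarrier = {};
const wavFormat: HeaderCarrier = { privHeader: new ArrayBuffer(44) };

ensureHeader(compressedFormat);
ensureHeader(wavFormat);

console.log(compressedFormat.privHeader?.byteLength); // 0
console.log(wavFormat.privHeader?.byteLength);        // 44
```

With the SDK, the call would wrap the constructor, for example ensureHeader(new AudioStreamFormatImpl(48000, 16, 2, AudioFormatTag.OGG_OPUS)). It still depends on the internal field name privHeader, which is an implementation detail and may change.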
Environment
- SDK version: 1.48.0 (also present in latest master)
- Runtime: Node.js
- Platform: macOS / Linux