Audio Protection & Error Recovery
Invox Medical's streaming transcription client implements enterprise-grade audio protection mechanisms to ensure zero data loss even during network failures, browser crashes, or service disruptions. This guide explains the technical architecture that makes audio transcription reliable and resilient.
Overview
The transcription client employs multiple layers of protection:
- In-Memory Send Queue with sequence-ordered retry
- Circuit Breaker Pattern to prevent cascading failures
- Intelligent Retry Logic with exponential backoff
- Dual Acknowledgment System for efficient confirmation
- Automatic Recovery from network interruptions
- Online/Offline Detection with automatic queue drain on reconnect
- Queue Management with flow control
- Graceful Session Teardown on page unload to prevent orphaned sessions
In-Memory Queue
Audio chunks are held in two RAM structures that together guarantee chunks are not silently dropped:
interface PendingChunk {
id: string; // Unique identifier
sessionId: string; // UUID of the current session
sequenceNumber: number; // Monotonically increasing — used for ordering and batch ACKs
bytesB64: string; // Base64-encoded PCM16 @ 16 kHz, mono
durationMs: number; // Duration of audio represented by this chunk
retryCount?: number; // Number of retry attempts so far (max: 3)
}
// Chunks waiting to be sent over the WebSocket
const sendQueue: PendingChunk[] = [];
// Chunks that have been sent but not yet acknowledged by the server
const inFlight = new Map<string, { chunk: PendingChunk; timestamp: number }>();
Why RAM is sufficient for typical disconnections:
- Most network outages last seconds, not minutes. The in-memory queue holds everything during that window.
- When connectivity returns,
sendQueueandinFlightare drained in sequence order before a new chunk is sent. - Audio continues to accumulate locally in the PCM accumulator even while the queue is draining.
end() or trigger a graceful teardown on beforeunload/visibilitychange (see Best Practices).Flow Control
To prevent unbounded memory growth and network congestion, you can track the total audio duration of in-flight chunks:
const MAX_IN_FLIGHT_DURATION_MS = 10_000; // 10 s of unacknowledged audio maximum
function canSendNow(chunkDurationMs: number): boolean {
return (
ws?.readyState === WebSocket.OPEN &&
isOnline &&
inFlightAudioDurationMs + chunkDurationMs <= MAX_IN_FLIGHT_DURATION_MS
);
}
Example: With 0.5 s chunks, up to 20 chunks (~10 s) can be in-flight simultaneously. The 21st chunk waits until an acknowledgment frees up budget.
Queue Processing
function processQueueIfReady(): void {
if (ws?.readyState !== WebSocket.OPEN || !isOnline || isPaused) return;
while (sendQueue.length > 0) {
const chunk = sendQueue[0];
if (!canSendNow(chunk.durationMs)) break; // Wait for ACKs
sendQueue.shift();
inFlight.set(chunk.id, { chunk, timestamp: Date.now() });
inFlightAudioDurationMs += chunk.durationMs;
ws.send(JSON.stringify({ message: chunk.bytesB64, messageType: 'AUDIO', sessionId: chunk.sessionId }));
}
}
Error Handling Strategies
Circuit Breaker Pattern
It´s recommended to implement a circuit breaker to prevent cascading failures during service outages.
Configuration
const circuitBreaker = new CircuitBreaker({
failureThreshold: 5, // Open after 5 consecutive failures
timeout: 60000, // Wait 60s before retry
name: "TranscriptionClient",
});
Circuit States
1. CLOSED (Normal Operation)
- All operations allowed
- Failures are counted
- Transitions to OPEN after 5 failures
2. OPEN (Service Degraded)
- Operations immediately rejected
- Transcription paused automatically
- User notified: "Service temporarily unavailable"
- After 60 seconds, transitions to HALF_OPEN
3. HALF_OPEN (Testing Recovery)
- Single test operation allowed
- Success → CLOSED (full recovery)
- Failure → OPEN (back to degraded)
Protected Operations
The circuit breaker wraps critical operations:
// WebSocket connection
await circuitBreaker.execute("connect", async () => {
// Connection logic
});
// Audio chunk transmission
await circuitBreaker.execute("sendChunk", async () => {
// Send logic
});
// Transcription retrieval
await circuitBreaker.execute("getTranscription", async () => {
// Fetch logic
});
Connection Error Handling
WebSocket Failures:
// Tracks failure patterns
interface ConnectionMetrics {
consecutiveFailures: number;
lastFailureTime: number;
networkLatency: number;
throughput: number;
}
// On each failure
this.connectionMetrics.consecutiveFailures++;
this.connectionMetrics.lastFailureTime = Date.now();
Poor Network Adaptation:
// Automatically adjusts heartbeat interval
private handlePoorNetworkConditions() {
// Doubles heartbeat interval (max: 30s)
this.heartbeatMs = Math.min(this.heartbeatMs * 2, 30000)
}
Recovery Processes
Session Recovery on Reconnection
When the WebSocket reconnects after a disruption, you can re-enqueue everything that was not yet confirmed by the server:
function recoverPendingChunks(): void {
// Move all in-flight chunks back to the front of the queue
for (const [chunkId, inFlightData] of inFlight) {
clearTimeout(acknowledgmentTimeouts.get(chunkId));
acknowledgmentTimeouts.delete(chunkId);
inFlightAudioDurationMs -= inFlightData.chunk.durationMs;
sendQueue.unshift(inFlightData.chunk); // Priority: front of queue
}
inFlight.clear();
// Re-sort by sequence number to preserve order
sendQueue.sort((a, b) => a.sequenceNumber - b.sequenceNumber);
// Resume draining
processQueueIfReady();
}
Recovery Triggers:
- After successful WebSocket reconnection (
onopen) - After network comes back online (
onlineevent) - After circuit breaker transitions from OPEN to CLOSED
Resending Pending Chunks
After network recovery, in-flight chunks are moved back to the send queue and retried:
function resendPendingChunks(): void {
let recoveredDuration = 0;
for (const [chunkId, inFlightData] of inFlight) {
clearTimeout(acknowledgmentTimeouts.get(chunkId));
acknowledgmentTimeouts.delete(chunkId);
recoveredDuration += inFlightData.chunk.durationMs;
sendQueue.unshift(inFlightData.chunk); // Priority: front
}
inFlight.clear();
inFlightAudioDurationMs = Math.max(0, inFlightAudioDurationMs - recoveredDuration);
processQueueIfReady();
}
Triggered by:
- Network reconnection (
onlineevent) - WebSocket state change to connected
- Recovery from disconnected/error state
Exponential Backoff Reconnection
Failed connections retry with increasing delays:
private async reconnectLoop() {
// Add jittered delay (prevents thundering herd)
await sleep(jitter(this.backoffMs))
// Exponential backoff: 500ms → 1s → 2s → 4s → ... → 30s
this.backoffMs = Math.min(maxBackoffMs, this.backoffMs * 2)
return this.connect()
}
// Jitter adds randomness (±20%) to prevent synchronized retries
function jitter(ms: number) {
const r = Math.random() * 0.4 + 0.8 // Range: 0.8-1.2
return Math.floor(ms * r)
}
Backoff Schedule:
- Initial: 500ms ± 20%
- Maximum: 30,000ms (30 seconds)
- Multiplier: 2x per attempt
- Jitter: ±20% randomization
Network Resilience Features
Online/Offline Detection
You can monitor browser network status in real-time:
// Setup network monitoring
window.addEventListener("online", () => {
this.isOnline = true;
this.handleNetworkReconnection(); // Immediate reconnection
});
window.addEventListener("offline", () => {
this.isOnline = false;
this.handleNetworkDisconnection(); // Pause operations
});
Benefits:
- Prevents sending when offline
- Automatic reconnection when online
- User feedback on network status
- Preserves audio during outages
WebSocket State Management
You can maintain detailed connection states:
enum ClientStates {
IDLE = "idle", // Not yet connected
CONNECTING = "connecting", // Establishing connection
CONNECTED = "connected", // Active connection
RECONNECTING = "reconnecting", // Attempting reconnection
DISCONNECTED = "disconnected", // Intentionally disconnected
ERROR = "error", // Error state
PAUSED = "PAUSED", // Circuit breaker open or manual pause
}
State Transitions:
IDLE → CONNECTING → CONNECTED
↓ ↓
ERROR ← → RECONNECTING
↓
DISCONNECTED
Connection Health Checks
Heartbeat Mechanism:
// Send periodic ping to detect silent failures
private startHeartbeat() {
this.heartbeatTimer = setInterval(() => {
this.sendJson({ type: 'ping', t: Date.now() })
}, this.heartbeatMs) // Default: 10,000ms (10 seconds)
}
Adaptive Heartbeat:
- Normal conditions: 10 seconds
- Poor network: Doubles to 20s, then 30s (max)
- Detects silent connection failures
- Prevents idle connection timeouts
Automatic Reconnection
You can reconnect on unexpected closures:
private async onCloseWebSocket(event?: CloseEvent) {
console.log('WebSocket closed:', event?.code, event?.reason)
this.stopHeartbeat()
// Skip reconnection if intentionally ended
if (this.isEndingTranscription) {
this.updateState(ClientStates.DISCONNECTED)
return
}
this.updateState(ClientStates.DISCONNECTED)
// Auto-reconnect unless in ERROR or PAUSED
if (this.state !== ClientStates.ERROR && !this.isPaused) {
await this.reconnectLoop()
}
}
Acknowledgment & Retry Logic
Dual Acknowledgment System
We support two acknowledgment methods for flexibility and efficiency:
1. Individual Chunk Acknowledgment
Each chunk receives explicit confirmation:
// Server response format
{
"id": "1732579200000_000042",
"chunkId": "1732579200000_000042"
}
// Client handling
private async handleChunkAcknowledgment(chunkId: string) {
this.clearAcknowledmentTimeout(chunkId)
const inFlightData = this.inFlight.get(chunkId)
if (inFlightData) {
// Free up in-flight budget
this.inFlightAudioDurationMs -= inFlightData.chunk.durationMs
// Remove from tracking
this.inFlight.delete(chunkId)
// Persist acknowledgment
await this.markChunkAsAcknowledged(chunkId)
// Try sending more chunks
await this.processQueueIfReady()
}
}
Benefits:
- Fine-grained confirmation
- Low-latency feedback
- Simple to implement
2. Batch Acknowledgment
Multiple chunks confirmed with single message:
// Server response format
{
"type": "AUDIO_CHUNK_ACK",
"stream_id": "session-uuid",
"ack_id": 42 // All chunks with sequenceNumber <= 42 confirmed
}
// Client handling
private async handleBatchAcknowledgment(streamId: string, ackId: number) {
const chunksToAck = []
// Find all chunks with sequenceNumber <= ackId
for (const [chunkId, inFlightData] of this.inFlight) {
if (inFlightData.chunk.sequenceNumber <= ackId) {
chunksToAck.push(inFlightData)
}
}
// Sort by sequence to maintain order
chunksToAck.sort((a, b) => a.sequenceNumber - b.sequenceNumber)
// Acknowledge all matched chunks
for (const data of chunksToAck) {
this.clearAcknowledmentTimeout(data.chunk.id)
this.inFlight.delete(data.chunk.id)
await this.markChunkAsAcknowledged(data.chunk.id)
this.inFlightAudioDurationMs -= data.chunk.durationMs
}
await this.processQueueIfReady()
}
Benefits:
- 90%+ reduction in ACK messages
- Handles high-throughput scenarios
- Prevents ACK storms
Timeout-Based Retry
Chunks without acknowledgment are automatically retried:
// Set 30-second timeout on send
private setAcknowledmentTimeout(chunkId: string) {
const timeoutId = setTimeout(() => {
this.handleChunkTimeout(chunkId)
}, CHUNK_TIMEOUT_MS) // 30,000ms
this.acknowledgmentTimeouts.set(chunkId, timeoutId)
}
// Handle timeout
private handleChunkTimeout(chunkId: string) {
const inFlightData = this.inFlight.get(chunkId)
if (inFlightData) {
this.inFlight.delete(chunkId)
this.retryChunk(inFlightData.chunk)
}
}
Timeout Duration: 30 seconds (conservative for high-latency networks)
Exponential Backoff Retry
Failed chunks retry with increasing delays:
private retryChunk(chunk: PendingChunk) {
chunk.retryCount = (chunk.retryCount || 0) + 1
if (chunk.retryCount <= MAX_RETRY_ATTEMPTS) { // MAX = 3
setTimeout(() => {
this.sendQueue.unshift(chunk) // Priority: front of queue
this.processQueueIfReady()
}, jitter(1000 * 2 ** (chunk.retryCount - 1)))
} else {
// Report failure after max attempts
this.handleChunkFailure(chunk)
}
}
Retry Schedule:
- 1st retry: ~1s (2⁰ × 1000ms ± jitter)
- 2nd retry: ~2s (2¹ × 1000ms ± jitter)
- 3rd retry: ~4s (2² × 1000ms ± jitter)
- After 3 failures: Report error via
onErrorcallback
Total Time: ~37 seconds from initial send to final failure
Queue Management
Queue Structure
// FIFO queue for chunks awaiting transmission
private sendQueue: PendingChunk[] = []
// Map of chunks sent but not yet acknowledged
private inFlight: Map<string, { chunk: PendingChunk; timestamp: number }> = new Map()
Flow Control
To prevent memory overflow and buffer bloat:
const MAX_IN_FLIGHT_DURATION_MS = 10000 // 10 seconds max unacknowledged
private canSendNow(chunkDurationMs: number) {
return (
this.ws?.readyState === WebSocket.OPEN &&
this.isOnline &&
this.inFlightAudioDurationMs + chunkDurationMs <= MAX_IN_FLIGHT_DURATION_MS
)
}
Mechanism:
- Tracks total duration of unacknowledged audio
- Blocks new sends when limit reached
- Waits for acknowledgments to free budget
- Prevents network congestion
Example:
- Max 10 seconds in-flight
- Chunks are ~600ms each
- Max ~16 chunks simultaneously
- 17th chunk waits until ACK received
Queue Processing
private async processQueueIfReady() {
// Guard: Check connection state
if (!this.ws || this.ws.readyState !== WebSocket.OPEN ||
!this.isOnline || this.isPaused) {
return
}
while (this.sendQueue.length > 0) {
const chunk = this.sendQueue[0]
// Respect flow control limits
if (!this.canSendNow(chunk.durationMs)) {
break // Wait for ACKs
}
this.sendQueue.shift()
try {
await this.sendChunkInternal(chunk, 'sendChunk')
} catch (error) {
if (error instanceof CircuitBreakerError) {
// Return to queue, stop processing
this.sendQueue.unshift(chunk)
break
}
// Other errors: retry with backoff
this.retryChunk(chunk)
break
}
}
}
Queue Discipline:
- FIFO (First In, First Out)
- Priority for retries (added to front via
unshift) - Respects flow control limits
- Breaks on error to prevent starvation
Error Recovery Scenarios
Scenario 1: Network Drop During Recording
Flow:
- Browser detects network loss (
offlineevent) isOnlineflag set tofalsecanSendNow()returnsfalse(blocks queue)- Audio continues recording into the PCM accumulator
- Accumulated chunks are flushed and held in
sendQueuein memory - In-flight chunks timeout → moved back to the front of
sendQueue - Network returns (
onlineevent) resendPendingChunks()re-queues in-flight chunksconnect()re-establishes WebSocketprocessQueueIfReady()drains the queue in sequence order
Result: ✅ Zero audio loss (network outages of typical duration)
Scenario 2: Browser Crash Mid-Session
Flow:
- Browser is force-killed or crashes — all RAM is lost
- Any audio still in
sendQueueorinFlightthat had not been acknowledged by the server is lost - The session on the server side remains open for the TTL window
Mitigation: Minimise exposure by calling end() (or at minimum flushing the buffer) when the page is about to unload:
// Flush remaining audio and send END_TRANSCRIPTION before the page closes
window.addEventListener('beforeunload', () => {
flushAudioBuffer(); // flush any sub-threshold PCM still accumulated
ws?.send(JSON.stringify({ messageType: 'END_TRANSCRIPTION', sessionId }));
});
// Handle tab hidden (mobile or background tabs)
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
flushAudioBuffer();
}
});
Result: ⚠️ Hard crashes may lose a small tail of unacknowledged audio. Graceful teardown via beforeunload prevents most data loss.
Scenario 3: Circuit Breaker Opens
Flow:
- Server returns 5 consecutive errors
- Circuit breaker transitions to OPEN
isPausedset totrue- User shown: "Service temporarily unavailable"
- All sends fail immediately with
CircuitBreakerError - Chunks remain in
sendQueuein memory - After 60s, circuit transitions to HALF_OPEN
- Single test operation attempted
- Success: Circuit closes, queue processed
- Failure: Circuit reopens, wait another 60s
Result: ✅ Graceful degradation, no data loss
Scenario 4: Chunk Timeout (No ACK)
Flow:
- Chunk sent at
t=0 - No ACK by
t=30s - Timeout handler fired
- Chunk removed from in-flight
- Retry scheduled with backoff
- Retry 1 at
t=31s(1s delay) - Retry 2 at
t=33s(2s delay) - Retry 3 at
t=37s(4s delay) handleChunkFailure()called- Error reported via
onErrorcallback
Result: ✅ 4 total attempts over ~37s before reporting failure
Performance Characteristics
Network Performance
- Chunk size: ~16 KB (0.5 s PCM16 @ 16 kHz, mono)
- Send frequency: ~2 chunks/sec (500 ms intervals)
- Bandwidth: ~32 KB/sec (~256 Kbps)
- In-flight limit: 10 seconds of audio (~200 KB max buffer)
Memory Footprint
- Send queue: 200 KB - 1 MB (10-50 chunks)
- In-flight map: ~320 KB (~16 chunks)
- Audio buffer: ~19 KB (~0.6s buffering)
- Total RAM: ~500 KB - 1.5 MB (worst case)
Monitoring API
We can expose methods for monitoring health and performance:
Client State
// Get current connection state
const state = client.getState();
// Returns: 'idle' | 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'error' | 'PAUSED'
// Get current session ID
const sessionId = client.getSessionId();
// Returns: UUID string
Network Metrics
const metrics = client.getNetworkMetrics();
// Returns: {
// consecutiveFailures: number,
// lastFailureTime: number,
// networkLatency: number,
// throughput: number,
// isOnline: boolean
// }
Circuit Breaker Status
const stats = client.getCircuitBreakerStats();
// Returns: {
// state: 'CLOSED' | 'OPEN' | 'HALF_OPEN',
// failureCount: number,
// lastFailureTime: number,
// retryAfter: number,
// operations: Record<string, OperationStats>
// }
const state = client.getCircuitBreakerState();
// Returns: 'CLOSED' | 'OPEN' | 'HALF_OPEN'
Storage Information
const storageInfo = await client.getStorageInfo();
// Returns: {
// totalChunks: number,
// totalSessions: number,
// currentSessionChunks: number
// }
Best Practices
1. Monitor Circuit Breaker State
client.onState = (state) => {
if (state === "PAUSED") {
// Show user-friendly message
showNotification("Transcription temporarily paused. Retrying...");
}
};
2. Handle Errors Gracefully
client.onError = (error) => {
console.error("Transcription error:", error);
// Log to monitoring service
logError(error, { sessionId: client.getSessionId() });
};
3. Track Network Status
client.onNetworkStatus = (isOnline) => {
if (!isOnline) {
showWarning("Network connection lost. Audio being saved locally.");
} else {
showSuccess("Network reconnected. Resuming transcription.");
}
};
4. Call end() on page unload
// Prevents orphaned sessions and minimises data loss on tab close
window.addEventListener('beforeunload', () => {
flushAudioBuffer();
ws?.send(JSON.stringify({ messageType: 'END_TRANSCRIPTION', sessionId }));
});
5. Use crypto.randomUUID() for session IDs
// Always use the platform-native UUID generator — no custom implementations needed
const sessionId = crypto.randomUUID();
6. Monitor queue depth
// Log a warning if the send queue grows large (sign of sustained disconnection)
setInterval(() => {
if (sendQueue.length > 40) {
console.warn('Large send queue:', sendQueue.length, 'chunks pending');
}
}, 10_000);
7. Clean up on session end
// After transcription is complete, close the audio pipeline
workletNode.disconnect();
audioContext.close();
mediaStream.getTracks().forEach(t => t.stop());
Complete Resilient Integration Example
The following is a self-contained TypeScript snippet that combines all the protection mechanisms described in this guide. It can be used as a starting point for any integration that needs to be robust against network instability.
// ─── Types ────────────────────────────────────────────────────────────────────
interface PendingChunk {
id: string;
sessionId: string;
sequenceNumber: number;
bytesB64: string;
durationMs: number;
retryCount: number;
}
// ─── Configuration ────────────────────────────────────────────────────────────
const WS_URL = 'wss://live.invoxmedical.com';
const ACCESS_TOKEN = 'your_access_token';
const CALLBACK_URL = encodeURIComponent('https://example.mydomain.com');
const SESSION_ID = crypto.randomUUID();
// Audio accumulation: 0.5 s of PCM16 @ 16 kHz mono = 16 000 bytes
const TARGET_PCM_BYTES = 16_000 * 2 * 0.5;
// Backoff
const INITIAL_BACKOFF_MS = 500;
const MAX_BACKOFF_MS = 30_000;
// Flow control: max 10 s of audio in-flight at once
const MAX_IN_FLIGHT_DURATION = 10_000;
// Circuit breaker
const CB_FAILURE_THRESHOLD = 5;
const CB_COOLDOWN_MS = 60_000;
// Chunk acknowledgment timeout
const ACK_TIMEOUT_MS = 30_000;
const MAX_RETRY_ATTEMPTS = 3;
// ─── State ────────────────────────────────────────────────────────────────────
let ws: WebSocket | null = null;
let backoffMs = INITIAL_BACKOFF_MS;
let intentionalClose = false;
let isOnline = navigator.onLine;
let isPaused = false;
let sequenceNumber = 0;
let inFlightAudioDurationMs = 0;
const sendQueue: PendingChunk[] = [];
const inFlight = new Map<string, { chunk: PendingChunk; timestamp: number }>();
const ackTimeouts = new Map<string, ReturnType<typeof setTimeout>>();
// PCM accumulator
let pcmBuffer: ArrayBuffer[] = [];
let pcmBytesCollected = 0;
// Circuit breaker
let cbFailures = 0;
let cbOpenUntil = 0;
// ─── Helpers ──────────────────────────────────────────────────────────────────
function arrayBufferToBase64(buffer: ArrayBuffer): string {
const bytes = new Uint8Array(buffer);
let binary = '';
for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
return btoa(binary);
}
function jitter(ms: number): number {
return Math.floor(ms * (Math.random() * 0.4 + 0.8));
}
function isCircuitOpen(): boolean {
return cbFailures >= CB_FAILURE_THRESHOLD && Date.now() < cbOpenUntil;
}
function recordFailure(): void {
cbFailures++;
if (cbFailures >= CB_FAILURE_THRESHOLD) {
cbOpenUntil = Date.now() + CB_COOLDOWN_MS;
isPaused = true;
console.warn('[circuit] Circuit opened. Pausing for', CB_COOLDOWN_MS / 1000, 's');
setTimeout(() => {
isPaused = false;
cbFailures = 0;
processQueueIfReady();
}, CB_COOLDOWN_MS);
}
}
function recordSuccess(): void {
cbFailures = 0;
}
// ─── Queue Management ─────────────────────────────────────────────────────────
function canSendNow(durationMs: number): boolean {
return (
ws?.readyState === WebSocket.OPEN &&
isOnline &&
!isPaused &&
!isCircuitOpen() &&
inFlightAudioDurationMs + durationMs <= MAX_IN_FLIGHT_DURATION
);
}
function processQueueIfReady(): void {
while (sendQueue.length > 0) {
const chunk = sendQueue[0];
if (!canSendNow(chunk.durationMs)) break;
sendQueue.shift();
inFlight.set(chunk.id, { chunk, timestamp: Date.now() });
inFlightAudioDurationMs += chunk.durationMs;
// Set acknowledgment timeout
ackTimeouts.set(chunk.id, setTimeout(() => handleAckTimeout(chunk.id), ACK_TIMEOUT_MS));
try {
ws!.send(JSON.stringify({
message: chunk.bytesB64,
messageType: 'AUDIO',
sessionId: chunk.sessionId,
}));
recordSuccess();
} catch {
// Return chunk to queue and stop
sendQueue.unshift(chunk);
inFlight.delete(chunk.id);
inFlightAudioDurationMs -= chunk.durationMs;
break;
}
}
}
// ─── Acknowledgment Handling ──────────────────────────────────────────────────
function acknowledgeChunk(chunkId: string): void {
const data = inFlight.get(chunkId);
if (!data) return;
clearTimeout(ackTimeouts.get(chunkId));
ackTimeouts.delete(chunkId);
inFlight.delete(chunkId);
inFlightAudioDurationMs = Math.max(0, inFlightAudioDurationMs - data.chunk.durationMs);
processQueueIfReady();
}
function acknowledgeBatch(ackId: number): void {
for (const [chunkId, data] of inFlight) {
if (data.chunk.sequenceNumber <= ackId) {
acknowledgeChunk(chunkId);
}
}
}
function handleAckTimeout(chunkId: string): void {
const data = inFlight.get(chunkId);
if (!data) return;
inFlight.delete(chunkId);
inFlightAudioDurationMs = Math.max(0, inFlightAudioDurationMs - data.chunk.durationMs);
retryChunk(data.chunk);
}
function retryChunk(chunk: PendingChunk): void {
chunk.retryCount++;
if (chunk.retryCount <= MAX_RETRY_ATTEMPTS) {
setTimeout(() => {
sendQueue.unshift(chunk); // Priority: front
processQueueIfReady();
}, jitter(1000 * Math.pow(2, chunk.retryCount - 1)));
} else {
console.error('[transcription] Chunk failed after', MAX_RETRY_ATTEMPTS, 'retries:', chunk.id);
recordFailure();
}
}
// ─── Network Recovery ─────────────────────────────────────────────────────────
function resendPendingChunks(): void {
let recoveredMs = 0;
for (const [chunkId, data] of inFlight) {
clearTimeout(ackTimeouts.get(chunkId));
ackTimeouts.delete(chunkId);
recoveredMs += data.chunk.durationMs;
sendQueue.unshift(data.chunk);
}
inFlight.clear();
inFlightAudioDurationMs = Math.max(0, inFlightAudioDurationMs - recoveredMs);
sendQueue.sort((a, b) => a.sequenceNumber - b.sequenceNumber);
}
window.addEventListener('online', () => { isOnline = true; connectWebSocket(); });
window.addEventListener('offline', () => { isOnline = false; });
// ─── WebSocket with Exponential Backoff ───────────────────────────────────────
function connectWebSocket(): void {
if (intentionalClose || !isOnline) return;
ws = new WebSocket(
`${WS_URL}?callbackUrl=${CALLBACK_URL}&accessToken=${ACCESS_TOKEN}&sessionId=${SESSION_ID}`
);
ws.onopen = () => {
backoffMs = INITIAL_BACKOFF_MS;
resendPendingChunks();
processQueueIfReady();
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
// Individual chunk ACK
if (data.chunkId || data.id) {
acknowledgeChunk(data.chunkId ?? data.id);
}
// Batch ACK: all chunks with sequenceNumber <= ack_id are confirmed
if (data.type === 'AUDIO_CHUNK_ACK' && typeof data.ack_id === 'number') {
acknowledgeBatch(data.ack_id);
}
// Transcription complete
if (data.isTranscriptionCompleted === true) {
console.log('[transcription] Complete:', data);
}
};
ws.onclose = () => {
if (intentionalClose) return;
setTimeout(() => {
backoffMs = Math.min(MAX_BACKOFF_MS, backoffMs * 2);
connectWebSocket();
}, jitter(backoffMs));
};
ws.onerror = () => {
recordFailure();
ws?.close();
};
}
// ─── Audio Accumulator (0.5 s threshold) ─────────────────────────────────────
function onPcmChunk(chunk: ArrayBuffer): void {
pcmBuffer.push(chunk);
pcmBytesCollected += chunk.byteLength;
if (pcmBytesCollected >= TARGET_PCM_BYTES) {
flushAudioBuffer();
}
}
function flushAudioBuffer(): void {
if (pcmBuffer.length === 0) return;
const combined = new Uint8Array(pcmBytesCollected);
let offset = 0;
for (const buf of pcmBuffer) {
combined.set(new Uint8Array(buf), offset);
offset += buf.byteLength;
}
const durationMs = Math.round((pcmBytesCollected / 2 / 16_000) * 1000);
const seq = sequenceNumber++;
const chunk: PendingChunk = {
id: `chunk_${SESSION_ID}_${seq}`,
sessionId: SESSION_ID,
sequenceNumber: seq,
bytesB64: arrayBufferToBase64(combined.buffer),
durationMs,
retryCount: 0,
};
pcmBuffer = [];
pcmBytesCollected = 0;
sendQueue.push(chunk);
processQueueIfReady();
}
// ─── AudioWorklet Capture ─────────────────────────────────────────────────────
const processorCode = `
class AudioProcessor extends AudioWorkletProcessor {
constructor() {
super();
this.inputSampleRate = 48000;
this.targetSampleRate = 16000;
this.targetChunkSamples = Math.floor(this.targetSampleRate * 0.085);
this.samplesNeeded = Math.floor(
this.targetChunkSamples * (this.inputSampleRate / this.targetSampleRate)
);
this.bufferSize = this.inputSampleRate * 2;
this.buffer = new Float32Array(this.bufferSize);
this.writeIndex = 0;
}
downsample(length) {
const ratio = this.inputSampleRate / this.targetSampleRate;
const outLen = Math.round(length / ratio);
const result = new Float32Array(outLen);
for (let i = 0; i < outLen; i++) {
const start = Math.floor(i * ratio);
const end = Math.min(Math.floor((i + 1) * ratio), length);
let sum = 0;
for (let j = start; j < end; j++) sum += this.buffer[j];
result[i] = sum / (end - start);
}
return result;
}
floatTo16BitPCM(input) {
const buf = new ArrayBuffer(input.length * 2);
const view = new DataView(buf);
for (let i = 0; i < input.length; i++) {
const s = Math.max(-1, Math.min(1, input[i]));
view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
}
return buf;
}
process(inputs) {
const input = inputs[0];
if (!input || !input[0]) return true;
const data = input[0];
if (this.writeIndex + data.length <= this.bufferSize) {
this.buffer.set(data, this.writeIndex);
this.writeIndex += data.length;
} else {
this.writeIndex = 0;
return true;
}
while (this.writeIndex >= this.samplesNeeded) {
const downsampled = this.downsample(this.samplesNeeded);
const pcm16 = this.floatTo16BitPCM(downsampled);
this.buffer.copyWithin(0, this.samplesNeeded, this.writeIndex);
this.writeIndex -= this.samplesNeeded;
this.port.postMessage({ pcm16, samples: downsampled.length }, [pcm16]);
}
return true;
}
}
registerProcessor('audio-processor', AudioProcessor);
`;
const blob = new Blob([processorCode], { type: 'application/javascript' });
const processorUrl = URL.createObjectURL(blob);
const mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext();
await audioContext.audioWorklet.addModule(processorUrl);
URL.revokeObjectURL(processorUrl);
const source = audioContext.createMediaStreamSource(mediaStream);
const workletNode = new AudioWorkletNode(audioContext, 'audio-processor');
source.connect(workletNode);
workletNode.connect(audioContext.destination);
workletNode.port.onmessage = (event) => {
onPcmChunk(event.data.pcm16);
};
// ─── Graceful teardown on page close ─────────────────────────────────────────
window.addEventListener('beforeunload', () => {
flushAudioBuffer();
ws?.send(JSON.stringify({ messageType: 'END_TRANSCRIPTION', sessionId: SESSION_ID }));
});
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') flushAudioBuffer();
});
// ─── Start ────────────────────────────────────────────────────────────────────
connectWebSocket();
// To stop recording intentionally:
function stopRecording(): void {
intentionalClose = true;
flushAudioBuffer();
ws?.send(JSON.stringify({ messageType: 'END_TRANSCRIPTION', sessionId: SESSION_ID }));
}
Conclusion
The Invox Medical streaming transcription client provides enterprise-grade audio protection through:
- ✅ In-memory send queue with sequence-ordered retry
- ✅ Intelligent retry logic with exponential backoff
- ✅ Circuit breaker protection against cascading failures
- ✅ Dual acknowledgment system for high-throughput efficiency
- ✅ Automatic recovery from network failures and transient outages
- ✅ Flow control to prevent buffer overflow
- ✅ Online/offline detection with immediate queue drain on reconnect
- ✅ Graceful teardown via
beforeunloadto minimise data loss on page close
This architecture ensures that audio transcription remains robust and reliable even under adverse conditions including network failures, server outages, and high-latency scenarios.