I was building a voice recording feature for a client project when I discovered something frustrating: audio transcription worked perfectly on desktop and Android devices, but consistently failed on iPhones. After digging into the MediaRecorder API and the Google Speech-to-Text integration, I realized the issue wasn't a simple bug. It was a fundamental difference in how iPhone Safari handles audio recording.
This guide walks you through building a complete audio recording and transcription system that works seamlessly across all devices, including the tricky iPhone Safari case. By the end, you'll have a robust implementation that properly handles different audio formats and integrates smoothly with Google's Speech-to-Text API.
Understanding the iPhone Safari Challenge
Before jumping into code, it's crucial to understand why iPhone Safari requires special handling. Most browsers support multiple audio formats for MediaRecorder, but iPhone Safari has specific preferences:
Desktop Chrome/Firefox: Often defaults to audio/webm or audio/wav
iPhone Safari: In the case this guide covers, records audio/webm;codecs=opus (WebM/Opus)
The problem occurs when you hardcode audio format assumptions. If your system expects WAV files but receives WebM/Opus from iPhone Safari, transcription services like Google Speech-to-Text will reject the audio with encoding errors.
Step 1: Setting Up Smart MediaRecorder Format Detection
The foundation of cross-device compatibility is proper format detection. Instead of assuming a format, we need to detect what each device supports and choose appropriately.
typescript
// File: src/components/audio-recorder.tsx
// Assumes: import { useCallback } from 'react';
const startRecording = useCallback(async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

    // Smart format detection with iPhone priority
    let selectedMimeType = 'audio/webm'; // fallback
    const supportedTypes = [
      'audio/webm;codecs=opus', // iPhone Safari preference
      'audio/webm',
      'audio/mp4',
      'audio/wav'
    ];

    // Pick the first format in preference order that this browser can record
    for (const type of supportedTypes) {
      if (MediaRecorder.isTypeSupported(type)) {
        selectedMimeType = type;
        break;
      }
    }
    console.log(`Selected audio format: ${selectedMimeType}`);

    const recorder = new MediaRecorder(stream, {
      mimeType: selectedMimeType
    });

    // Store chunks for later blob creation
    const chunks: BlobPart[] = [];
    recorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        chunks.push(event.data);
      }
    };

    recorder.onstop = () => {
      // Critical: Use the actual detected MIME type, not a hardcoded one
      const audioBlob = new Blob(chunks, { type: selectedMimeType });
      onRecordingComplete(audioBlob);
    };

    recorder.start();
    setMediaRecorder(recorder);
  } catch (error) {
    console.error('Recording failed:', error);
  }
  // List the callback as a dependency so the latest version is always called
}, [onRecordingComplete]);
This approach prioritizes iPhone Safari's preferred format while maintaining compatibility with other browsers. The key insight is using MediaRecorder.isTypeSupported() to test formats in order of preference, ensuring we get the best format each device can produce.
Step 2: Implementing Proper File Upload with Format Awareness
Once you have an audio blob with the correct MIME type, the upload system needs to handle different formats appropriately. The critical piece is mapping MIME types to correct file extensions.
The extension mapping is crucial because the server-side code in Step 3 relies on the file extension to decide which encoding to declare to the Google Speech-to-Text API. When iPhone Safari produces WebM/Opus audio, it needs to be saved with a .webm extension, not .wav.
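A minimal sketch of such a mapping follows. The helper names (`extensionForMimeType`, `buildUploadFileName`) and the exact extension table are illustrative assumptions, not code from the original project. The important detail is stripping parameters like `;codecs=opus` before the lookup.

```typescript
// Hypothetical lookup table: adjust to the formats your pipeline accepts.
const EXTENSION_BY_MIME: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/mp4': 'mp4',
  'audio/mpeg': 'mp3',
  'audio/wav': 'wav',
};

// Drop parameters such as ";codecs=opus" and normalize case.
function baseMimeType(mimeType: string): string {
  const semicolon = mimeType.indexOf(';');
  const base = semicolon === -1 ? mimeType : mimeType.slice(0, semicolon);
  return base.trim().toLowerCase();
}

// Returns undefined for MIME types that are not in the table.
function extensionForMimeType(mimeType: string): string | undefined {
  return EXTENSION_BY_MIME[baseMimeType(mimeType)];
}

// Builds a file name whose extension matches the recorded format.
function buildUploadFileName(mimeType: string, baseName = 'recording'): string {
  const extension = extensionForMimeType(mimeType);
  if (extension === undefined) {
    throw new Error(`No file extension known for MIME type: ${mimeType}`);
  }
  return `${baseName}.${extension}`;
}
```

With this in place, a blob typed `audio/webm;codecs=opus` would be uploaded as `recording.webm`, while an unknown type fails early instead of being silently saved as `.wav`.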
Step 3: Configuring Google Speech-to-Text API with Dynamic Encoding
The most critical part of the implementation is configuring the Google Speech API with the correct encoding based on your uploaded audio format. Missing or incorrect encoding parameters cause the "bad encoding" errors.
The encoding detection is the heart of iPhone compatibility. When the server sees a .webm file, it declares WEBM_OPUS encoding in the request rather than LINEAR16, which would cause encoding errors.
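The selection logic might look something like the sketch below. `recognitionConfigForFile` and `RecognitionConfigSketch` are hypothetical names, and the interface only mirrors a few fields of the Speech-to-Text recognition config. The 48000 Hz value is an assumption about typical browser Opus recordings, so verify it against your own audio. Formats other than the two discussed here are rejected rather than guessed.

```typescript
// Only the two encodings discussed in this guide are modeled here.
type SpeechEncoding = 'LINEAR16' | 'WEBM_OPUS';

// Illustrative subset of the recognition config fields.
interface RecognitionConfigSketch {
  encoding: SpeechEncoding;
  sampleRateHertz?: number;
  languageCode: string;
}

// Chooses the encoding to declare based on the uploaded file's extension.
function recognitionConfigForFile(
  fileName: string,
  languageCode = 'en-US',
): RecognitionConfigSketch {
  const dot = fileName.lastIndexOf('.');
  const extension = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();

  switch (extension) {
    case 'webm':
      // Assumption: the browser recorded Opus at 48 kHz.
      return { encoding: 'WEBM_OPUS', sampleRateHertz: 48000, languageCode };
    case 'wav':
      // The sample rate is left out on the assumption that the API
      // can read it from the WAV header.
      return { encoding: 'LINEAR16', languageCode };
    default:
      // Anything else would need transcoding or an explicit mapping first.
      throw new Error(`No encoding mapping for file: ${fileName}`);
  }
}
```

The returned object would then be passed as the `config` part of the recognize request alongside the audio content.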
Step 4: Adding Robust Audio Validation
Both client and server-side validation need to handle the variety of MIME types that different devices produce, including codec specifications.
typescript
// File: src/lib/audio-validation.ts
const ALLOWED_AUDIO_TYPES = [
  'audio/wav',
  'audio/mpeg',
  'audio/mp4',
  'audio/webm',
  'application/octet-stream' // Fallback for some uploads
];

export const validateAudioFile = (file: Blob | File): boolean => {
  if (!file.type) {
    console.warn('File has no MIME type, allowing as fallback');
    return true; // Allow files without MIME type
  }

  // Use prefix matching to handle codec specifications
  // This accepts "audio/webm;codecs=opus" when "audio/webm" is allowed
  const isAllowed = ALLOWED_AUDIO_TYPES.some(allowedType =>
    file.type.startsWith(allowedType)
  );

  if (!isAllowed) {
    console.error(`Unsupported audio type: ${file.type}`);
  }
  return isAllowed;
};
The prefix matching approach is essential because iPhone Safari sends audio/webm;codecs=opus, but your allowed types list contains audio/webm. Exact string matching would reject this perfectly valid format.
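One possible refinement, offered as a sketch and not part of the original implementation: plain prefix matching also accepts strings such as `audio/webm-anything`. Comparing only the base type (the part before the first `;`) keeps codec parameters working while rejecting look-alikes. The name `isAllowedAudioType` is hypothetical.

```typescript
// Base MIME types accepted by this sketch (no parameters).
const ALLOWED_BASE_TYPES = new Set<string>([
  'audio/wav',
  'audio/mpeg',
  'audio/mp4',
  'audio/webm',
]);

// Accepts "audio/webm;codecs=opus" but rejects "audio/webm-anything".
function isAllowedAudioType(mimeType: string): boolean {
  const semicolon = mimeType.indexOf(';');
  const base = semicolon === -1 ? mimeType : mimeType.slice(0, semicolon);
  return ALLOWED_BASE_TYPES.has(base.trim().toLowerCase());
}
```

Whether the extra strictness is worth it depends on how much you trust the client-supplied MIME type in the first place.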
Step 5: Building a Complete Recording Component
Here's how all the pieces fit together in a complete React component:
Simulating iPhone Safari's format support on your development machine lets you verify that your format detection and encoding logic work correctly before deploying.
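One way to make that kind of simulation possible is to keep the format selection in a pure function that receives the support check as a parameter. The sketch below assumes this design. `pickMimeType` and `simulateDevice` are hypothetical names, and the simulated support lists are made-up inputs, not measured device capabilities.

```typescript
// A function that answers "can this device record the given MIME type?"
type SupportCheck = (mimeType: string) => boolean;

// Same preference order as the recorder in Step 1.
const PREFERRED_TYPES: readonly string[] = [
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/mp4',
  'audio/wav',
];

// Returns the first preferred type the device supports, or undefined.
function pickMimeType(
  isTypeSupported: SupportCheck,
  candidates: readonly string[] = PREFERRED_TYPES,
): string | undefined {
  return candidates.find((type) => isTypeSupported(type));
}

// Builds a fake support check from a list of "supported" types.
function simulateDevice(supported: readonly string[]): SupportCheck {
  const supportedSet = new Set(supported);
  return (mimeType) => supportedSet.has(mimeType);
}

// Example profiles (assumed values, for testing only).
const opusDevice = simulateDevice(['audio/webm', 'audio/webm;codecs=opus']);
const mp4OnlyDevice = simulateDevice(['audio/mp4']);
const opusChoice = pickMimeType(opusDevice);
const mp4Choice = pickMimeType(mp4OnlyDevice);
```

In the browser, the same function can be called with `(type) => MediaRecorder.isTypeSupported(type)`, so the production path and the simulated path share one piece of selection logic.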
Monitoring and Debugging
Add comprehensive logging to troubleshoot issues across different devices:
This logging helps identify exactly where format mismatches occur and ensures each step of the pipeline handles device-specific formats correctly.
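As a rough illustration of what such logging could look like, the sketch below formats one line per pipeline stage with the same fields every time, which makes mismatches easy to spot when comparing devices. The `PipelineLogEntry` shape, the stage names, and the `[audio-pipeline]` prefix are all assumptions made for this example.

```typescript
// One log entry per pipeline stage; optional fields are filled when known.
interface PipelineLogEntry {
  stage: 'record' | 'upload' | 'transcribe';
  mimeType: string;
  fileName?: string;
  encoding?: string;
  sizeBytes?: number;
}

// Formats an entry as a single key=value line in a fixed field order.
function formatPipelineLog(entry: PipelineLogEntry): string {
  const fields: string[] = [
    `stage=${entry.stage}`,
    `mimeType=${entry.mimeType}`,
  ];
  if (entry.fileName !== undefined) fields.push(`fileName=${entry.fileName}`);
  if (entry.encoding !== undefined) fields.push(`encoding=${entry.encoding}`);
  if (entry.sizeBytes !== undefined) fields.push(`sizeBytes=${entry.sizeBytes}`);
  return `[audio-pipeline] ${fields.join(' ')}`;
}

const recordLine = formatPipelineLog({
  stage: 'record',
  mimeType: 'audio/webm;codecs=opus',
  sizeBytes: 2048,
});
```

Each formatted line can be handed to `console.log` on the client or to whatever logger the server already uses. A `.wav` file name next to a WebM MIME type in the same trace points straight at the extension mapping.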
Conclusion
Building audio recording that works seamlessly across all devices, especially iPhone Safari, requires understanding the nuances of how different browsers handle MediaRecorder formats. The key is building a system that detects and adapts to each device's preferred format rather than making assumptions.
The implementation covers the complete pipeline: smart format detection in MediaRecorder, proper file extensions during upload, correct encoding configuration for Google Speech API, and robust validation that handles codec specifications. With device simulation for development testing, you can ensure your implementation works across all target devices.
This approach gives you a production-ready audio recording and transcription system that handles the tricky iPhone Safari case while maintaining compatibility with all other browsers.