---
title: "How to Implement MediaRecorder API Audio Recording and Transcription with iPhone Safari Support"
slug: "iphone-safari-mediarecorder-audio-recording-transcription"
published: "2025-08-26"
updated: "2025-08-26"
categories:
  - "React"
tags:
  - "MediaRecorder API"
  - "iPhone Safari"
  - "WebM Opus"
  - "audio recording"
  - "Google Speech-to-Text"
  - "Next.js"
  - "TypeScript"
  - "audio transcription"
  - "mime types"
  - "encoding"
llm-intent: "reference"
audience-level: "intermediate"
framework-versions:
  - "next.js@15"
  - "typescript@5.5"
status: "stable"
llm-purpose: "Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text."
llm-prereqs:
  - "Access to MediaRecorder API"
  - "Access to Next.js"
  - "Access to TypeScript"
  - "Access to Google Speech-to-Text"
llm-outputs:
  - "Completed outcome: Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text."
---

**Summary Triples**
- (MediaRecorder format detection, use, MediaRecorder.isTypeSupported(...) to probe supported MIME types and select the best available format at runtime)
- (iPhone Safari, commonly produces, audio/webm;codecs=opus which can cause encoding mismatches if server expects WAV/LINEAR16)
- (Cross-device recording, requires, dynamic format detection and conditional handling (send as WEBM_OPUS to STT or convert to LINEAR16))
- (Upload to transcription backend, send, the recorded Blob with correct Content-Type and use recognitionConfig.encoding matching the actual audio encoding)
- (Google Speech-to-Text config, set, recognitionConfig.encoding to 'WEBM_OPUS' for webm/opus recordings or 'LINEAR16' for PCM WAV)
- (Large or unsupported formats, recommendation, either convert server-side with ffmpeg to LINEAR16 or upload to Google Cloud Storage and use longRunningRecognize)
- (Recording pipeline, steps, detect MIME -> start MediaRecorder with chosen MIME -> collect Blob -> upload server-side -> (optional) convert -> call Google Speech-to-Text with matching encoding)

### {GOAL}
Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text.

### {PREREQS}
- Access to MediaRecorder API
- Access to Next.js
- Access to TypeScript
- Access to Google Speech-to-Text

### {STEPS}
1. Follow the detailed walkthrough in the article content below.

<!-- llm:goal="Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text." -->
<!-- llm:prereq="Access to MediaRecorder API" -->
<!-- llm:prereq="Access to Next.js" -->
<!-- llm:prereq="Access to TypeScript" -->
<!-- llm:prereq="Access to Google Speech-to-Text" -->
<!-- llm:output="Completed outcome: Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text." -->

# How to Implement MediaRecorder API Audio Recording and Transcription with iPhone Safari Support
> Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text.
Matija Žiberna · 2025-08-26

I was building a voice recording feature for a client project when I discovered something frustrating: audio transcription worked perfectly on desktop and Android devices, but consistently failed on iPhones. After diving deep into the MediaRecorder API and Google Speech-to-Text integration, I realized the issue wasn't just a simple bug—it was a fundamental difference in how iPhone Safari handles audio recording.

This guide walks you through building a complete audio recording and transcription system that works seamlessly across all devices, including the tricky iPhone Safari case. By the end, you'll have a robust implementation that properly handles different audio formats and integrates smoothly with Google's Speech-to-Text API.

## Understanding the iPhone Safari Challenge

Before jumping into code, it's crucial to understand why iPhone Safari requires special handling. Most browsers support multiple audio formats for MediaRecorder, but iPhone Safari has specific preferences:

- **Desktop Chrome/Firefox**: Typically default to `audio/webm` (Firefox may also produce `audio/ogg`)
- **Android Chrome**: Typically uses `audio/webm` 
- **iPhone Safari**: Produces `audio/webm;codecs=opus` specifically

The problem occurs when you hardcode audio format assumptions. If your system expects WAV files but receives WebM/Opus from iPhone Safari, transcription services like Google Speech-to-Text will reject the audio with encoding errors.

## Step 1: Setting Up Smart MediaRecorder Format Detection

The foundation of cross-device compatibility is proper format detection. Instead of assuming a format, we need to detect what each device supports and choose appropriately.

```typescript
// File: src/components/audio-recorder.tsx
const startRecording = useCallback(async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    
    // Smart format detection with iPhone priority
    let selectedMimeType = 'audio/webm'; // fallback
    const supportedTypes = [
      'audio/webm;codecs=opus',  // iPhone Safari preference
      'audio/webm',
      'audio/mp4',
      'audio/wav'
    ];
    
    for (const type of supportedTypes) {
      if (MediaRecorder.isTypeSupported(type)) {
        selectedMimeType = type;
        break;
      }
    }
    
    console.log(`Selected audio format: ${selectedMimeType}`);
    
    const recorder = new MediaRecorder(stream, {
      mimeType: selectedMimeType
    });
    
    // Store chunks for later blob creation
    const chunks: BlobPart[] = [];
    
    recorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        chunks.push(event.data);
      }
    };
    
    recorder.onstop = () => {
      // Critical: Use the actual detected MIME type, not a hardcoded one
      const audioBlob = new Blob(chunks, { type: selectedMimeType });
      onRecordingComplete(audioBlob);
    };
    
    recorder.start();
    setMediaRecorder(recorder);
    
  } catch (error) {
    console.error('Recording failed:', error);
  }
}, [onRecordingComplete]); // include the callback prop in the deps array
```

This approach prioritizes iPhone Safari's preferred format while maintaining compatibility with other browsers. The key insight is using `MediaRecorder.isTypeSupported()` to test formats in order of preference, ensuring we get the best format each device can produce.
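
If you reuse this detection in several components, it can help to pull it out into a small pure helper. The sketch below is a hypothetical `pickMimeType` function (not from the original code) that takes the support check as a parameter, so the selection logic can be unit-tested outside a browser:

```typescript
// Hypothetical helper: format selection as a pure function.
// `isSupported` stands in for MediaRecorder.isTypeSupported so the
// ordering logic can be tested without a browser environment.
export function pickMimeType(
  isSupported: (mime: string) => boolean,
  preferences: string[] = [
    'audio/webm;codecs=opus', // iPhone Safari preference
    'audio/webm',
    'audio/mp4',
    'audio/wav',
  ],
  fallback = 'audio/webm'
): string {
  for (const mime of preferences) {
    if (isSupported(mime)) return mime; // first supported preference wins
  }
  return fallback;
}

// In the browser:
// const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
```

The same ordered-preference behavior as Step 1, just easier to test and to keep in sync between components.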

## Step 2: Implementing Proper File Upload with Format Awareness

Once you have an audio blob with the correct MIME type, the upload system needs to handle different formats appropriately. The critical piece is mapping MIME types to correct file extensions.

```typescript
// File: src/lib/upload-handler.ts
const getCorrectExtension = (mimeType: string): string => {
  const mimeToExt: Record<string, string> = {
    'audio/webm': 'webm',
    'audio/webm;codecs=opus': 'webm',    // iPhone Safari
    'audio/mp4': 'm4a',
    'audio/wav': 'wav',
    'audio/mpeg': 'mp3',
    'audio/flac': 'flac',
    'audio/ogg': 'ogg'
  };
  
  return mimeToExt[mimeType] || 'webm';  // Default to webm for iPhone compatibility
};

export const uploadAudioFile = async (audioBlob: Blob): Promise<string> => {
  const detectedExtension = getCorrectExtension(audioBlob.type);
  const fileName = `${Date.now()}.${detectedExtension}`;
  
  console.log(`Uploading audio: ${audioBlob.type} -> ${fileName}`);
  
  const formData = new FormData();
  formData.append('audio', audioBlob, fileName);
  
  const response = await fetch('/api/upload/audio', {
    method: 'POST',
    body: formData,
  });
  
  if (!response.ok) {
    throw new Error('Upload failed');
  }
  
  const { fileUrl } = await response.json();
  return fileUrl;
};
```

The extension mapping is crucial because our transcription layer (Step 3) derives the Speech-to-Text encoding from the file extension. When iPhone Safari produces WebM/Opus audio, it needs to be saved with a `.webm` extension, not `.wav`.
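
A slightly more defensive variant of the lookup strips codec parameters before matching, so any future `audio/webm;codecs=...` string still maps to `webm` without needing its own table entry. A sketch (the `extensionFor` name is illustrative):

```typescript
// Sketch: normalize the MIME type by dropping ";codecs=..." before the
// lookup, so codec-qualified variants share one table entry.
const MIME_TO_EXT: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/mp4': 'm4a',
  'audio/wav': 'wav',
  'audio/mpeg': 'mp3',
  'audio/flac': 'flac',
  'audio/ogg': 'ogg',
};

export function extensionFor(mimeType: string): string {
  // "audio/webm;codecs=opus" -> "audio/webm"
  const base = mimeType.split(';')[0].trim().toLowerCase();
  return MIME_TO_EXT[base] ?? 'webm'; // webm default for iPhone Safari
}
```

This keeps the table short and makes the iPhone Safari case fall out of the normalization rather than a special-cased key.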

## Step 3: Configuring Google Speech-to-Text API with Dynamic Encoding

The most critical part of the implementation is configuring the Google Speech API with the correct encoding based on your uploaded audio format. Missing or incorrect encoding parameters cause the "bad encoding" errors.

```typescript
// File: src/lib/google-speech.ts
import { SpeechClient } from '@google-cloud/speech';

const detectEncodingFromFile = (fileName: string): string => {
  const extension = fileName.split('.').pop()?.toLowerCase();
  
  switch (extension) {
    case 'webm': return 'WEBM_OPUS';    // iPhone Safari files
    case 'wav': return 'LINEAR16';
    case 'mp3': return 'MP3';
    case 'flac': return 'FLAC';
    case 'ogg': return 'OGG_OPUS';
    case 'm4a':
    case 'mp4':
      // AAC audio has no recognize() encoding value; convert these
      // files to LINEAR16 WAV server-side (e.g. with ffmpeg) first
      throw new Error(`Convert .${extension} to WAV before transcription`);
    default: return 'WEBM_OPUS';        // Safe default for iPhone
  }
};

const getModelForLanguage = (languageCode: string): string | undefined => {
  // Enhanced models are only available for certain languages
  const enhancedModelLanguages = ['en-US', 'en-GB'];
  return enhancedModelLanguages.includes(languageCode) ? 'latest_long' : undefined;
};

export const transcribeAudio = async (
  gcsUri: string, 
  languageCode: string = 'en-US'
): Promise<string> => {
  const client = new SpeechClient();
  
  // Extract filename from GCS URI to detect encoding
  const fileName = gcsUri.split('/').pop() || '';
  const detectedEncoding = detectEncodingFromFile(fileName);
  const model = getModelForLanguage(languageCode);
  
  console.log(`Transcribing audio: ${fileName}`);
  console.log(`Detected encoding: ${detectedEncoding}`);
  console.log(`Language: ${languageCode}, Model: ${model || 'default'}`);
  
  const request = {
    config: {
      languageCode,
      encoding: detectedEncoding, // This is critical for iPhone compatibility
      enableAutomaticPunctuation: true,
      ...(model && { model }) // Only include model if supported
    },
    audio: { uri: gcsUri }
  };
  
  try {
    // recognize() handles short clips only (roughly one minute); for
    // longer audio, use client.longRunningRecognize with the same config
    const [response] = await client.recognize(request);
    const transcription = response.results
      ?.map(result => result.alternatives?.[0]?.transcript)
      .filter(Boolean)
      .join(' ') || '';
      
    return transcription;
  } catch (error) {
    console.error('Transcription failed:', error);
    throw new Error('Failed to transcribe audio');
  }
};
```

The encoding detection is the heart of iPhone compatibility. When the API receives a `.webm` file, it knows to expect `WEBM_OPUS` encoding rather than trying to process it as `LINEAR16` (which would cause encoding errors).
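
If you would rather normalize everything to LINEAR16 instead of juggling encodings per device (and to handle the AAC `.m4a`/`.mp4` case, which `recognize()` cannot ingest directly), a server-side ffmpeg conversion is the usual route. The `-i`/`-ar`/`-ac`/`-f wav` flags below are real ffmpeg options; the helper names and file paths are illustrative:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Build the ffmpeg argument list for converting any input (WebM/Opus,
// AAC, ...) to 16 kHz mono WAV, which Speech-to-Text accepts as LINEAR16.
export function ffmpegWavArgs(input: string, output: string): string[] {
  return ['-i', input, '-ar', '16000', '-ac', '1', '-f', 'wav', output];
}

// Usage sketch (requires ffmpeg on the server's PATH):
export async function convertToWav(input: string, output: string): Promise<void> {
  await execFileAsync('ffmpeg', ffmpegWavArgs(input, output));
}
```

After conversion, call the Speech API with `encoding: 'LINEAR16'` and `sampleRateHertz: 16000`, and every device's recording goes through one consistent path.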

## Step 4: Adding Robust Audio Validation

Both client and server-side validation need to handle the variety of MIME types that different devices produce, including codec specifications.

```typescript
// File: src/lib/audio-validation.ts
const ALLOWED_AUDIO_TYPES = [
  'audio/wav',
  'audio/mpeg', 
  'audio/mp4',
  'audio/webm',
  'application/octet-stream' // Fallback for some uploads
];

export const validateAudioFile = (file: Blob | File): boolean => {
  if (!file.type) {
    console.warn('File has no MIME type, allowing as fallback');
    return true; // Allow files without MIME type
  }
  
  // Use prefix matching to handle codec specifications
  // This accepts "audio/webm;codecs=opus" when "audio/webm" is allowed
  const isAllowed = ALLOWED_AUDIO_TYPES.some(allowedType => 
    file.type.startsWith(allowedType)
  );
  
  if (!isAllowed) {
    console.error(`Unsupported audio type: ${file.type}`);
  }
  
  return isAllowed;
};
```

```typescript
// File: src/app/api/upload/audio/route.ts
import { NextResponse } from 'next/server';
import { validateAudioFile } from '@/lib/audio-validation';
// uploadToStorage: your own storage helper (GCS, S3, ...)

export async function POST(request: Request) {
  const formData = await request.formData();
  const audioFile = formData.get('audio') as File;
  
  if (!audioFile) {
    return NextResponse.json({ error: 'No audio file provided' }, { status: 400 });
  }
  
  // Server-side validation with same logic
  if (!validateAudioFile(audioFile)) {
    return NextResponse.json({ error: 'Invalid audio format' }, { status: 400 });
  }
  
  // Upload to your storage service (Google Cloud Storage, S3, etc.)
  const fileUrl = await uploadToStorage(audioFile);
  
  return NextResponse.json({ fileUrl });
}
```

The prefix matching approach is essential because iPhone Safari sends `audio/webm;codecs=opus`, but your allowed types list contains `audio/webm`. Exact string matching would reject this perfectly valid format.
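
To make the failure mode concrete, here is the same comparison done both ways:

```typescript
// What iPhone Safari actually reports vs. what an exact match expects:
const reported = 'audio/webm;codecs=opus';
const allowed = ['audio/wav', 'audio/mpeg', 'audio/mp4', 'audio/webm'];

// Exact comparison rejects the codec-qualified type...
const exactMatch = allowed.includes(reported);                   // false

// ...while prefix matching accepts it.
const prefixMatch = allowed.some((t) => reported.startsWith(t)); // true
```

One caveat: `startsWith('audio/mp4')` would also accept a hypothetical `audio/mp4x` type, so for stricter validation you can split on `';'` and compare the base type exactly.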

## Step 5: Building a Complete Recording Component

Here's how all the pieces fit together in a complete React component:

```typescript
// File: src/components/voice-recorder.tsx
'use client'; // hooks and getUserMedia require a Client Component in Next.js

import { useState, useRef, useCallback } from 'react';
import { uploadAudioFile } from '@/lib/upload-handler';
import { transcribeAudio } from '@/lib/google-speech';
import { validateAudioFile } from '@/lib/audio-validation';

export const VoiceRecorder = () => {
  const [isRecording, setIsRecording] = useState(false);
  const [isProcessing, setIsProcessing] = useState(false);
  const [transcription, setTranscription] = useState('');
  const mediaRecorderRef = useRef<MediaRecorder | null>(null);
  const chunksRef = useRef<BlobPart[]>([]);

  const startRecording = useCallback(async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      
      // Format detection logic from Step 1
      let selectedMimeType = 'audio/webm';
      const supportedTypes = [
        'audio/webm;codecs=opus',
        'audio/webm',
        'audio/mp4',
        'audio/wav'
      ];
      
      for (const type of supportedTypes) {
        if (MediaRecorder.isTypeSupported(type)) {
          selectedMimeType = type;
          break;
        }
      }
      
      const recorder = new MediaRecorder(stream, { mimeType: selectedMimeType });
      chunksRef.current = [];
      
      recorder.ondataavailable = (event) => {
        if (event.data.size > 0) {
          chunksRef.current.push(event.data);
        }
      };
      
      recorder.onstop = async () => {
        const audioBlob = new Blob(chunksRef.current, { type: selectedMimeType });
        await processRecording(audioBlob);
      };
      
      recorder.start();
      mediaRecorderRef.current = recorder;
      setIsRecording(true);
      
    } catch (error) {
      console.error('Failed to start recording:', error);
    }
  }, []);

  const stopRecording = useCallback(() => {
    if (mediaRecorderRef.current && isRecording) {
      mediaRecorderRef.current.stop();
      mediaRecorderRef.current.stream.getTracks().forEach(track => track.stop());
      setIsRecording(false);
    }
  }, [isRecording]);

  const processRecording = async (audioBlob: Blob) => {
    setIsProcessing(true);
    
    try {
      // Validate the audio file
      if (!validateAudioFile(audioBlob)) {
        throw new Error('Invalid audio format');
      }
      
      // Upload the audio file
      const fileUrl = await uploadAudioFile(audioBlob);
      
      // Transcribe using Google Speech API.
      // Note: SpeechClient is server-only, so in production wrap
      // transcribeAudio in an API route and fetch it from here rather
      // than importing it directly into a client component.
      const result = await transcribeAudio(fileUrl);
      setTranscription(result);
      
    } catch (error) {
      console.error('Processing failed:', error);
    } finally {
      setIsProcessing(false);
    }
  };

  return (
    <div className="p-4">
      <div className="flex gap-4 mb-4">
        <button
          onClick={isRecording ? stopRecording : startRecording}
          disabled={isProcessing}
          className="px-4 py-2 bg-blue-500 text-white rounded disabled:opacity-50"
        >
          {isRecording ? 'Stop Recording' : 'Start Recording'}
        </button>
      </div>
      
      {isProcessing && (
        <div className="text-gray-600">Processing audio...</div>
      )}
      
      {transcription && (
        <div className="mt-4 p-3 bg-gray-100 rounded">
          <h3 className="font-semibold mb-2">Transcription:</h3>
          <p>{transcription}</p>
        </div>
      )}
    </div>
  );
};
```

## Step 6: Testing Across Devices

Since testing on actual iPhones during development isn't always practical, you can implement device simulation for testing different format scenarios:

```typescript
// File: src/components/device-simulator.tsx (development only)
type DevicePreset = {
  name: string;
  description: string;
  forceFormat: string;
};

const DEVICE_PRESETS: Record<string, DevicePreset> = {
  'iphone-safari': {
    name: 'iPhone Safari',
    description: 'WebM/Opus',
    forceFormat: 'audio/webm;codecs=opus'
  },
  'android-chrome': {
    name: 'Android Chrome',
    description: 'WebM',
    forceFormat: 'audio/webm'
  },
  'desktop-chrome': {
    name: 'Desktop Chrome',
    description: 'WebM',
    forceFormat: 'audio/webm'  // Chrome's MediaRecorder can't record WAV
  }
};

// In your recording component, add development-only simulation
const startRecording = useCallback(async () => {
  // ... existing code ...
  
  // Development simulation (only show on localhost)
  if (process.env.NODE_ENV === 'development' && selectedSimulation) {
    const preset = DEVICE_PRESETS[selectedSimulation];
    if (preset && MediaRecorder.isTypeSupported(preset.forceFormat)) {
      selectedMimeType = preset.forceFormat;
      console.log(`🧪 SIMULATING ${preset.name}: ${selectedMimeType}`);
    }
  }
  
  // ... rest of recording code ...
}, [selectedSimulation]);
```

This simulation approach lets you test iPhone Safari behavior on your development machine, ensuring your format detection and encoding logic work correctly before deploying.

## Monitoring and Debugging

Add comprehensive logging to troubleshoot issues across different devices:

```typescript
// Enhanced logging throughout your implementation
console.log(`[MediaRecorder] Detected format: ${selectedMimeType}`);
console.log(`[Upload] File: ${fileName}, Size: ${audioBlob.size} bytes`);
console.log(`[Speech API] Encoding: ${detectedEncoding}, Language: ${languageCode}`);
```

This logging helps identify exactly where format mismatches occur and ensures each step of the pipeline handles device-specific formats correctly.

## Conclusion

Building audio recording that works seamlessly across all devices, especially iPhone Safari, requires understanding the nuances of how different browsers handle MediaRecorder formats. The key is building a system that detects and adapts to each device's preferred format rather than making assumptions.

The implementation covers the complete pipeline: smart format detection in MediaRecorder, proper file extensions during upload, correct encoding configuration for Google Speech API, and robust validation that handles codec specifications. With device simulation for development testing, you can ensure your implementation works across all target devices.

This approach gives you a production-ready audio recording and transcription system that handles the tricky iPhone Safari case while maintaining compatibility with all other browsers. Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija

## LLM Response Snippet
```json
{
  "goal": "Build a robust audio recording and transcription pipeline that works on iPhone Safari using MediaRecorder format detection, uploads, and Speech-to-Text.",
  "responses": [
    {
      "question": "How do I detect which audio MIME type to record with MediaRecorder?",
      "answer": "Probe supported types in order of preference using MediaRecorder.isTypeSupported(mime). Build an ordered list (e.g., 'audio/webm;codecs=opus', 'audio/webm', 'audio/wav') and pick the first supported. Then pass that MIME to the MediaRecorder constructor in the options parameter."
    },
    {
      "question": "iPhone Safari recordings fail transcription — what should I change?",
      "answer": "Treat iPhone Safari as producing 'audio/webm;codecs=opus'. When uploading, either (1) tell Google Speech-to-Text to use encoding 'WEBM_OPUS' and the correct sampleRateHertz, or (2) convert the webm/opus blob server-side to LINEAR16 (WAV) with ffmpeg and then call Speech-to-Text with encoding 'LINEAR16'."
    },
    {
      "question": "What are the client-side steps to capture and upload audio reliably?",
      "answer": "1) Detect supported MIME and pick one. 2) Start getUserMedia({ audio: true }) and create MediaRecorder(stream, { mimeType }). 3) Collect dataavailable events into an array. 4) On stop, create a Blob from chunks with the chosen MIME. 5) Upload Blob via fetch as FormData (include detected mimeType and sample rate if known)."
    },
    {
      "question": "How should I configure Google Speech-to-Text for WebM/Opus audio?",
      "answer": "When calling the Speech-to-Text API, set recognitionConfig.encoding to 'WEBM_OPUS' and recognitionConfig.sampleRateHertz to the recording's sample rate (often 48000). For large files, upload to Google Cloud Storage and use longRunningRecognize with the correct config."
    },
    {
      "question": "When should I convert audio server-side with ffmpeg?",
      "answer": "Convert when the transcription service doesn't accept the recorded format, when you need consistent encoding across devices, or when you need to normalize sample rate/bit depth. Use a pipeline: receive Blob -> store temporarily -> ffmpeg -i input.webm -ar 16000 -ac 1 -f wav output.wav -> send output.wav to Speech-to-Text with encoding LINEAR16."
    },
    {
      "question": "How do I avoid CORS or content-type mistakes when uploading?",
      "answer": "Upload using multipart/form-data (FormData) and include the blob with the correct MIME. On the server, read the content-type from the incoming file or from a sent mimeType field rather than assuming. Ensure your upload endpoint allows the origin or use same-origin requests in Next.js API routes."
    }
  ]
}
```