---
title: "How to Build a Long Audio Transcription Tool with OpenAI's Whisper API (Python)"
slug: "building-a-long-audio-transcription-tool-with-openai-s-whisper-api"
published: "2025-03-07"
updated: "2025-12-26"
validated: "2025-12-26"
categories:
  - "Python"
tags:
  - "build transcription tool python"
  - "openai whisper api implementation"
  - "audio file chunking python"
  - "transcription tool architecture"
  - "whisper api python tutorial"
  - "long audio processing python"
  - "ffmpeg audio splitting"
  - "timestamp management transcription"
  - "openai api audio chunking"
  - "transcription progress tracking"
  - "build audio transcriber"
audience-level: "intermediate"
status: "stable"
llm-purpose: "Build a Python transcription tool with OpenAI Whisper: chunking large files, progress tracking, timestamp management, error recovery. Complete implementation guide with code."
---

**Summary Triples**
- (How to Build a Long Audio Transcription Tool with OpenAI's Whisper API (Python), expresses-intent, reference)
- (How to Build a Long Audio Transcription Tool with OpenAI's Whisper API (Python), covers-topic, build transcription tool python)
- (How to Build a Long Audio Transcription Tool with OpenAI's Whisper API (Python), provides-guidance-for, Build a Python transcription tool with OpenAI Whisper: chunking large files, progress tracking, timestamp management, error recovery. Complete implementation guide with code.)

### {GOAL}
Build a Python transcription tool with OpenAI Whisper: chunking large files, progress tracking, timestamp management, error recovery. Complete implementation guide with code.

### {PREREQS}
- Python 3 and basic command-line familiarity
- An OpenAI API key
- ffmpeg installed on your system

### {STEPS}
1. Follow the detailed walkthrough in the article content below.

<!-- llm:goal="Build a Python transcription tool with OpenAI Whisper: chunking large files, progress tracking, timestamp management, error recovery. Complete implementation guide with code." -->

# How to Build a Long Audio Transcription Tool with OpenAI's Whisper API (Python)
> Build a Python transcription tool with OpenAI Whisper: chunking large files, progress tracking, timestamp management, error recovery. Complete implementation guide with code.
Matija Žiberna · 2025-03-07

**⚠️ Looking to transcribe audio without coding?** Check out [Otter.ai](https://otter.ai), [Rev](https://www.rev.com), or [Descript](https://www.descript.com) instead. This guide is for developers building their own transcription tool.

---

In this tutorial, we'll build a robust audio transcription tool that can handle files of any length using OpenAI's Whisper API. The tool automatically splits large files into chunks, tracks progress, and provides timestamped output.

Source code can be found at the bottom.

---

## What We’ve Built

We’ve created a Python-based transcription tool that solves several common challenges:

* Handling large audio files (>25MB OpenAI limit)
* Maintaining correct timestamps across file chunks
* Resuming interrupted transcriptions
* Organizing transcribed text into time intervals

---

## Key Features

* Automatic file splitting
* Progress tracking and resume capability
* Timestamped word-level transcription
* Time-interval grouping of transcriptions
* Support for multiple audio formats

---

## Step-by-Step Guide

### 1. Project Setup

This involves creating a dedicated folder for your project. This helps keep all related files (code, audio, and output) organized. Inside this folder, you'll typically initialize a Python virtual environment. A virtual environment isolates your project's dependencies, preventing conflicts with other Python projects you might have on your system.

First, create a new project directory and set up the environment:

```bash
mkdir long-audio-transcriber
cd long-audio-transcriber
python -m venv venv
source venv/bin/activate
```

* `mkdir long-audio-transcriber`: Creates a directory named long-audio-transcriber
* `cd long-audio-transcriber`: Changes the current directory to the newly created one
* `python -m venv venv`: Creates a virtual environment named venv inside the project directory
* `source venv/bin/activate`: Activates the virtual environment. The command for Windows is slightly different (`venv\Scripts\activate`). Activating the environment ensures that any packages you install will be specific to this project

---

### 2. Install Dependencies

This step involves installing the Python libraries needed for the project. These libraries provide pre-built functionalities, making development faster and easier.
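With the virtual environment active, the three libraries can be installed in one command:

```shell
pip install requests ffmpeg-python python-dotenv
```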

This pip command installs:

* `requests`: Used for making HTTP requests to the OpenAI API
* `ffmpeg-python`: A Python wrapper for ffmpeg, used for audio file splitting. Remember, you need to have ffmpeg itself installed on your system
* `python-dotenv`: For loading environment variables from the `.env` file

---

### 3. Environment Configuration

Create a `.env` file to store your OpenAI API key:

Environment variables securely store sensitive information, like API keys, outside your code. This prevents accidental exposure of your key.
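One way to create the file from the shell (the variable name matches what the script will read later):

```shell
echo 'WHISPER_API_KEY="your-api-key-here"' > .env
```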

This command creates a `.env` file and adds your OpenAI API key to it. Replace `"your-api-key-here"` with your actual API key.

It is a single line of bash that writes the text to a file called `.env`; the `python-dotenv` library will later load this key into your Python script.

---

## 4. Core Components

### 1. Audio File Splitting Implementation

The OpenAI Whisper API enforces a 25 MB file size limit. To handle larger files, the script splits them into smaller, manageable chunks. ffmpeg is chosen for its efficiency and precision in audio processing, minimizing quality loss.

We used ffmpeg to split large audio files into manageable chunks.

**Key points:**

* Used ffmpeg for precise audio splitting
* Maintained PCM WAV format for best quality
* Calculated chunk size based on file size and duration
* Preserved timing information for later merging

This `split_audio_file` function does the following:

* **Gets File Information:** It retrieves the audio file's total duration (`get_audio_duration`, a function you'd need to define separately, likely using `ffmpeg.probe`) and file size
* **Calculates Chunks:** It determines the number of chunks needed to keep each chunk below `MAX_SIZE_MB` (which you should define, e.g. `MAX_SIZE_MB = 24`). It then calculates the duration of each chunk
* **Splits the Audio:** It loops through the calculated number of chunks

  * `start_time`: Calculates the starting time for the current chunk
  * `chunk_path`: Creates a filename for the chunk (e.g. `temp_chunks/chunk_001.wav`). You'll need to create the `temp_chunks` directory beforehand
  * `ffmpeg.input(file_path, ss=start_time, t=chunk_duration)`: Uses ffmpeg to select a portion of the input audio, starting at `start_time` and lasting for `chunk_duration`. `ss` (seek start) is used for fast and accurate seeking. `t` specifies the duration
  * `ffmpeg.output(stream, chunk_path, acodec='pcm_s16le')`: Specifies the output filename and sets the audio codec to `pcm_s16le`. This ensures the output is a WAV file with 16-bit PCM encoding, which is lossless and compatible with Whisper
  * `ffmpeg.run(stream, overwrite_output=True, quiet=True)`: Executes the ffmpeg command. `overwrite_output=True` allows overwriting existing chunk files, and `quiet=True` suppresses ffmpeg's console output
  * `chunks.append(chunk_path)`: Adds the path of the created chunk to a list, which is returned at the end of the function
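Putting those steps together, a minimal sketch might look like this. The `plan_chunks` helper is my own addition, separated out so the size math stays easy to test:

```python
import math
import os

MAX_SIZE_MB = 24  # stay safely under OpenAI's 25 MB limit

def plan_chunks(duration_seconds, size_bytes):
    # Decide how many chunks are needed and how long each should be
    size_mb = size_bytes / (1024 * 1024)
    num_chunks = max(1, math.ceil(size_mb / MAX_SIZE_MB))
    return num_chunks, duration_seconds / num_chunks

def split_audio_file(file_path, output_dir="temp_chunks"):
    import ffmpeg  # pip install ffmpeg-python (the ffmpeg binary must also be installed)

    os.makedirs(output_dir, exist_ok=True)
    duration = float(ffmpeg.probe(file_path)["format"]["duration"])
    num_chunks, chunk_duration = plan_chunks(duration, os.path.getsize(file_path))

    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        chunk_path = os.path.join(output_dir, f"chunk_{i:03d}.wav")
        # ss = seek start, t = duration; pcm_s16le keeps each chunk as lossless 16-bit WAV
        stream = ffmpeg.input(file_path, ss=start_time, t=chunk_duration)
        stream = ffmpeg.output(stream, chunk_path, acodec="pcm_s16le")
        ffmpeg.run(stream, overwrite_output=True, quiet=True)
        chunks.append(chunk_path)
    return chunks
```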

---

### 2. OpenAI API Setup

**Get API Key:**

1. Go to [https://platform.openai.com/](https://platform.openai.com/)
2. Sign up or log in
3. Navigate to API Keys section
4. Create new secret key

Save key in `.env` file:

```env
WHISPER_API_KEY="your-key-here"
```

**API Configuration:**

The `transcribe_chunk` function does:

* **API Endpoint and Headers:** Sets the API URL and creates the authorization headers using your `API_KEY` (loaded from the environment)

* **Prepares Request Data:**

  * `with open(chunk_path, "rb") as audio_file:` Opens the audio chunk file in binary read mode
  * `files = {"file": audio_file}` Prepares the file for upload in the request
  * `data = { ... }` Creates a dictionary containing the request parameters

    * `"model": "whisper-1"` Specifies the Whisper model to use
    * `"language": "sl"` Sets the language to Slovenian (`sl`). Change this to the correct language code for your audio (e.g. `en` for English)
    * `"response_format": "verbose_json"` Requests the detailed JSON response format
    * `"timestamp_granularities[]": "word"` Requests word-level timestamps

* **Makes the API Request:**

  * `response = requests.post(url, headers=headers, files=files, data=data)` Sends a POST request to the API with the headers, file, and data
  * `response.raise_for_status()` Checks for HTTP errors. If an error occurred, this line will raise an exception, stopping the script
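Assembled from the pieces above, a sketch of `transcribe_chunk` — the endpoint URL and request parameters follow OpenAI's audio transcription API:

```python
import os
import requests  # pip install requests

# python-dotenv can load WHISPER_API_KEY from .env; fall back to the
# existing environment if the library isn't installed
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

API_KEY = os.getenv("WHISPER_API_KEY")
API_URL = "https://api.openai.com/v1/audio/transcriptions"

def transcribe_chunk(chunk_path):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    with open(chunk_path, "rb") as audio_file:
        files = {"file": audio_file}
        data = {
            "model": "whisper-1",
            "language": "sl",  # use your audio's language code, e.g. "en"
            "response_format": "verbose_json",
            "timestamp_granularities[]": "word",
        }
        response = requests.post(API_URL, headers=headers, files=files, data=data)
    response.raise_for_status()  # raise on HTTP errors so the caller can react
    return response.json()
```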

---

### 3. Progress Tracking System

This system is crucial for handling long audio files and potential interruptions. It allows the script to resume processing from where it left off.

We implemented a robust progress tracking system.

**Progress File Structure:**

This JSON structure stores the transcription results for each processed chunk. The keys are the chunk filenames, and the values are dictionaries containing the transcribed text and word-level timestamps.
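The exact field names aren't shown here, but based on that description the file might look something like this (the `completed` flag is the one `mark_completed` later sets):

```json
{
  "completed": false,
  "chunks": {
    "chunk_000.wav": {
      "text": "Transcribed text for the first chunk...",
      "words": [
        { "word": "Transcribed", "start": 0.0, "end": 0.52 }
      ]
    }
  }
}
```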

**Progress Loading:**

The `load_progress` function:

* Checks for existing file using `os.path.exists(PROGRESS_FILE)`
* Loads progress using `json.load(f)` if it exists
* Initializes a new progress structure if it does not

**Completion Tracking:**

The `mark_completed` function:

* Loads current progress
* Sets the `completed` key to `True`
* Saves the updated progress using `json.dump()`
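Those two helpers might be sketched as follows; `save_progress` is a small helper I've added so both functions share one write path:

```python
import json
import os

PROGRESS_FILE = "transcription_progress.json"

def load_progress():
    # Resume from an existing progress file, or start a fresh one
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return json.load(f)
    return {"completed": False, "chunks": {}}

def save_progress(progress):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(progress, f, indent=2)

def mark_completed():
    progress = load_progress()
    progress["completed"] = True
    save_progress(progress)
```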

---

### 4. Timestamp Management

The crucial part was maintaining correct timestamps across chunks.

Since the audio is split into chunks, the timestamps returned by Whisper are relative to the beginning of each chunk. This section shows how to adjust these timestamps to be relative to the beginning of the original audio file.

The `merge_transcriptions` function:

* Initializes variables for merged text, all words, and time offset
* Iterates through chunks in order
* Appends transcribed text
* Adjusts word-level timestamps by adding the current time offset
* Updates the offset based on the last word’s end time
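Assuming each chunk entry in the progress data stores `text` plus a list of `words` with chunk-relative `start`/`end` times, the merge can be sketched as:

```python
def merge_transcriptions(progress):
    merged_text = []
    all_words = []
    time_offset = 0.0

    # Process chunks in their original order (chunk_000, chunk_001, ...)
    for chunk_name in sorted(progress["chunks"]):
        chunk = progress["chunks"][chunk_name]
        merged_text.append(chunk["text"])
        for word in chunk["words"]:
            # Shift each word from chunk-relative to file-relative time
            all_words.append({
                "word": word["word"],
                "start": word["start"] + time_offset,
                "end": word["end"] + time_offset,
            })
        if chunk["words"]:
            # The next chunk starts where the last word of this one ended
            time_offset = all_words[-1]["end"]

    return " ".join(merged_text), all_words
```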

---

### 5. Time Interval Processing

This code takes the merged, timestamp-adjusted words and groups them into user-defined time intervals (e.g. 1-minute intervals). This makes the transcript easier to navigate.

We added time-based grouping of transcriptions.

This function is a trimmed-down version of `merge_transcriptions`: it performs many of the same steps, but returns a list of words instead of the merged text.

The `parse_transcription` function:

* Loads progress data
* Initializes variables
* Sorts chunks by number
* Adjusts timestamps
* Groups words into intervals based on start time
* Returns the grouped structure
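The grouping step at the core of that logic can be sketched as follows — the function name and signature here are my own, and the words are assumed to already carry file-relative start times:

```python
from collections import defaultdict

def group_words_into_intervals(words, interval_seconds=60):
    # Bucket each word by the interval its start time falls into
    intervals = defaultdict(list)
    for word in words:
        index = int(word["start"] // interval_seconds)
        intervals[index].append(word["word"])
    # Join each bucket's words back into a readable line of text
    return {i: " ".join(ws) for i, ws in sorted(intervals.items())}
```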

---

### 6. Output Generation

The tool generates different output formats:

* Raw Text
* Timestamped JSON
* Time-Interval Text
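A sketch of a writer for these three formats, using the output filenames the tool produces; the `mm:ss` interval labels assume 60-second buckets:

```python
import json

def write_outputs(merged_text, all_words, intervals, interval_seconds=60):
    # Raw transcription text
    with open("transcription.txt", "w") as f:
        f.write(merged_text)

    # Word-level timestamps as JSON
    with open("transcription_timestamps.json", "w") as f:
        json.dump(all_words, f, ensure_ascii=False, indent=2)

    # One line per time interval, labelled with its start time
    with open("transcription_by_intervals.txt", "w") as f:
        for index in sorted(intervals):
            start = index * interval_seconds
            f.write(f"[{start // 60:02d}:{start % 60:02d}] {intervals[index]}\n")
```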

---

### 7. Error Recovery System

We implemented several error recovery mechanisms:

* Chunk processing recovery
* Temporary file management
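The original recovery code isn't reproduced here, but a simple chunk-level retry wrapper (my own sketch) illustrates the idea — retry transient failures such as network errors or rate limits a few times before giving up:

```python
import time

def transcribe_with_retry(chunk_path, transcribe, attempts=3, delay=5):
    # Call transcribe(chunk_path), retrying up to `attempts` times
    for attempt in range(1, attempts + 1):
        try:
            return transcribe(chunk_path)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the error to the caller
            time.sleep(delay)
```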

This technical breakdown shows how each component works together to create a reliable transcription system that can handle files of any size while maintaining accurate timestamps and providing recovery options.

---

## 5. Using the Tool

### 1. Prepare Your Audio File

Supported formats:

* mp3
* mp4
* mpeg
* mpga
* m4a
* wav
* webm

There is no size limit: the tool automatically splits large files.

---

### 2. Run the Transcription

```bash
python main.py
```

This command starts the transcription process. Make sure you are in the project directory (`long-audio-transcriber`) and your virtual environment is activated before running this.

The script will:

* Split large files if needed
* Process each chunk
* Save progress after each chunk
* Merge results with correct timestamps

---

### 3. Process Time Intervals

This step is optional but very useful. It runs a separate script (`process_transcription.py`) that implements the `parse_transcription` function and the interval grouping logic.

---

## 6. Output Files

The tool generates several output files:

* `transcription.txt`: Raw transcription text
* `transcription_timestamps.json`: JSON with word-level timestamps
* `transcription_by_intervals.txt`: Text grouped by time intervals
* `transcription_progress.json`: Progress tracking file

---

## Advanced Features

### Progress Tracking

The tool maintains a progress file that allows you to resume interrupted transcriptions.

### Time Interval Processing

Transcriptions are grouped into configurable time intervals.

### Error Handling

The tool includes robust error handling:

* Saves progress after each chunk
* Maintains temporary files for resume capability
* Validates input files and API responses

---

## Conclusion

This tool makes it practical to transcribe long audio files using OpenAI’s Whisper API. It handles the complexities of file splitting, progress tracking, and timestamp management, allowing you to focus on using the transcriptions rather than managing the technical details.

The complete code is available on GitHub: [long-audio-transcriber](https://github.com/matija2209/long-audio-transcriber)