Long Audio Transcriber

What it is

Long Audio Transcriber is a small Python utility for transcribing long audio files with OpenAI's transcription API.

The core job is simple: point it at an audio file, and it splits large files into chunks that fit under the configured size limit, transcribes each chunk, saves progress after every chunk, and then merges the results into a complete transcript.

It produces plain text, timestamped JSON, and an optional interval-based transcript that groups the text by time ranges.

What problem it solves

The hard part of transcribing long audio is not just the speech-to-text step.

The real friction is everything around it:

large files can exceed upload limits
long jobs can fail midway
restarting from zero wastes time and API cost
raw transcripts are hard to navigate without timestamps
a long wall of text is not useful when reviewing recordings

This project solves those practical workflow problems without adding a web interface or account system.

How it works

The tool reads configuration from environment variables, including the audio path, API key, and maximum chunk size.
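A minimal sketch of that configuration step. The variable names AUDIO_FILE_PATH and MAX_CHUNK_SIZE_MB are hypothetical, since the real names are defined by the project's README and .env generator. OPENAI_API_KEY follows the usual OpenAI convention, and the 25 MB default mirrors the API's documented upload limit; both are assumptions of this sketch.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    audio_path: str
    api_key: str
    max_chunk_mb: float


def load_config(env=None) -> Config:
    """Read settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    required = ("AUDIO_FILE_PATH", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Config(
        audio_path=env["AUDIO_FILE_PATH"],
        api_key=env["OPENAI_API_KEY"],
        # Assumed default: 25 MB, the API's per-file upload limit.
        max_chunk_mb=float(env.get("MAX_CHUNK_SIZE_MB", "25")),
    )
```

Failing fast on missing variables surfaces a bad .env before any audio is decoded or any API cost is incurred.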

If the audio file is larger than the configured limit, it uses ffmpeg to split it into smaller WAV chunks. Each chunk is sent to OpenAI's audio transcription endpoint. After every successful chunk, the result is written to transcription_progress.json.
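Chunk size has to be planned for the WAV output rather than the source file, because a compressed format such as mp3 becomes several times larger once decoded to PCM. The sketch below assumes 16 kHz mono 16-bit WAV chunks and a 5% safety margin (which also covers the WAV header). It reads the duration with ffprobe, which ships with ffmpeg, and builds one ffmpeg command per chunk. The function names are hypothetical, and the project may choose chunk boundaries differently.

```python
import math
import subprocess


def probe_duration(path: str) -> float:
    """Return the audio duration in seconds using ffprobe (needs ffmpeg installed)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def max_chunk_seconds(max_chunk_mb: float, sample_rate: int = 16_000,
                      channels: int = 1, sample_width_bytes: int = 2,
                      safety: float = 0.95) -> float:
    """Longest PCM WAV chunk, in seconds, that stays under max_chunk_mb."""
    bytes_per_second = sample_rate * channels * sample_width_bytes
    return max_chunk_mb * 1024 * 1024 * safety / bytes_per_second


def plan_chunks(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into (start, length) pairs no longer than chunk_s."""
    # round() absorbs float noise so an exact multiple does not yield an empty tail chunk.
    n = math.ceil(round(duration_s / chunk_s, 6))
    return [(i * chunk_s, min(chunk_s, duration_s - i * chunk_s)) for i in range(n)]


def ffmpeg_cmd(src: str, dst: str, start: float, length: float,
               sample_rate: int = 16_000) -> list[str]:
    """Command that extracts one mono 16-bit PCM WAV chunk from src."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{length:.3f}",
            "-i", src, "-ac", "1", "-ar", str(sample_rate),
            "-c:a", "pcm_s16le", dst]
```

With a 25 MB limit these assumptions give chunks of roughly 778 seconds. Each command from ffmpeg_cmd can be passed to subprocess.run(cmd, check=True).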

That progress file is the resume mechanism. If the process is interrupted, already-processed chunks are reused instead of being transcribed again.
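A sketch of that resume loop, assuming the progress file maps chunk indices to their API results (the real file's layout is not documented here). The transcribe argument stands in for whatever function sends a chunk to the API. Writing through a temporary file and os.replace means a crash during the write cannot leave half-written JSON; that atomic-write step is a design choice of this sketch, not something the source describes.

```python
import json
import os
from pathlib import Path

PROGRESS_FILE = "transcription_progress.json"


def load_progress(path=PROGRESS_FILE) -> dict:
    """Return saved progress, or an empty record if no earlier run exists."""
    p = Path(path)
    if not p.exists():
        return {"chunks": {}}
    with p.open(encoding="utf-8") as f:
        return json.load(f)


def save_progress(progress: dict, path=PROGRESS_FILE) -> None:
    """Write progress atomically: temp file first, then replace the original."""
    tmp = Path(str(path) + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(progress, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)


def transcribe_all(chunk_paths, transcribe, path=PROGRESS_FILE) -> dict:
    """Transcribe every chunk, skipping those already recorded in the progress file."""
    progress = load_progress(path)
    done = progress.setdefault("chunks", {})
    for i, chunk in enumerate(chunk_paths):
        key = str(i)  # JSON object keys are always strings
        if key in done:
            continue  # finished in an earlier run: reuse the saved result
        done[key] = transcribe(chunk)
        save_progress(progress, path)  # persist after every successful chunk
    return progress
```

If transcribe raises partway through, every chunk finished before the failure is already on disk, so the next run only pays for the remaining chunks.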

Once all chunks are processed, the tool merges the chunk results. A finished run leaves these files:

transcription.txt
transcription_timestamps.json
transcription_progress.json
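Because each chunk is transcribed independently, its timestamps start at zero, so a merge has to shift them by the chunk's position in the original file. A hedged sketch of that step, assuming each chunk result is OpenAI verbose_json with a words list (word, start, end in seconds) and that the caller knows each chunk's start offset. The helper names and the {"words": [...]} layout of the JSON file are hypothetical.

```python
import json


def merge_chunks(chunk_results, chunk_offsets):
    """Join chunk texts and shift each chunk's word timestamps by its start offset."""
    if len(chunk_results) != len(chunk_offsets):
        raise ValueError("need exactly one offset per chunk result")
    texts, words = [], []
    for result, offset in zip(chunk_results, chunk_offsets):
        text = result.get("text", "").strip()
        if text:
            texts.append(text)
        for w in result.get("words", []):
            words.append({
                "word": w["word"],
                "start": round(w["start"] + offset, 3),
                "end": round(w["end"] + offset, 3),
            })
    return " ".join(texts), words


def write_outputs(text, words, txt_path="transcription.txt",
                  json_path="transcription_timestamps.json"):
    """Write the plain transcript and the word-level timestamps."""
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text + "\n")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"words": words}, f, ensure_ascii=False, indent=2)
```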

A separate processing script can then group timestamped words into time intervals and write transcription_by_intervals.txt.
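The grouping can be sketched as bucketing each word by its start time. This assumes the merged word list from the timestamped JSON and a hypothetical interval_s parameter. Intervals with no speech are simply omitted, which may differ from what process_transcription.py actually does.

```python
def group_by_intervals(words, interval_s=60.0):
    """Bucket words by start time; return (start, end, text) per non-empty interval."""
    buckets = {}
    for w in words:
        idx = int(w["start"] // interval_s)
        buckets.setdefault(idx, []).append(w["word"].strip())
    return [(i * interval_s, (i + 1) * interval_s, " ".join(buckets[i]))
            for i in sorted(buckets)]


def hms(seconds):
    """Format seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def format_intervals(groups):
    """Render groups as '[start - end]' headers followed by the interval's text."""
    return "\n\n".join(f"[{hms(a)} - {hms(b)}]\n{text}" for a, b, text in groups)
```

Bucketing by start time keeps every word in exactly one interval, even when a word straddles a boundary.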

Features

Runs locally as a Python script, with the transcription itself done through OpenAI's API
Automatically splits files that exceed the configured size limit
Saves progress after each processed chunk
Resumes interrupted transcriptions from the progress file
Generates raw text output
Generates timestamped JSON output
Groups transcript text into configurable time intervals
Supports common audio formats documented in the README: mp3, mp4, mpeg, mpga, m4a, wav, and webm
Can run directly on a machine or through Docker Compose

Technical notes

This is a utility script, not a SaaS product.

There is no documented hosted app, dashboard, authentication layer, billing, or public demo. The README gives local and Docker-based setup instructions.

The implementation depends on ffmpeg for audio probing and chunking, and uses requests to call the OpenAI audio transcription API directly.
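A sketch of a direct requests call to the transcription endpoint. The URL, the multipart fields, and the verbose_json and timestamp_granularities[] options come from OpenAI's public API. The whisper-1 model choice, the retry policy for 429 and 5xx responses, and the injectable post parameter (there so the function can be tested without network access) are assumptions of this sketch, not details confirmed by the project.

```python
import os
import time

API_URL = "https://api.openai.com/v1/audio/transcriptions"
RETRYABLE = {429, 500, 502, 503, 504}  # assumed transient status codes


def transcribe_chunk(path, api_key, model="whisper-1", post=None,
                     retries=3, backoff=2.0, timeout=600):
    """POST one audio chunk and return the parsed verbose_json response."""
    if post is None:
        import requests  # imported lazily so a stub can be injected in tests
        post = requests.post
    attempts = max(1, retries)
    for attempt in range(attempts):
        with open(path, "rb") as f:
            resp = post(
                API_URL,
                headers={"Authorization": f"Bearer {api_key}"},
                files={"file": (os.path.basename(path), f, "audio/wav")},
                data={"model": model, "response_format": "verbose_json",
                      "timestamp_granularities[]": "word"},
                timeout=timeout,
            )
        if resp.status_code in RETRYABLE and attempt < attempts - 1:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff before retrying
            continue
        resp.raise_for_status()  # non-retryable error, or retries exhausted
        return resp.json()
```

Word-level timestamps require response_format set to verbose_json; the words list they produce is what the merge and interval steps rely on.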

What exists today

The repository contains:

README.md with setup, usage, outputs, configuration, and error-handling notes
main.py for chunking, transcription, progress tracking, merging, and output writing
process_transcription.py for grouping transcript text into intervals
requirements.txt with Python dependencies
docker-compose.yml for running the tool in a Python container with ffmpeg
gen_dot_env.sample.sh for generating a local .env

What does not exist yet

There is no documented web UI.

There is no hosted demo.

There is no package release.

There is no documented pricing or revenue model.

There are no documented users, customers, or production metrics.

There is no license file in the repository. The code is publicly visible, but its reuse terms are not defined.