What it is
Long Audio Transcriber is a small Python utility for transcribing long audio files with OpenAI's transcription API.
The core job is simple: point it at an audio file, let it split large files into safe chunks, transcribe each chunk, save progress, then merge everything back into a complete transcript.
It produces plain text, timestamped JSON, and an optional interval-based transcript that groups the text by time ranges.
What problem it solves
The annoying part of long audio transcription is not only speech-to-text.
The real friction is everything around it:
- large files can exceed upload limits
- long jobs can fail midway
- restarting from zero wastes time and API cost
- raw transcripts are hard to navigate without timestamps
- a long wall of text is not useful when reviewing recordings
This project solves those practical workflow problems without adding a web interface or account system.
How it works
The tool reads configuration from environment variables, including the audio path, API key, and maximum chunk size.
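As a rough illustration of that configuration step, the sketch below reads the three settings from a mapping of environment variables. The variable names AUDIO_PATH, OPENAI_API_KEY, and MAX_CHUNK_MB, the Config class, and the 24 MB default are assumptions made for this example; the README is the authority on the names the tool actually reads.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    audio_path: str
    api_key: str
    max_chunk_mb: float


def load_config(env=None):
    """Build a Config from a mapping of environment variables.

    All variable names here are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    required = ("AUDIO_PATH", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    # 24 MB is an assumed default, not a value taken from the project.
    max_chunk_mb = float(env.get("MAX_CHUNK_MB", "24"))
    if max_chunk_mb <= 0:
        raise ValueError("MAX_CHUNK_MB must be positive")
    return Config(env["AUDIO_PATH"], env["OPENAI_API_KEY"], max_chunk_mb)
```

Passing a plain dict instead of os.environ keeps the function easy to exercise without touching the real environment.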
If the audio file is larger than the configured limit, it uses ffmpeg to split it into smaller WAV chunks. Each chunk is sent to OpenAI's audio transcription endpoint. After every successful chunk, the result is written to transcription_progress.json.
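One way to plan the split is to work backwards from the size of the WAV output, since uncompressed PCM has a fixed byte rate. The sketch below is not taken from main.py: it assumes mono 16 kHz 16-bit chunks, a small header allowance, and equal-length chunks, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 16_000      # assumed output sample rate (Hz)
CHANNELS = 1              # assumed mono output
BYTES_PER_SAMPLE = 2      # 16-bit PCM
HEADER_ALLOWANCE = 1024   # headroom for the WAV header and metadata


def plan_chunks(duration_s, max_chunk_bytes):
    """Return (start, length) pairs in seconds that cover the whole file."""
    byte_rate = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
    limit_s = (max_chunk_bytes - HEADER_ALLOWANCE) / byte_rate
    if limit_s <= 0:
        raise ValueError("max_chunk_bytes is too small to hold any audio")
    count = max(1, math.ceil(duration_s / limit_s))
    length = duration_s / count
    return [(i * length, length) for i in range(count)]


def ffmpeg_split_command(src, dst, start_s, length_s):
    """Build the ffmpeg argument list that cuts one chunk to PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",   # seek to the chunk start
        "-i", src,
        "-t", f"{length_s:.3f}",   # keep only this much audio
        "-ac", str(CHANNELS),
        "-ar", str(SAMPLE_RATE),
        "-c:a", "pcm_s16le",
        dst,
    ]
```

Each command list can be handed to subprocess.run(cmd, check=True). Under these assumptions a one-hour recording with a 24 MiB limit yields five chunks of 720 seconds each.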
That progress file is the resume mechanism. If the process is interrupted, already processed chunks can be reused instead of transcribed again.
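The resume logic can be sketched as a loop that skips any chunk already recorded in the progress file. The schema used here (a JSON object keyed by chunk name) and the helper names are assumptions for illustration; the real layout of transcription_progress.json may differ.

```python
import json
import os
import tempfile


def load_progress(path):
    """Return the saved results, or an empty dict on a first run."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_progress(path, progress):
    """Write to a temp file, then rename, so an interruption mid-write
    cannot leave a truncated progress file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(progress, fh, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)


def transcribe_all(chunk_names, transcribe, progress_path):
    """Transcribe each chunk once, saving after every success."""
    progress = load_progress(progress_path)
    for name in chunk_names:
        if name in progress:
            continue  # finished in an earlier run, reuse the result
        progress[name] = transcribe(name)
        save_progress(progress_path, progress)
    return [progress[name] for name in chunk_names]
```

Here transcribe is any callable that takes a chunk name and returns a JSON-serializable result. If it raises partway through, the next call to transcribe_all picks up at the first chunk that has no saved result.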
Once all chunks are processed, the tool merges the transcription output into:
- transcription.txt
- transcription_timestamps.json
- transcription_progress.json
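Merging needs one detail that is easy to miss: timestamps returned for a chunk are relative to the start of that chunk, so each one has to be shifted by the chunk's offset in the original recording. The sketch below assumes each chunk result is a dict with "offset", "text", and "words" keys; that shape is hypothetical and is not taken from main.py.

```python
def merge_chunks(chunks):
    """Combine per-chunk results into one text and one word list.

    chunks: list of dicts, in playback order, each with
      "offset": chunk start in the original file (seconds)
      "text":   transcript text for the chunk
      "words":  list of {"word", "start", "end"} relative to the chunk
    """
    texts = []
    words = []
    for chunk in chunks:
        text = chunk["text"].strip()
        if text:
            texts.append(text)
        for word in chunk["words"]:
            words.append({
                "word": word["word"],
                # shift chunk-relative times onto the full recording
                "start": round(word["start"] + chunk["offset"], 3),
                "end": round(word["end"] + chunk["offset"], 3),
            })
    return " ".join(texts), words
```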
A separate processing script can then group timestamped words into time intervals and write transcription_by_intervals.txt.
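The grouping step amounts to bucketing each word by the interval its start time falls in. The following is a minimal sketch, assuming words are dicts with "word" and "start" keys and a 60-second default interval; the output line format is invented for this example and may not match what process_transcription.py writes.

```python
def format_clock(seconds):
    """Render a second count as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def group_by_interval(words, interval_s=60):
    """Return one text line per non-empty interval, in time order."""
    buckets = {}
    for word in words:
        index = int(word["start"] // interval_s)
        buckets.setdefault(index, []).append(word["word"].strip())
    lines = []
    for index in sorted(buckets):
        start = index * interval_s
        label = f"[{format_clock(start)} - {format_clock(start + interval_s)}]"
        lines.append(label + " " + " ".join(buckets[index]))
    return lines
```

Intervals with no words are skipped in this sketch, so a long silence does not produce empty lines.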
Features
- Runs as a local Python script that sends audio to OpenAI's transcription API
- Automatically splits files that exceed the configured size limit
- Saves progress after each processed chunk
- Resumes interrupted transcriptions from the progress file
- Generates raw text output
- Generates timestamped JSON output
- Groups transcript text into configurable time intervals
- Supports common audio formats documented in the README: mp3, mp4, mpeg, mpga, m4a, wav, and webm
- Can run directly on a machine or through Docker Compose
Technical notes
This is a utility script, not a SaaS product.
There is no documented hosted app, dashboard, authentication layer, billing, or public demo. The README gives local and Docker-based setup instructions.
The implementation depends on ffmpeg for audio probing and chunking, and uses requests to call the OpenAI audio transcription API directly.
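A direct call with requests might look like the sketch below. The endpoint URL and form fields follow OpenAI's public API reference; the model name, the choice of verbose_json with word-level timestamps, the timeout, and the function names are assumptions, since the source does not say which options main.py sends.

```python
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_request(api_key, model="whisper-1"):
    """Return the headers and form fields for one transcription call.

    The model name and response options are assumed, not confirmed.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {
        "model": model,
        "response_format": "verbose_json",
        "timestamp_granularities[]": "word",
    }
    return headers, data


def transcribe_chunk(path, api_key, timeout_s=600):
    """Upload one audio chunk and return the parsed JSON response."""
    # Imported here so build_request stays usable without the
    # third-party requests package installed.
    import requests

    headers, data = build_request(api_key)
    with open(path, "rb") as fh:
        response = requests.post(
            API_URL,
            headers=headers,
            data=data,
            files={"file": fh},
            timeout=timeout_s,
        )
    response.raise_for_status()  # surface HTTP errors to the caller
    return response.json()
```

Raising on HTTP errors lets the caller decide whether to retry or stop, which fits a workflow where finished chunks are already saved.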
What exists today
The repository contains:
- README.md with setup, usage, outputs, configuration, and error-handling notes
- main.py for chunking, transcription, progress tracking, merging, and output writing
- process_transcription.py for grouping transcript text into intervals
- requirements.txt with Python dependencies
- docker-compose.yml for running the tool in a Python container with ffmpeg
- gen_dot_env.sample.sh for generating a local .env
What does not exist yet
There is no documented web UI.
There is no hosted demo.
There is no package release.
There is no documented pricing or revenue model.
There are no documented users, customers, or production metrics.
There is no license file in the repo. The code is public, but the repository does not define the terms for reusing it.