Open source

Transcription Benchmark

An open-source tool for benchmarking transcription APIs against meeting audio. The same methodology behind every accuracy claim on this site, published for anyone to verify or run themselves.

Why meeting audio needs its own benchmark

Standard speech benchmarks (LibriSpeech, Common Voice) test clean audio: audiobooks, read sentences, podcast-quality recordings. Meetings are different. They have crosstalk, laptop microphones, screen-share audio bleeding through, engineers reading out variable names, and people switching languages mid-sentence.

If you're choosing a transcription API for meeting recordings, you need a benchmark that tests meeting conditions. Generic WER scores don't transfer.

What it measures

Word Error Rate (WER)

How many words the service got wrong compared with a human-verified transcript. Lower is better. Scores are normalized for punctuation and casing before comparison.
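Word-level WER is the Levenshtein distance over normalized word tokens, divided by the reference length. A minimal sketch (the normalizer here is illustrative, not the tool's exact one):

```python
import re

def normalize(text: str) -> list[str]:
    # Lowercase and strip punctuation before comparing, mirroring the
    # punctuation/casing normalization described above.
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    # Levenshtein distance over words: substitutions + insertions + deletions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("The cat sat.", "the cat sit")` scores one substitution out of three reference words.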

Speaker Diarization

Did it correctly identify who said what? Measured as time-weighted accuracy with automatic label mapping.
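A minimal sketch of time-weighted accuracy with automatic label mapping, under two simplifying assumptions: segments are `(start, end, label)` tuples, and the hypothesis uses no more labels than the reference. Brute-forcing the label mapping is fine for the 2–3 speaker samples in this benchmark:

```python
from itertools import permutations

def diarization_accuracy(reference, hypothesis):
    """Fraction of reference time attributed to the right speaker, after
    choosing the hypothesis-to-reference label mapping that maximizes
    correctly attributed time."""
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    ref_labels = {seg[2] for seg in reference}
    hyp_labels = sorted({seg[2] for seg in hypothesis})
    total = sum(end - start for start, end, _ in reference)
    best = 0.0
    # Try every assignment of hypothesis labels onto reference labels.
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        correct = sum(
            overlap(r[:2], h[:2])
            for r in reference
            for h in hypothesis
            if mapping[h[2]] == r[2]
        )
        best = max(best, correct)
    return best / total
```

If the service labels speaker A's first ten seconds correctly but bleeds two seconds of A into B's turn, the score is penalized by exactly that misattributed time.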

Latency

Wall-clock time from upload to transcript. Includes API overhead, processing, and any polling.
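Measuring this fairly means timing the whole round trip, polling included. A hypothetical harness (the `submit`/`poll` callables stand in for whatever an adapter actually does; they are not part of the tool's API):

```python
import time

def timed_transcribe(submit, poll, interval=0.5):
    """Wall-clock latency for an async transcription job.

    submit() starts a job and returns a handle; poll(job) returns the
    transcript, or None while the service is still processing. The elapsed
    time therefore includes upload, processing, and polling overhead.
    """
    t0 = time.monotonic()
    job = submit()
    result = poll(job)
    while result is None:
        time.sleep(interval)
        result = poll(job)
    return result, time.monotonic() - t0
```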

Cost per Hour

What you'll actually pay at current API pricing. Updated quarterly.

Latest results

Sample output from the benchmark CLI. Results will vary by audio sample and API version.

Service WER Diarization Latency Cost/hr
Deepgram Nova-2 4.2% 91.3% 1.2s $0.22
Rev AI 4.9% 90.1% 5.1s $0.28
AssemblyAI 5.1% 88.7% 4.8s $0.30
OpenAI Whisper 6.8% N/A 3.4s $0.36

Whisper API does not support speaker diarization. Results from two-speaker standup sample, 26s audio.

Run it yourself

# Install
pip install meetingstack-bench
# Set API keys
export DEEPGRAM_API_KEY=your-key
export ASSEMBLYAI_API_KEY=your-key
# Run against all configured services
msb run
# Or pick specific ones
msb run -a deepgram -a assemblyai
# Output as markdown
msb run -o markdown -f results.md

Test samples

The benchmark ships with scripted meeting recordings paired with human-verified transcripts. Each sample is tagged with speaker count, noise level, and meeting type.

two-speaker-standup

Engineering standup, 2 speakers, clean audio. Covers technical vocabulary, short turns.

three-speaker-sales

Sales demo call, 3 speakers, mixed audio. Covers product terminology, compliance questions.

You can bring your own audio. Place a WAV file and a ground-truth transcript JSON in a directory and point the CLI at it.
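The exact ground-truth schema is defined by the tool; as a hypothetical illustration (field names are assumptions, not the actual format), a transcript file might pair timed utterances with speaker labels:

```python
import json

# Hypothetical ground-truth layout: "sample", "segments", and the
# per-segment fields are illustrative only.
ground_truth = {
    "sample": "my-meeting",
    "segments": [
        {"start": 0.0, "end": 3.2, "speaker": "alice",
         "text": "Morning, quick update on the migration."},
        {"start": 3.2, "end": 6.8, "speaker": "bob",
         "text": "Go ahead, I'm listening."},
    ],
}

# Write it next to the matching WAV file.
with open("my-meeting.json", "w") as f:
    json.dump(ground_truth, f, indent=2)
```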

Supported services

Four adapters ship with v0.1. Adding a new one is a single Python file that normalizes API output into a common transcript format.
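In spirit, an adapter maps a service's raw response into shared segment objects. A hypothetical sketch: the class name, `Segment` fields, and the shape of the raw response are all assumptions for illustration, not the tool's actual interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    start: float            # seconds from the start of the audio
    end: float
    speaker: Optional[str]  # None when the service has no diarization
    text: str

class WhisperAdapter:
    """Illustrative adapter: normalizes one service's response shape
    into the common Segment list the metrics run against."""

    def normalize(self, raw: dict) -> list[Segment]:
        return [
            Segment(
                start=seg["start"],
                end=seg["end"],
                speaker=None,  # Whisper returns no speaker labels
                text=seg["text"].strip(),
            )
            for seg in raw.get("segments", [])
        ]
```

Keeping each adapter to one file means the metrics code never sees a service-specific response shape.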

Deepgram (Nova-2)
AssemblyAI (Universal-2)
OpenAI (Whisper)
Rev AI (Speech-to-text v1)

The accuracy numbers published on meetingstack.io are produced using this tool. Open-sourcing the methodology means you can verify our claims, reproduce our tests, or run the benchmark against your own audio.