Open source

Transcription Benchmark

An open-source tool for benchmarking transcription APIs against meeting audio. The same methodology behind every accuracy claim on this site, published for anyone to verify or run themselves.

Why meeting audio needs its own benchmark

Standard speech benchmarks (LibriSpeech, Common Voice) test clean audio: audiobooks, read sentences, podcast-quality recordings. Meetings are different. They have crosstalk, laptop microphones, screen-share audio bleeding through, engineers reading out variable names, and people switching languages mid-sentence.

If you're choosing a transcription API for meeting recordings, you need a benchmark that tests meeting conditions. Generic WER scores don't transfer.

What it measures

Word Error Rate (WER)

How many words the service got wrong compared with a human-verified transcript. Lower is better. Scores are normalized for punctuation and casing before comparison.
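Word-level WER is the Levenshtein distance over normalized word tokens, divided by the reference length. A minimal sketch (the normalizer here is illustrative, not the tool's exact one):

```python
import re

def normalize(text: str) -> list[str]:
    # Lowercase and strip punctuation before comparing, mirroring the
    # punctuation/casing normalization described above.
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    # Levenshtein distance over words: substitutions + insertions + deletions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("The cat sat.", "the cat sit")` scores one substitution out of three reference words.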

Speaker Diarization

Did it correctly identify who said what? Measured as time-weighted accuracy with automatic label mapping.
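A minimal sketch of time-weighted accuracy with automatic label mapping, under two simplifying assumptions: segments are `(start, end, label)` tuples, and the hypothesis uses no more labels than the reference. Brute-forcing the label mapping is fine for the 2–3 speaker samples in this benchmark:

```python
from itertools import permutations

def diarization_accuracy(reference, hypothesis):
    """Fraction of reference time attributed to the right speaker, after
    choosing the hypothesis-to-reference label mapping that maximizes
    correctly attributed time."""
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    ref_labels = {seg[2] for seg in reference}
    hyp_labels = sorted({seg[2] for seg in hypothesis})
    total = sum(end - start for start, end, _ in reference)
    best = 0.0
    # Try every assignment of hypothesis labels onto reference labels.
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        correct = sum(
            overlap(r[:2], h[:2])
            for r in reference
            for h in hypothesis
            if mapping[h[2]] == r[2]
        )
        best = max(best, correct)
    return best / total
```

If the service labels speaker A's first ten seconds correctly but bleeds two seconds of A into B's turn, the score is penalized by exactly that misattributed time.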

Latency

Wall-clock time from upload to transcript. Includes API overhead, processing, and any polling.
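Measuring this fairly means timing the whole round trip, polling included. A hypothetical harness (the `submit`/`poll` callables stand in for whatever an adapter actually does; they are not part of the tool's API):

```python
import time

def timed_transcribe(submit, poll, interval=0.5):
    """Wall-clock latency for an async transcription job.

    submit() starts a job and returns a handle; poll(job) returns the
    transcript, or None while the service is still processing. The elapsed
    time therefore includes upload, processing, and polling overhead.
    """
    t0 = time.monotonic()
    job = submit()
    result = poll(job)
    while result is None:
        time.sleep(interval)
        result = poll(job)
    return result, time.monotonic() - t0
```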

Cost per Hour

What you'll actually pay at current API pricing. Updated quarterly.

Latest results

Sample output from the benchmark CLI. Results will vary by audio sample and API version.

Service WER Diarization Latency Cost/hr
Deepgram Nova-2 4.2% 91.3% 1.2s $0.22
Rev AI 4.9% 90.1% 5.1s $0.28
AssemblyAI 5.1% 88.7% 4.8s $0.30
OpenAI Whisper 6.8% N/A 3.4s $0.36

Whisper API does not support speaker diarization. Results from two-speaker standup sample, 26s audio.

Run it yourself

# Install
pip install meetingstack-bench
# Set API keys
export DEEPGRAM_API_KEY=your-key
export ASSEMBLYAI_API_KEY=your-key
# Run against all configured services
msb run
# Or pick specific ones
msb run -a deepgram -a assemblyai
# Output as markdown
msb run -o markdown -f results.md

Test samples

The benchmark ships with scripted meeting recordings paired with human-verified transcripts. Each sample is tagged with speaker count, noise level, and meeting type.

two-speaker-standup

Engineering standup, 2 speakers, clean audio. Covers technical vocabulary, short turns.

three-speaker-sales

Sales demo call, 3 speakers, mixed audio. Covers product terminology, compliance questions.

You can bring your own audio. Place a WAV file and a ground-truth transcript JSON in a directory and point the CLI at it.
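The exact ground-truth schema is defined by the tool; as a hypothetical illustration (field names are assumptions, not the actual format), a transcript file might pair timed utterances with speaker labels:

```python
import json

# Hypothetical ground-truth layout: "sample", "segments", and the
# per-segment fields are illustrative only.
ground_truth = {
    "sample": "my-meeting",
    "segments": [
        {"start": 0.0, "end": 3.2, "speaker": "alice",
         "text": "Morning, quick update on the migration."},
        {"start": 3.2, "end": 6.8, "speaker": "bob",
         "text": "Go ahead, I'm listening."},
    ],
}

# Write it next to the matching WAV file.
with open("my-meeting.json", "w") as f:
    json.dump(ground_truth, f, indent=2)
```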

Supported services

Four adapters ship with v0.1. Adding a new one is a single Python file that normalizes API output into a common transcript format.
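In spirit, an adapter maps a service's raw response into shared segment objects. A hypothetical sketch: the class name, `Segment` fields, and the shape of the raw response are all assumptions for illustration, not the tool's actual interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    start: float            # seconds from the start of the audio
    end: float
    speaker: Optional[str]  # None when the service has no diarization
    text: str

class WhisperAdapter:
    """Illustrative adapter: normalizes one service's response shape
    into the common Segment list the metrics run against."""

    def normalize(self, raw: dict) -> list[Segment]:
        return [
            Segment(
                start=seg["start"],
                end=seg["end"],
                speaker=None,  # Whisper returns no speaker labels
                text=seg["text"].strip(),
            )
            for seg in raw.get("segments", [])
        ]
```

Keeping each adapter to one file means the metrics code never sees a service-specific response shape.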

Deepgram (Nova-2)
AssemblyAI (Universal-2)
OpenAI (Whisper)
Rev AI (Speech-to-text v1)

The accuracy numbers published on meetingstack.io are produced using this tool. Open-sourcing the methodology means you can verify our claims, reproduce our tests, or run the benchmark against your own audio.