How we test and review
Every review, comparison, and benchmark on meetingstack follows a consistent process. This page explains what we measure, how we measure it, and what our limitations are.
Evaluation process
Each tool gets hands-on testing with real meeting recordings across multiple use cases: sales calls, engineering standups, client workshops, and group planning sessions. We test with audio that includes accents, crosstalk, background noise, and technical jargon because that's what real meetings sound like.
We sign up for actual accounts, use actual free tiers and trials, and verify pricing at checkout. If a vendor lists one price on their marketing page and charges a different amount at signup, we report the real number.
What we measure
Transcription accuracy
Word Error Rate (WER) measured against human-verified ground truth transcripts. We test across five audio conditions: clean single speaker, dialogue, crosstalk, non-native accents, and background noise. See the Open-source benchmarks section below for how to reproduce these numbers.
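For readers who want to see the math, the sketch below shows one common way to compute WER from a reference and a hypothesis transcript. It's a simplified, self-contained illustration, not the exact code from our benchmark repository.

```python
# Minimal WER sketch: WER = (substitutions + deletions + insertions) / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("schedule the client workshop for friday",
          "schedule a client workshop friday"))  # ≈ 0.33 (2 errors / 6 reference words)
```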
Speaker diarization
Percentage of audio time where the provider correctly identifies who is speaking. Tested on multi-speaker recordings with 2-6 participants.
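The sketch below shows roughly how that attribution percentage can be computed from time-stamped speaker segments. The segment format, and the assumption that the provider's speaker labels are already mapped onto the reference labels, are simplifications for illustration.

```python
from typing import List, Optional, Tuple

# (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]

def speaker_at(t: float, segments: List[Segment]) -> Optional[str]:
    """Return the speaker talking at time t, or None during silence."""
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None

def attribution_accuracy(reference: List[Segment],
                         hypothesis: List[Segment],
                         step: float = 0.1) -> float:
    """Share of reference speech time (sampled every `step` seconds) where the
    provider names the same speaker as the human-labelled ground truth.
    Assumes provider labels are already mapped onto reference labels."""
    n_samples = int(round(max(end for _, end, _ in reference) / step))
    total = correct = 0
    for i in range(n_samples):
        t = i * step
        ref_speaker = speaker_at(t, reference)
        if ref_speaker is not None:  # only score regions where someone is talking
            total += 1
            if speaker_at(t, hypothesis) == ref_speaker:
                correct += 1
    return correct / total if total else 0.0

ref = [(0.0, 4.0, "alice"), (4.0, 9.0, "bob")]
hyp = [(0.0, 5.0, "alice"), (5.0, 9.0, "bob")]
print(round(attribution_accuracy(ref, hyp), 2))  # ≈ 0.89: bob's first second misattributed
```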
Integration depth
We test actual integrations with Zoom, Teams, Google Meet, Slack, CRMs, and project management tools. Not just "does it connect" but "does the data flow correctly and is it useful on the other side."
Pricing verification
We verify pricing by going through the actual signup flow. Published prices are checked quarterly. If pricing changes, we update the review within one business day of confirming the change.
User experience
Setup time, onboarding flow, daily workflow friction, and how the tool behaves when things go wrong (network drops, large meetings, edge cases). We note bot intrusiveness for tools that join meetings as a visible participant.
How rankings work
Rankings within each category use a weighted score across four dimensions.
Weights shift by category. For transcription APIs, accuracy gets 45% and UX drops to 5%. For scheduling tools, UX gets 30%. We publish the weights used on each category page.
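As an illustration, the score itself is just a weighted sum of per-dimension scores. The dimension names and numbers below are placeholders (apart from the 45%/5% split mentioned above), not our published weights for any category.

```python
# Illustrative only: combine per-dimension scores (0-10) with category weights.

def weighted_score(scores: dict, weights: dict) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(scores[dim] * weights[dim] for dim in weights)

# Placeholder weights for a transcription-API-style category.
weights = {"accuracy": 0.45, "integrations": 0.25, "pricing": 0.25, "ux": 0.05}
scores = {"accuracy": 9.0, "integrations": 7.0, "pricing": 8.0, "ux": 6.0}

print(round(weighted_score(scores, weights), 2))  # ≈ 8.1 overall score out of 10
```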
How comparisons work
Head-to-head comparisons test both tools under identical conditions: same meeting recordings, same test accounts, same evaluation criteria. We report strengths and weaknesses for both sides. Every comparison ends with a verdict, but we don't crown a universal "winner" because the best tool depends on your specific needs.
Limitations and caveats
- We test with English audio only unless stated otherwise. Multilingual performance may differ.
- We use default settings for all providers. Custom vocabulary, fine-tuning, and language hints can improve results.
- Enterprise features that require custom contracts are noted but not always tested hands-on.
- Our test environment (audio quality, meeting size, use cases) may not match yours. Use our data as a starting point, not the final answer.
- Tools update constantly. We re-verify reviews quarterly, but features can change between cycles.
Independence
Some links on this site are affiliate links (always labeled). Affiliate relationships never influence rankings, scores, or editorial judgment. No vendor gets advance review of our content. If a vendor offers us early access to a feature for testing, we accept it but disclose it in the review.
If we get something wrong, email hello@meetingstack.io. We correct errors openly and note the correction date.
Open-source benchmarks
Our transcription accuracy benchmark is open source. You can run the same tests yourself, add providers, or contribute audio samples.