How we test and review
Every review, comparison, and benchmark on meetingstack follows a consistent process. This page explains what we measure, how we measure it, and what our limitations are.
Evaluation process
Each tool gets hands-on testing with real meeting recordings across multiple use cases: sales calls, engineering standups, client workshops, and group planning sessions. We test with audio that includes accents, crosstalk, background noise, and technical jargon because that's what real meetings sound like.
We sign up for actual accounts, use actual free tiers and trials, and verify pricing at checkout. If a vendor lists one price on their marketing page and charges a different amount at signup, we report the real number.
What we measure
Transcription accuracy
Word Error Rate (WER) measured against human-verified ground truth transcripts. We test across five audio conditions: clean single speaker, dialogue, crosstalk, non-native accents, and background noise. See the Open-source benchmarks section below for how to reproduce these numbers.
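For readers who want to see the math, the sketch below shows one common way to compute WER from a reference and a hypothesis transcript. It's a simplified, self-contained illustration, not the exact code from our benchmark repository.

```python
# Minimal WER sketch: WER = (substitutions + deletions + insertions) / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("schedule the client workshop for friday",
          "schedule a client workshop friday"))  # ≈ 0.33 (2 errors / 6 reference words)
```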
Speaker diarization
Percentage of audio time where the provider correctly identifies who is speaking. Tested on multi-speaker recordings with 2-6 participants.
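The sketch below shows roughly how that attribution percentage can be computed from time-stamped speaker segments. The segment format, and the assumption that the provider's speaker labels are already mapped onto the reference labels, are simplifications for illustration.

```python
from typing import List, Optional, Tuple

# (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]

def speaker_at(t: float, segments: List[Segment]) -> Optional[str]:
    """Return the speaker talking at time t, or None during silence."""
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None

def attribution_accuracy(reference: List[Segment],
                         hypothesis: List[Segment],
                         step: float = 0.1) -> float:
    """Share of reference speech time (sampled every `step` seconds) where the
    provider names the same speaker as the human-labelled ground truth.
    Assumes provider labels are already mapped onto reference labels."""
    n_samples = int(round(max(end for _, end, _ in reference) / step))
    total = correct = 0
    for i in range(n_samples):
        t = i * step
        ref_speaker = speaker_at(t, reference)
        if ref_speaker is not None:  # only score regions where someone is talking
            total += 1
            if speaker_at(t, hypothesis) == ref_speaker:
                correct += 1
    return correct / total if total else 0.0

ref = [(0.0, 4.0, "alice"), (4.0, 9.0, "bob")]
hyp = [(0.0, 5.0, "alice"), (5.0, 9.0, "bob")]
print(round(attribution_accuracy(ref, hyp), 2))  # ≈ 0.89: bob's first second misattributed
```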
Integration depth
We test actual integrations with Zoom, Teams, Google Meet, Slack, CRMs, and project management tools. Not just "does it connect" but "does the data flow correctly and is it useful on the other side."
Pricing verification
We verify pricing by going through the actual signup flow. Published prices are checked quarterly. If pricing changes, we update the review within one business day of confirming the change.
User experience
Setup time, onboarding flow, daily workflow friction, and how the tool behaves when things go wrong (network drops, large meetings, edge cases). We note bot intrusiveness for tools that join meetings as a visible participant.
How rankings work
Rankings within each category use a weighted score across four dimensions.
Weights shift by category. For transcription APIs, accuracy gets 45% and UX drops to 5%. For scheduling tools, UX gets 30%. We publish the weights used on each category page.
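As an illustration, the score itself is just a weighted sum of per-dimension scores. The dimension names and numbers below are placeholders (apart from the 45%/5% split mentioned above), not our published weights for any category.

```python
# Illustrative only: combine per-dimension scores (0-10) with category weights.

def weighted_score(scores: dict, weights: dict) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(scores[dim] * weights[dim] for dim in weights)

# Placeholder weights for a transcription-API-style category.
weights = {"accuracy": 0.45, "integrations": 0.25, "pricing": 0.25, "ux": 0.05}
scores = {"accuracy": 9.0, "integrations": 7.0, "pricing": 8.0, "ux": 6.0}

print(round(weighted_score(scores, weights), 2))  # ≈ 8.1 overall score out of 10
```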
How comparisons work
Head-to-head comparisons test both tools under identical conditions: same meeting recordings, same test accounts, same evaluation criteria. We report strengths and weaknesses for both sides. Every comparison ends with a verdict, but we don't crown a universal "winner" because the best tool depends on your specific needs.
Limitations and caveats
- We test with English audio only unless stated otherwise. Multilingual performance may differ.
- We use default settings for all providers. Custom vocabulary, fine-tuning, and language hints can improve results.
- Enterprise features that require custom contracts are noted but not always tested hands-on.
- Our test environment (audio quality, meeting size, use cases) may not match yours. Use our data as a starting point, not the final answer.
- Tools update constantly. We re-verify reviews quarterly, but features can change between cycles.
Independence
Some links on this site are affiliate links (always labeled). Affiliate relationships never influence rankings, scores, or editorial judgment. No vendor gets advance review of our content. If a vendor offers us early access to a feature for testing, we accept it but disclose it in the review.
If we get something wrong, email hello@meetingstack.io. We correct errors openly and note the correction date.
Open-source benchmarks
Our transcription accuracy benchmark is open source. You can run the same tests yourself, add providers, or contribute audio samples.