Your Meeting Data Is the Next AI Training Goldmine
Every AI notetaker has thousands of hours of your company's conversations. Who owns that data, who is training on it, and what the privacy policies actually say.
A 50-person company where each employee sits in roughly three hour-long meetings a day generates, even counting each distinct meeting only once, something like 300 hours of recorded audio per month. That is 300 hours of unfiltered conversation about product strategy, deal pricing, employee performance, legal exposure, and competitive intelligence. Now multiply that by the millions of companies feeding their meetings into AI notetakers. The scale of sensitive data flowing through these tools is staggering, and most teams have never read the privacy policy.
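The back-of-envelope math is easy to check. A minimal sketch, where every input is an illustrative assumption (hour-long meetings, about eight attendees each, 21 working days per month), not data from any vendor:

```python
# Back-of-envelope estimate of recorded meeting audio per month.
# All inputs are illustrative assumptions, not measured data.
employees = 50
meetings_per_person_per_day = 3
hours_per_meeting = 1.0
avg_attendees = 8          # assumption: meetings are shared, not all 1:1
working_days = 21

person_hours = employees * meetings_per_person_per_day * hours_per_meeting * working_days
# A notetaker records each distinct meeting once, so divide by attendance.
recorded_hours = person_hours / avg_attendees

print(f"{person_hours:.0f} person-hours of conversation per month")
print(f"~{recorded_hours:.0f} hours of recorded audio per month")
# 3,150 person-hours -> ~394 recorded hours; 300 is a conservative round-down.
```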
We read all six privacy policies so you do not have to. What we found ranges from responsible to alarming.
Meeting recordings are not like email or Slack messages. They capture tone, hesitation, who talks over whom, who stays silent. They contain the things people say when they think only their colleagues are listening. For AI companies, this data is extraordinarily valuable. For the companies producing it, the risk is just as large.
What your notetaker knows about you
The transcript is the obvious output. But the raw data captured by AI notetakers goes far deeper than words on a page.
Voice biometrics. Every recording contains a unique voiceprint for each speaker. Voice data can identify individuals across meetings, even across different tools. Once captured, it cannot be changed like a password.
Organizational maps. Meeting metadata reveals who talks to whom, how often, and for how long. Over weeks, a notetaker builds a precise map of your org chart, decision-making hierarchy, and internal alliances. No employee directory is this accurate, and as the sketch below shows, the raw material is nothing more than participant lists.
Deal intelligence. Sales calls contain pricing, discount thresholds, competitor comparisons, and objection patterns. A single quarter of recorded sales meetings is a complete playbook for how your company sells. As we explored in our Gong Tax breakdown, revenue intelligence tools ingest this data at scale.
Product roadmaps. Internal planning meetings expose upcoming features, timelines, technical debt, and strategic priorities. This is the information competitors pay consultants to uncover.
HR and legal exposure. Performance reviews, disciplinary discussions, termination meetings, legal strategy sessions. These get recorded too, sometimes intentionally, sometimes because someone forgot to pause the bot. The liability is enormous.
This is not just text. It is a living, searchable map of how your company operates, makes decisions, and manages its people. Every AI notetaker that processes your meetings holds a copy.
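That map is cheap to compute. As a minimal sketch, assuming nothing but hypothetical participant lists (no audio, no transcripts, no content at all), here is roughly what anyone holding your meeting metadata can derive:

```python
from collections import Counter
from itertools import combinations

# Hypothetical meeting metadata: participant lists only, no content.
meetings = [
    {"title": "Q3 pricing review", "attendees": ["ceo", "cfo", "vp_sales"]},
    {"title": "Perf calibration",  "attendees": ["cfo", "hr_lead", "vp_sales"]},
    {"title": "Roadmap sync",      "attendees": ["ceo", "cto", "vp_sales"]},
    {"title": "1:1",               "attendees": ["ceo", "cfo"]},
]

# Count how often each pair of people meets: a weighted social graph.
edges = Counter()
for m in meetings:
    for pair in combinations(sorted(m["attendees"]), 2):
        edges[pair] += 1

# The heaviest edges expose the decision-making core of the company.
for (a, b), weight in edges.most_common(3):
    print(f"{a} <-> {b}: {weight} shared meetings")
```

Weight the edges by meeting duration and recency and the hierarchy sharpens further. Four meetings of metadata already hint at who actually runs the company.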
What the privacy policies actually say
We read the privacy policies and security documentation of six major AI meeting tools. The differences are significant. Here is what we found:
| Tool | Trains on data? | Opt-out? | Data residency | Retention | SOC 2 | HIPAA |
|---|---|---|---|---|---|---|
| Otter.ai | Yes (de-identified) | Yes, but buried in settings | US (AWS) | Retained after deletion for "business purposes" | Type II | Enterprise only (BAA) |
| Fireflies.ai | No | N/A (no training) | US/EU; BYOS on Enterprise | Zero retention by subprocessors | Type II | Yes (BAA) |
| Fathom | Yes (de-identified) | Yes (user and admin level) | US | User-controlled deletion | Type II | Yes |
| tl;dv | No | N/A (no training) | EU or US (user choice) | User-controlled | Type I | No |
| Gong | No | N/A (no training) | US/EU/AU | Deleted 30 days after contract end | Type II | Yes (BAA) |
| Granola | Yes (de-identified) | Enterprise only | US (AWS) | Not specified | Type II | On roadmap |
A few things stand out. Otter.ai trains its proprietary models on "de-identified" audio and transcripts. Their policy states this happens automatically unless users opt out, and the company retains data even after deletion for "legitimate business purposes." A 2025 class-action lawsuit alleged that Otter recorded private conversations and used them for model training without adequate consent. The case brought national attention to how broadly some notetaker vendors interpret their data rights.
Fireflies takes the opposite approach. Their policy explicitly states that meeting content is never used to train AI models, internal or external. Subprocessors operate under zero-retention agreements. Enterprise customers can bring their own storage on AWS S3 or Google Cloud.
Gong, despite being a $7B+ company processing millions of sales calls, states plainly that customer data is never used to train generative models. They route AI processing through a private Azure OpenAI tenant rather than public APIs.
Granola is the newest and most concerning entry. The tool trains on meeting data by default, and the opt-out exists only for Enterprise customers. Worse, a March 2026 investigation revealed that Granola's shared meeting notes were accessible to anyone with the link, because sharing was set to public by default. For a tool that processes sensitive conversations, that is a serious design failure.
The volume matters. Otter alone, with over 20 million users and $100M ARR as of late 2025, likely processes more hours of business conversation per month than most podcast networks produce in a year. Even if only a fraction of that data touches model training pipelines, the corpus is massive.
The training data question
The central question is simple: is your meeting data training someone else's AI?
Most enterprise-tier plans say no. Gong, Fireflies, and tl;dv all state explicitly that customer data does not enter training pipelines. Their contracts back this up with data processing agreements that prohibit it.
Free tiers are a different story. When a tool costs nothing, the business model has to come from somewhere. Otter's free plan processes your audio through training pipelines using "de-identified" data. Granola trains on conversations by default unless you pay for Enterprise and manually opt out. The word "de-identified" does heavy lifting in these policies, and regulators have started questioning whether audio data can truly be de-identified when it contains voiceprints.
There is also the subprocessor question. Even tools that do not train their own models send your audio to third-party transcription and LLM providers. Fireflies routes through OpenAI with a BAA and zero-retention agreement. tl;dv anonymizes metadata and chunks meetings into randomized segments before sending to Anthropic, so no single provider sees a complete conversation. Gong avoids public LLMs entirely, using a private Azure OpenAI instance.
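None of these vendors publish their pipeline code, so the sketch below is only an illustration of the chunk-and-shuffle technique tl;dv describes, under our own assumptions (fixed-size segments, random IDs, shuffled order), not their actual implementation:

```python
import random
import uuid

def chunk_for_processing(transcript: list[str], chunk_size: int = 20) -> list[dict]:
    """Split a transcript into segments and shuffle them so that no
    downstream provider receives a complete, ordered conversation.

    Illustrative only: a real pipeline must also strip names, emails,
    and other identifiers from the text of each segment.
    """
    chunks = [
        {"chunk_id": str(uuid.uuid4()),   # random ID, carries no meeting metadata
         "lines": transcript[i:i + chunk_size]}
        for i in range(0, len(transcript), chunk_size)
    ]
    random.shuffle(chunks)  # destroy ordering before anything leaves the vendor
    return chunks

# The vendor keeps the chunk_id -> original position mapping locally and
# reassembles the LLM outputs itself; the provider only ever sees fragments.
```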
But not every vendor is this careful. Smaller tools may use default API configurations where the LLM provider retains inputs for 30 days or uses them for abuse monitoring. Unless the notetaker vendor has negotiated specific terms, your meeting data may sit on OpenAI or Anthropic servers longer than you expect.
The risk compounds over time. A single meeting transcript is moderately sensitive. A year of transcripts from every meeting in your company is a complete intelligence file. The longer data is retained, and the more broadly it is shared, the larger the attack surface becomes.
What to ask before you sign up
Before adopting any AI notetaker, your security team should get clear answers to these questions:
- Does any of our meeting data (audio, transcripts, metadata) enter training pipelines? "De-identified" is not the same as "no." Press for specifics on what de-identification means and whether it applies to audio or only text.
- Which subprocessors handle our data, and what are their retention terms? A tool may not train on your data, but its transcription provider might retain it. Ask for the subprocessor list and the DPA for each.
- Where is our data stored, and can we choose the region? If you operate under GDPR, data residency is not optional. Some tools offer EU hosting; others store everything in US-East regardless.
- What happens to our data when we cancel? Gong deletes within 30 days. Otter's policy allows indefinite retention. The difference matters.
- Can we get a BAA for HIPAA compliance? If your organization handles any health-related discussions (benefits, insurance, patient data), you need this. Not every tool offers it.
- Is there an audit log for who accessed our recordings? If an employee shares a recording externally, or if a vendor engineer accesses it during a support ticket, you should know.
- What is the incident response plan if our data is breached? SOC 2 certification means a company has controls in place. It does not guarantee those controls will hold. Ask about breach notification timelines and past incidents.
- Can your vendor reconstruct who attended which meetings with whom, even after you delete transcripts? Metadata (participant lists, timestamps, calendar links) often persists long after content is removed. This is the data that maps your organization.
If your vendor cannot answer these questions clearly, in writing, that tells you something.
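One way to force clear written answers is to track them in a structured due-diligence record, one per vendor. A minimal sketch; the field names are ours, not any industry standard:

```python
from dataclasses import dataclass, field

@dataclass
class NotetakerDueDiligence:
    """One record per vendor; each field maps to a question above.
    None means the vendor has not answered in writing."""
    vendor: str
    trains_on_customer_data: bool | None = None
    deidentification_covers_audio: bool | None = None
    subprocessors: list[str] = field(default_factory=list)
    subprocessor_retention_days: int | None = None
    data_residency_choices: list[str] = field(default_factory=list)
    deletion_after_cancellation_days: int | None = None
    offers_baa: bool | None = None
    access_audit_log: bool | None = None
    breach_notification_hours: int | None = None
    metadata_deleted_with_content: bool | None = None

    def unanswered(self) -> list[str]:
        return [name for name, value in vars(self).items() if value is None]

# A vendor whose record is mostly None has also told you something.
```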
Who gets this right
Several tools stand out for handling data responsibly.
Fathom has built its brand on privacy. Subprocessors are contractually prohibited from training on customer data. Users and admins can opt out of Fathom's own de-identified training. The tool holds SOC 2 Type II and HIPAA certifications. For a free notetaker, the privacy posture is unusually strong. The caveat: Fathom's free tier lacks CRM integrations that mid-market teams need, so organizations requiring Salesforce or HubSpot sync will need the paid team plan to get full value.
Fireflies goes further on the infrastructure side. Enterprise customers get dedicated, isolated storage with bring-your-own-storage options. Subprocessors operate under zero-retention agreements. The company holds SOC 2 Type II, GDPR, and HIPAA certifications with BAAs.
Gong takes the most conservative approach to AI processing. Customer data never touches public LLMs. Everything routes through a private Azure tenant. No training, no exceptions, regardless of plan tier. For companies processing high-stakes sales conversations, this matters.
Bluedot sidesteps the problem architecturally. It records via a Chrome extension, capturing audio locally in the user's browser rather than joining the call as a bot participant. The audio never passes through a vendor-operated bot infrastructure, which reduces the attack surface. GDPR-compliant, with European data hosting available.
Krisp takes the most radical approach: on-device processing. Audio stays on the user's machine. The AI runs locally. Nothing is uploaded to a cloud server for transcription. For organizations that cannot tolerate any data leaving their network, this is the only approach in this roundup that keeps meeting audio entirely off third-party servers.
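Krisp's pipeline is proprietary, but the on-device pattern itself is easy to demonstrate. A minimal sketch using the open-source `openai-whisper` package, which runs a local model and makes no network calls during transcription (assuming the model weights were downloaded once in advance):

```python
# pip install openai-whisper  (also requires ffmpeg on the system)
import whisper

# Loads model weights from local disk; after the one-time download,
# transcription involves no network traffic at all.
model = whisper.load_model("base")

# The audio file never leaves this machine.
result = model.transcribe("meeting_recording.wav")
print(result["text"])
```

Swap in a larger local model for better accuracy; the privacy property is the same either way, because the trade-off is compute, not data exposure.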
On the other end of the spectrum, Otter and Granola both train on user data by default, with opt-outs that are either buried, limited to paid tiers, or both. Otter's ongoing class-action lawsuit and Granola's public-by-default sharing incident suggest that privacy is not the top engineering priority at either company.
The pattern is clear. Tools that charge a meaningful price tend to have cleaner data practices. Tools that offer generous free tiers need to monetize the data somehow. Meeting recordings are too sensitive for that trade-off. If your notetaker is free, you should assume your conversations are the product until the privacy policy proves otherwise.