Your Meeting Data Is the Next AI Training Goldmine
Every AI notetaker has thousands of hours of your company's conversations. Who owns that data, who is training on it, and what the privacy policies actually say.
A 50-person company where each employee sits in roughly three hour-long meetings a day generates, even counting each distinct meeting only once, something like 300 hours of recorded audio per month. That is 300 hours of unfiltered conversation about product strategy, deal pricing, employee performance, legal exposure, and competitive intelligence. Now multiply that by the millions of companies feeding their meetings into AI notetakers. The scale of sensitive data flowing through these tools is staggering, and most teams have never read the privacy policy.
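The back-of-envelope math is easy to check. A minimal sketch, where every input is an illustrative assumption (hour-long meetings, about eight attendees each, 21 working days per month), not data from any vendor:

```python
# Back-of-envelope estimate of recorded meeting audio per month.
# All inputs are illustrative assumptions, not measured data.
employees = 50
meetings_per_person_per_day = 3
hours_per_meeting = 1.0
avg_attendees = 8          # assumption: meetings are shared, not all 1:1
working_days = 21

person_hours = employees * meetings_per_person_per_day * hours_per_meeting * working_days
# A notetaker records each distinct meeting once, so divide by attendance.
recorded_hours = person_hours / avg_attendees

print(f"{person_hours:.0f} person-hours of conversation per month")
print(f"~{recorded_hours:.0f} hours of recorded audio per month")
# 3,150 person-hours -> ~394 recorded hours; 300 is a conservative round-down.
```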
We read all six privacy policies so you do not have to. What we found ranges from responsible to alarming.
Meeting recordings are not like email or Slack messages. They capture tone, hesitation, who talks over whom, who stays silent. They contain the things people say when they think only their colleagues are listening. For AI companies, this data is extraordinarily valuable. For the companies producing it, the risk is just as large.
What your notetaker knows about you
The transcript is the obvious output. But the raw data captured by AI notetakers goes far deeper than words on a page.
Voice biometrics. Every recording contains a unique voiceprint for each speaker. Voice data can identify individuals across meetings, even across different tools. Once captured, it cannot be changed like a password.
Organizational maps. Meeting metadata reveals who talks to whom, how often, and for how long. Over weeks, a notetaker builds a precise map of your org chart, decision-making hierarchy, and internal alliances. No employee directory is this accurate, and as the sketch below shows, the raw material is nothing more than participant lists.
Deal intelligence. Sales calls contain pricing, discount thresholds, competitor comparisons, and objection patterns. A single quarter of recorded sales meetings is a complete playbook for how your company sells. As we explored in our Gong Tax breakdown, revenue intelligence tools ingest this data at scale.
Product roadmaps. Internal planning meetings expose upcoming features, timelines, technical debt, and strategic priorities. This is the information competitors pay consultants to uncover.
HR and legal exposure. Performance reviews, disciplinary discussions, termination meetings, legal strategy sessions. These get recorded too, sometimes intentionally, sometimes because someone forgot to pause the bot. The liability is enormous.
This is not just text. It is a living, searchable map of how your company operates, makes decisions, and manages its people. Every AI notetaker that processes your meetings holds a copy.
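That map is cheap to compute. As a minimal sketch, assuming nothing but hypothetical participant lists (no audio, no transcripts, no content at all), here is roughly what anyone holding your meeting metadata can derive:

```python
from collections import Counter
from itertools import combinations

# Hypothetical meeting metadata: participant lists only, no content.
meetings = [
    {"title": "Q3 pricing review", "attendees": ["ceo", "cfo", "vp_sales"]},
    {"title": "Perf calibration",  "attendees": ["cfo", "hr_lead", "vp_sales"]},
    {"title": "Roadmap sync",      "attendees": ["ceo", "cto", "vp_sales"]},
    {"title": "1:1",               "attendees": ["ceo", "cfo"]},
]

# Count how often each pair of people meets: a weighted social graph.
edges = Counter()
for m in meetings:
    for pair in combinations(sorted(m["attendees"]), 2):
        edges[pair] += 1

# The heaviest edges expose the decision-making core of the company.
for (a, b), weight in edges.most_common(3):
    print(f"{a} <-> {b}: {weight} shared meetings")
```

Weight the edges by meeting duration and recency and the hierarchy sharpens further. Four meetings of metadata already hint at who actually runs the company.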
What the privacy policies actually say
We read the privacy policies and security documentation of six major AI meeting tools. The differences are significant. Here is what we found:
| Tool | Trains on data? | Opt-out? | Data residency | Retention | SOC 2 | HIPAA |
|---|---|---|---|---|---|---|
| Otter.ai | Yes (de-identified) | Yes, but buried in settings | US (AWS) | Retained after deletion for "business purposes" | Type II | Enterprise only (BAA) |
| Fireflies.ai | No | N/A (no training) | US/EU; BYOS on Enterprise | Zero retention by subprocessors | Type II | Yes (BAA) |
| Fathom | Yes (de-identified) | Yes (user and admin level) | US | User-controlled deletion | Type II | Yes |
| tl;dv | No | N/A (no training) | EU or US (user choice) | User-controlled | Type I | No |
| Gong | No | N/A (no training) | US/EU/AU | Deleted 30 days after contract end | Type II | Yes (BAA) |
| Granola | Yes (de-identified) | Enterprise only | US (AWS) | Not specified | Type II | On roadmap |
A few things stand out. Otter.ai trains its proprietary models on "de-identified" audio and transcripts. Their policy states this happens automatically unless users opt out, and the company retains data even after deletion for "legitimate business purposes." A 2025 class-action lawsuit alleged that Otter recorded private conversations and used them for model training without adequate consent. The case brought national attention to how broadly some notetaker vendors interpret their data rights.
Fireflies takes the opposite approach. Their policy explicitly states that meeting content is never used to train AI models, internal or external. Subprocessors operate under zero-retention agreements. Enterprise customers can bring their own storage on AWS S3 or Google Cloud.
Gong, despite being a $7B+ company processing millions of sales calls, states plainly that customer data is never used to train generative models. They route AI processing through a private Azure OpenAI tenant rather than public APIs.
Granola is the newest and most concerning entry. The tool trains on meeting data by default, and the opt-out exists only for Enterprise customers. Worse, a March 2026 investigation revealed that Granola's shared meeting notes were accessible to anyone with the link, because sharing was set to public by default. For a tool that processes sensitive conversations, that is a serious design failure.
The volume matters. Otter alone, with over 20 million users and $100M ARR as of late 2025, likely processes more hours of business conversation per month than most podcast networks produce in a year. Even if only a fraction of that data touches model training pipelines, the corpus is massive.
The training data question
The central question is simple: is your meeting data training someone else's AI?
Most enterprise-tier plans say no. Gong, Fireflies, and tl;dv all state explicitly that customer data does not enter training pipelines. Their contracts back this up with data processing agreements that prohibit it.
Free tiers are a different story. When a tool costs nothing, the business model has to come from somewhere. Otter's free plan processes your audio through training pipelines using "de-identified" data. Granola trains on conversations by default unless you pay for Enterprise and manually opt out. The word "de-identified" does heavy lifting in these policies, and regulators have started questioning whether audio data can truly be de-identified when it contains voiceprints.
There is also the subprocessor question. Even tools that do not train their own models send your audio to third-party transcription and LLM providers. Fireflies routes through OpenAI with a BAA and zero-retention agreement. tl;dv anonymizes metadata and chunks meetings into randomized segments before sending to Anthropic, so no single provider sees a complete conversation. Gong avoids public LLMs entirely, using a private Azure OpenAI instance.
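None of these vendors publish their pipeline code, so the sketch below is only an illustration of the chunk-and-shuffle technique tl;dv describes, under our own assumptions (fixed-size segments, random IDs, shuffled order), not their actual implementation:

```python
import random
import uuid

def chunk_for_processing(transcript: list[str], chunk_size: int = 20) -> list[dict]:
    """Split a transcript into segments and shuffle them so that no
    downstream provider receives a complete, ordered conversation.

    Illustrative only: a real pipeline must also strip names, emails,
    and other identifiers from the text of each segment.
    """
    chunks = [
        {"chunk_id": str(uuid.uuid4()),   # random ID, carries no meeting metadata
         "lines": transcript[i:i + chunk_size]}
        for i in range(0, len(transcript), chunk_size)
    ]
    random.shuffle(chunks)  # destroy ordering before anything leaves the vendor
    return chunks

# The vendor keeps the chunk_id -> original position mapping locally and
# reassembles the LLM outputs itself; the provider only ever sees fragments.
```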
But not every vendor is this careful. Smaller tools may use default API configurations where the LLM provider retains inputs for 30 days or uses them for abuse monitoring. Unless the notetaker vendor has negotiated specific terms, your meeting data may sit on OpenAI or Anthropic servers longer than you expect.
The risk compounds over time. A single meeting transcript is moderately sensitive. A year of transcripts from every meeting in your company is a complete intelligence file. The longer data is retained, and the more broadly it is shared, the larger the attack surface becomes.
What to ask before you sign up
Before adopting any AI notetaker, your security team should get clear answers to these questions:
- Does any of our meeting data (audio, transcripts, metadata) enter training pipelines? "De-identified" is not the same as "no." Press for specifics on what de-identification means and whether it applies to audio or only text.
- Which subprocessors handle our data, and what are their retention terms? A tool may not train on your data, but its transcription provider might retain it. Ask for the subprocessor list and the DPA for each.
- Where is our data stored, and can we choose the region? If you operate under GDPR, data residency is not optional. Some tools offer EU hosting; others store everything in US-East regardless.
- What happens to our data when we cancel? Gong deletes within 30 days. Otter's policy allows indefinite retention. The difference matters.
- Can we get a BAA for HIPAA compliance? If your organization handles any health-related discussions (benefits, insurance, patient data), you need this. Not every tool offers it.
- Is there an audit log for who accessed our recordings? If an employee shares a recording externally, or if a vendor engineer accesses it during a support ticket, you should know.
- What is the incident response plan if our data is breached? SOC 2 certification means a company has controls in place. It does not guarantee those controls will hold. Ask about breach notification timelines and past incidents.
- Can your vendor reconstruct who attended which meetings with whom, even after you delete transcripts? Metadata (participant lists, timestamps, calendar links) often persists long after content is removed. This is the data that maps your organization.
If your vendor cannot answer these questions clearly, in writing, that tells you something.
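One way to force clear written answers is to track them in a structured due-diligence record, one per vendor. A minimal sketch; the field names are ours, not any industry standard:

```python
from dataclasses import dataclass, field

@dataclass
class NotetakerDueDiligence:
    """One record per vendor; each field maps to a question above.
    None means the vendor has not answered in writing."""
    vendor: str
    trains_on_customer_data: bool | None = None
    deidentification_covers_audio: bool | None = None
    subprocessors: list[str] = field(default_factory=list)
    subprocessor_retention_days: int | None = None
    data_residency_choices: list[str] = field(default_factory=list)
    deletion_after_cancellation_days: int | None = None
    offers_baa: bool | None = None
    access_audit_log: bool | None = None
    breach_notification_hours: int | None = None
    metadata_deleted_with_content: bool | None = None

    def unanswered(self) -> list[str]:
        return [name for name, value in vars(self).items() if value is None]

# A vendor whose record is mostly None has also told you something.
```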
Who gets this right
Several tools stand out for handling data responsibly.
Fathom has built its brand on privacy. Subprocessors are contractually prohibited from training on customer data. Users and admins can opt out of Fathom's own de-identified training. The tool holds SOC 2 Type II and HIPAA certifications. For a free notetaker, the privacy posture is unusually strong. The caveat: Fathom's free tier lacks CRM integrations that mid-market teams need, so organizations requiring Salesforce or HubSpot sync will need the paid team plan to get full value.
Fireflies goes further on the infrastructure side. Enterprise customers get dedicated, isolated storage with bring-your-own-storage options. Subprocessors operate under zero-retention agreements. The company holds SOC 2 Type II, GDPR, and HIPAA certifications with BAAs.
Gong takes the most conservative approach to AI processing. Customer data never touches public LLMs. Everything routes through a private Azure tenant. No training, no exceptions, regardless of plan tier. For companies processing high-stakes sales conversations, this matters.
Bluedot sidesteps the problem architecturally. It records via a Chrome extension, capturing audio locally in the user's browser rather than joining the call as a bot participant. The audio never passes through a vendor-operated bot infrastructure, which reduces the attack surface. GDPR-compliant, with European data hosting available.
Krisp takes the most radical approach: on-device processing. Audio stays on the user's machine. The AI runs locally. Nothing is uploaded to a cloud server for transcription. For organizations that cannot tolerate any data leaving their network, this is the only approach in this roundup that keeps meeting audio entirely off third-party servers.
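Krisp's pipeline is proprietary, but the on-device pattern itself is easy to demonstrate. A minimal sketch using the open-source `openai-whisper` package, which runs a local model and makes no network calls during transcription (assuming the model weights were downloaded once in advance):

```python
# pip install openai-whisper  (also requires ffmpeg on the system)
import whisper

# Loads model weights from local disk; after the one-time download,
# transcription involves no network traffic at all.
model = whisper.load_model("base")

# The audio file never leaves this machine.
result = model.transcribe("meeting_recording.wav")
print(result["text"])
```

Swap in a larger local model for better accuracy; the privacy property is the same either way, because the trade-off is compute, not data exposure.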
On the other end of the spectrum, Otter and Granola both train on user data by default, with opt-outs that are either buried, limited to paid tiers, or both. Otter's ongoing class-action lawsuit and Granola's public-by-default sharing incident suggest that privacy is not the top engineering priority at either company.
The pattern is clear. Tools that charge a meaningful price tend to have cleaner data practices. Tools that offer generous free tiers need to monetize the data somehow. Meeting recordings are too sensitive for that trade-off. If your notetaker is free, you should assume your conversations are the product until the privacy policy proves otherwise.