AI Data Leakage: Seven Paths Through Enterprise AI

How Does Sensitive Data Leak Through Enterprise AI? Seven Common Paths

Sensitive data leaks through enterprise AI along seven distinct paths: the prompts employees type, the files they upload, the responses models return, retrieval from connected data sources, the conversation memory a tool keeps, the tool calls agents make, and the code assistants generate. Most data protection programs watch exactly one of them, the prompt. The other six go unwatched, and five of those move sensitive data without any deliberate user action.

That single-path design is why 86% of organizations report little or no visibility into the data flowing into and out of their AI tools (Kiteworks, 2025), even as 88% now use AI in at least one business function (McKinsey State of AI, November 2025). A framework that guards one of seven vectors is not a prevention program. It is a gap with a policy attached. This page maps each path to the control that blocks it and the regulation it triggers. For the broader problem framing, see AI data leakage.

Last updated: June 2026.

The Seven Paths Sensitive Data Travels Through Enterprise AI

Enterprise AI leaks data along seven paths, and only two of them, prompts and file uploads, involve a deliberate user action a security team would expect to monitor. The other five move data the user never knowingly submitted. Each path carries different data in a different way, and each needs its own visibility and control.

Path	How data leaves	Typical sensitive data
1. Prompts	An employee types or pastes content into an AI tool	Customer records, financials, credentials, plans
2. File uploads	A document, spreadsheet, or code file is attached for analysis	Contracts, PII, PHI, source code
3. Responses	The model returns content that contains or reconstructs sensitive data	Summaries of restricted records, leaked context
4. Retrieval and connectors	The tool pulls from connected data stores beyond the user’s need to know	Overshared files, cross-department data
5. Memory and retention	Conversation history is stored, reused across sessions, or retained by the provider	Anything entered earlier, now persisted
6. Tool calls and agent actions	An agent sends data to external systems through a tool or MCP call	Records sent to third parties, exfiltrated secrets
7. Generated code	An assistant embeds sensitive values in code that is then shared or committed	API keys, tokens, customer data in code

Definition, AI data leakage path. A leakage path is a distinct route by which sensitive data exits enterprise control through an AI interaction. The seven paths divide into two classes: deliberate submission (paths 1 and 2) and non-deliberate movement (paths 3 through 7). The distinction matters because prevention controls built for the first class cannot see the second.
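The two classes in the definition can be written down as a small data structure. The following is a minimal sketch, not part of any product: the names LeakagePath, DELIBERATE, is_deliberate, and unseen_by are illustrative assumptions, and the numbering simply follows the table above.

```python
from enum import Enum


class LeakagePath(Enum):
    """The seven paths, numbered as in the table above."""
    PROMPTS = 1
    FILE_UPLOADS = 2
    RESPONSES = 3
    RETRIEVAL_AND_CONNECTORS = 4
    MEMORY_AND_RETENTION = 5
    TOOL_CALLS_AND_AGENT_ACTIONS = 6
    GENERATED_CODE = 7


# Paths 1 and 2 are deliberate submissions; paths 3 through 7 are not.
DELIBERATE = frozenset({LeakagePath.PROMPTS, LeakagePath.FILE_UPLOADS})


def is_deliberate(path: LeakagePath) -> bool:
    """True when the user knowingly submits the data on this path."""
    return path in DELIBERATE


def unseen_by(watched: set) -> list:
    """Paths that a program watching only `watched` cannot see."""
    return [p for p in LeakagePath if p not in watched]


if __name__ == "__main__":
    for path in unseen_by({LeakagePath.PROMPTS}):
        kind = "deliberate" if is_deliberate(path) else "non-deliberate"
        print(f"{path.value}. {path.name}: {kind}")
```

Run against a prompt-only program, the sketch lists six unwatched paths, five of them non-deliberate, which is the gap the rest of this page describes.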

Why Input-Only Controls Leave Six Paths Unguarded

Input-only controls leave six of seven paths unguarded because they inspect what the user submits and nothing else. 60% of organizations cannot detect the specific prompts employees send into generative AI tools (Cisco AI Readiness Index, 2025), so even the one path most programs are designed for goes largely unmonitored. The harder problem is that five of the remaining paths move data the user never deliberately submitted, and a tool that inspects only the input has no way to see them.

A control that watches the prompt sees the question and misses the answer, the retrieval, the memory, the tool call, and the code. Gartner predicts that through 2026, at least 80% of unauthorized AI transactions will come from internal policy violations such as oversharing, not malicious attacks (Gartner, 2025). The exposure is not an attacker problem first. It is an architecture problem: the controls were built for destinations, not for prompts, responses, and agent actions.

Closing the gap starts with discovering where AI runs at all, covered in AI discovery, then enforcing policy on every path, covered in AI usage control.

Responses and Retrieval Are the Outputs Security Tools Never Inspect

A model response carries sensitive data out just as easily as a prompt carries it in, and response visibility is missing by design in tools built around web sessions. The OWASP Top 10 for LLM Applications ranks Sensitive Information Disclosure as LLM02, its second-highest risk, because models surface data the user was not entitled to see (OWASP, 2025). Many security tools see the request and never inspect the streamed answer.

Retrieval compounds the blind spot. When an AI tool connects to enterprise data, it can surface far more than the user intended. A copilot indexes whatever a user can technically reach, so a permissions structure that was tolerable when humans clicked through files one at a time becomes an exposure when AI summarizes across all of them in response to a single prompt.

Definition, oversharing. Oversharing is the surfacing of data a user can technically access but has no business need to see, amplified when an AI assistant retrieves and synthesizes across an entire permission scope in a single response. It is the dominant failure mode in copilot rollouts.
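The difference between technical access and need to know can be shown with a toy retrieval filter. This is a simplified sketch under stated assumptions: the Doc and User types, the owning_team label, and the sample data are all hypothetical, and a real need-to-know policy would rest on richer signals than a team match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    readers: frozenset  # users who can technically open the file
    owning_team: str    # hypothetical need-to-know label


@dataclass(frozen=True)
class User:
    name: str
    team: str


def technically_reachable(user, docs):
    """What an assistant retrieving by raw permissions would surface."""
    return [d for d in docs if user.name in d.readers]


def need_to_know(user, docs):
    """Reachable documents narrowed to the user's own team."""
    return [d for d in technically_reachable(user, docs)
            if d.owning_team == user.team]


# Illustrative data only.
DOCS = [
    Doc("q3-roadmap", frozenset({"ana", "raj"}), "product"),
    Doc("salary-bands", frozenset({"ana", "raj"}), "hr"),
    Doc("board-minutes", frozenset({"raj"}), "legal"),
]
ANA = User("ana", "product")

if __name__ == "__main__":
    print([d.name for d in technically_reachable(ANA, DOCS)])
    print([d.name for d in need_to_know(ANA, DOCS)])
```

In this example the user can open the salary file but has no business need for it. The gap between the two result lists is the oversharing surface.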

Conversation memory extends the exposure window. Data entered once can persist across sessions or be retained by the provider, then move to the web, to third-party models, or to other agents (Aurascape, 2026). The fix is not only restricting what users type. It is classifying data wherever it surfaces and controlling where it can flow.

Agent Tool Calls Turn Reading Into Exfiltration

The agent tool call is the most consequential path because an agent does not just read data. It acts on it, sending data to external systems through tools and MCP calls. The OWASP Top 10 for LLM Applications captures this as Excessive Agency, ranked LLM06, the risk that an agent takes actions beyond what was intended (OWASP, 2025). A reading problem becomes an exfiltration problem the moment the agent can invoke a tool.

Aurascape research makes the path concrete. In the SilentBridge class of zero-click attacks, a single benign request such as “summarize this” was enough to exfiltrate Gmail content and extract API keys and customer data through an agent’s connectors (Aurascape, 2026). The Cloud Security Alliance found that 82% of organizations have unknown AI agents operating in their environment and that 61% have reported agent-related data exposure (Cloud Security Alliance, 2026), which means most enterprises cannot even inventory the agents that hold this path open.

Definition, indirect prompt injection. Indirect prompt injection plants malicious instructions inside third-party content an agent ingests, such as a web page, email, or document, so the agent executes attacker intent without the user typing anything. EchoLeak in Microsoft 365 Copilot (CVE-2025-32711) and ForcedLeak in Salesforce Agentforce (CVSS 9.4) are documented examples that exfiltrated data through trusted, allowlisted domains.

Aurascape’s Zero-Bypass MCP Gateway addresses this path at the point of execution, verifying, signing, and controlling every tool call, API invocation, and data retrieval before an agent reaches an external system (Aurascape, 2026). Unsigned calls are blocked, which closes the gap between an agent reading data and an agent sending it somewhere it should not go.
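The general idea of signing a tool call and refusing unsigned ones can be illustrated with a standard HMAC. This is a generic sketch, not a description of how the Zero-Bypass MCP Gateway is implemented: the function names, the JSON canonicalization, the shared demo key, and the sample call are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Demo key for illustration only; a real deployment would use managed keys.
DEMO_KEY = b"demo-key-not-for-production"


def canonical(call: dict) -> bytes:
    """Serialize a tool call deterministically so signatures are stable."""
    return json.dumps(call, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(call: dict, key: bytes = DEMO_KEY) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical call."""
    return hmac.new(key, canonical(call), hashlib.sha256).hexdigest()


def allow(call: dict, signature, key: bytes = DEMO_KEY) -> bool:
    """Permit the call only if it carries a valid signature."""
    if signature is None:
        return False  # unsigned calls are blocked
    expected = sign(call, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


if __name__ == "__main__":
    call = {"tool": "send_email", "to": "partner@example.com", "body": "Q3 summary"}
    sig = sign(call)
    print(allow(call, sig))                                    # signed, unmodified
    print(allow(call, None))                                   # unsigned
    print(allow(dict(call, to="attacker@example.net"), sig))   # altered after signing
```

The third case is the one that matters for indirect prompt injection: if injected instructions change the destination after the call was approved, the signature no longer matches and the call is refused.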

Generated Code Is the Seventh Path Text Controls Miss Entirely

Coding assistants open a seventh path that text-only controls ignore completely. A secret pasted in to debug an error, a customer record used as test data, or a credential the assistant suggests inline can end up embedded in generated code, then shared or committed to a repository. GitHub Copilot’s CVE-2025-53773 showed the inverse risk: hidden instructions in a README or code comment could silently enable an auto-approve mode that let the assistant run shell commands (NVD, 2025).

Because the leak is code rather than prose, classification has to understand code as a data type, not just documents and chat. A regex tuned for nine-digit Social Security numbers will not catch an API key formatted as a code constant. In one Aurascape deployment, a Fortune 100 insurer delivered code 40% faster with AI coding assistants while tripling agent integrations with no unauthorized data access (Aurascape, 2026). For the developer-specific treatment, see how to secure AI coding assistants.
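The point about regex tuning is easy to demonstrate. The sketch below pairs a typical Social Security number pattern with a deliberately simple code-aware check that looks for secret-like variable names assigned to high-entropy string literals. The patterns, the 3.5-bit entropy threshold, and the sample snippet are illustrative assumptions, and the key shown is made up.

```python
import math
import re
from collections import Counter

# Typical pattern for a nine-digit SSN, with or without hyphens.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

# Hypothetical code-aware check: NAME = "long string literal".
ASSIGN_RE = re.compile(
    r"""(?P<name>[A-Za-z_]\w*)\s*=\s*["'](?P<value>[^"']{16,})["']"""
)
SECRET_NAME_RE = re.compile(r"key|token|secret|passw", re.IGNORECASE)


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def find_secret_constants(code: str, min_entropy: float = 3.5) -> list:
    """Names of constants that look like embedded secrets."""
    hits = []
    for match in ASSIGN_RE.finditer(code):
        looks_secret = SECRET_NAME_RE.search(match["name"])
        if looks_secret and shannon_entropy(match["value"]) >= min_entropy:
            hits.append(match["name"])
    return hits


# Made-up sample; the key value is not a real credential.
SAMPLE = (
    'API_KEY = "zX9qLm27PvRt04KdYw8NbQe5"\n'
    'GREETING = "hello hello hello hello"\n'
)

if __name__ == "__main__":
    print(SSN_RE.search(SAMPLE))          # None: the SSN pattern sees nothing
    print(find_secret_constants(SAMPLE))  # ['API_KEY']
```

A production classifier would need far more than a name hint and an entropy threshold, but the contrast holds: a pattern written for document-style identifiers has nothing to match against a credential expressed as code.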

A Five-Control Framework Closes All Seven Paths in Sequence

Closing all seven paths requires five technical controls deployed in sequence, each one blocking the paths the prior control cannot reach. The sequence runs discovery, classification, input enforcement, output and retrieval inspection, then agent tool-call governance. Skip any layer and a subset of the seven paths reopens.

Control	Paths it closes	What it does
1. Discovery	All seven (prerequisite)	Finds every AI app, agent, and MCP server in use, including shadow AI and embedded copilots
2. Classification	1, 2, 7	Tags sensitive data across text, code, images, voice, and video before it moves
3. Input enforcement	1, 2	Inspects prompts and uploads inline, enforcing policy on intent and data sensitivity
4. Output and retrieval inspection	3, 4, 5	Inspects model responses, connector retrieval, and memory, controlling where data flows
5. Agent tool-call governance	6	Verifies and signs every tool call before an agent reaches an external system

Discovery is the prerequisite because a path you cannot see is a path you cannot control. The Cloud Security Alliance found only 21% of organizations maintain a real-time inventory of active agents (Cloud Security Alliance, 2026), which means four in five are enforcing policy on an estate they cannot fully see. In one Aurascape deployment, a transportation company went from proof of value to full deployment in about six weeks, starting with 400 users on day one and expanding to 2,000, with sensitive-data interactions monitored across 100% of deployed users (Aurascape, 2026).

Classification and the two enforcement layers handle the data paths. Agent tool-call governance handles the action path. The framework is sequential because each control depends on the one before it: you cannot classify data in a tool you have not discovered, and you cannot govern a tool call you cannot inspect.
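The claim that skipping a layer reopens a subset of paths can be checked mechanically against the table above. This sketch encodes the table as a dictionary and assumes, as a simplification, that a path counts as closed only when every control listed against it is deployed. The names CONTROL_PATHS and open_paths are illustrative.

```python
# Control -> paths it closes, copied from the framework table above.
CONTROL_PATHS = {
    "discovery": {1, 2, 3, 4, 5, 6, 7},  # prerequisite for all seven
    "classification": {1, 2, 7},
    "input_enforcement": {1, 2},
    "output_and_retrieval_inspection": {3, 4, 5},
    "agent_tool_call_governance": {6},
}


def open_paths(deployed: set) -> set:
    """Paths left open when only the `deployed` controls are in place.

    Assumption: a path is closed only if every control that lists it
    has been deployed, so any skipped control reopens its paths.
    """
    reopened = set()
    for control, paths in CONTROL_PATHS.items():
        if control not in deployed:
            reopened |= paths
    return reopened


if __name__ == "__main__":
    everything = set(CONTROL_PATHS)
    print(sorted(open_paths(everything)))
    print(sorted(open_paths(everything - {"agent_tool_call_governance"})))
    print(sorted(open_paths(everything - {"discovery"})))
```

Under this assumption, dropping agent governance reopens path 6 alone, while dropping discovery reopens all seven, which is why it sits first in the sequence.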

Control Priorities Shift by Industry and Regulatory Exposure

Control priorities shift by industry because the highest-value data and the governing regulation differ by sector. A bank prioritizes the regulatory audit trail; a healthcare enterprise prioritizes PHI classification across retrieval and response; a technology firm prioritizes generated-code and agent governance. The seven paths are universal, but the order in which a team hardens them follows where its regulated data concentrates.

Sector	Highest-risk paths	Priority control	Driving regulation
Banking	Responses, retrieval, memory	Examiner-ready interaction logs across every path	GLBA, FFIEC, NCUA
Insurance	Tool calls, generated code	Agent governance and code classification	State privacy law, GLBA
Healthcare	Retrieval, responses	PHI classification at response and connector	HIPAA
SLED	Prompts, retrieval	Discovery and sanctioned-access enforcement	State AI law, FERPA
Technology	Generated code, tool calls	Code classification and MCP gateway	Contractual, IP protection

In one Aurascape deployment, The Police Credit Union mapped controls to GLBA, FFIEC, NCUA, and the NIST AI RMF, with a projected 83% reduction in AI-based risk and a projected 27% productivity gain, generating examiner-ready interaction logs in the process (Aurascape, 2026). In a healthcare deployment governing more than 60,000 users worldwide, Aurascape drove unsanctioned and out-of-license AI access to near zero while minimizing sensitive-data exposure risk as AI use grew (Aurascape, 2026). The path order is the same; the resource allocation follows the regulated data.

Each Leakage Vector Triggers a Specific Regulation and Control

Each of the seven paths triggers a distinct regulatory obligation, and naming the control that satisfies it turns a threat taxonomy into a compliance case. NIST AI 600-1 Section 2.9 names both direct and indirect prompt injection as security risks specific to generative AI (NIST, 2024), and the EU AI Act sets penalties up to 35 million euros or 7% of worldwide turnover for prohibited practices (EU AI Act, 2024). Mapping each vector to its obligation tells a security team which control closes which audit finding.

Leakage path	Regulatory exposure	Control that satisfies it
Prompts, file uploads	GDPR data minimization, HIPAA, GLBA	Inline input classification and enforcement
Responses	HIPAA, GDPR, sensitive-info disclosure (OWASP LLM02)	Full response inspection
Retrieval, connectors	HIPAA minimum necessary, GLBA need-to-know	Need-to-know enforcement on retrieval
Memory, retention	GDPR storage limitation, retention rules	Memory controls and data-flow policy
Tool calls, agent actions	EU AI Act, NIST AI 600-1, breach reporting	Zero-Bypass MCP Gateway
Generated code	IP protection, contractual data handling	Code-aware classification
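The EU AI Act ceiling cited above combines a fixed amount with a percentage of turnover, so the applicable maximum depends on company size. The following worked sketch assumes the "whichever is higher" reading that applies to larger undertakings, uses the GDPR upper tier of 20 million euros or 4% for comparison, and works in whole euros. The function name and the turnover figures are illustrative.

```python
EU_AI_ACT_FIXED_EUR = 35_000_000
EU_AI_ACT_PERCENT = 7
GDPR_FIXED_EUR = 20_000_000
GDPR_PERCENT = 4


def ceiling_eur(turnover_eur: int, fixed_eur: int, percent: int) -> int:
    """Maximum fine: the higher of the fixed amount or the turnover share."""
    return max(fixed_eur, turnover_eur * percent // 100)


if __name__ == "__main__":
    for turnover in (100_000_000, 1_000_000_000):
        ai_act = ceiling_eur(turnover, EU_AI_ACT_FIXED_EUR, EU_AI_ACT_PERCENT)
        gdpr = ceiling_eur(turnover, GDPR_FIXED_EUR, GDPR_PERCENT)
        print(f"turnover {turnover:,} EUR: AI Act {ai_act:,}, GDPR {gdpr:,}")
```

At 100 million euros of turnover the fixed amounts govern (35 million versus 20 million). At 1 billion euros the percentages take over (70 million versus 40 million).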

The cost of leaving a vector unmapped is rising. IBM found 20% of breached organizations traced the breach to shadow AI, and 97% of organizations that suffered an AI-related breach lacked proper AI access controls (IBM, 2025). A compliance program that addresses one path and ignores six does not reduce regulatory exposure. It documents it.

How Aurascape Closes All Seven Paths in One Architecture

Aurascape was built to protect data across all seven paths rather than the prompt alone, using a patented classification engine that recognizes sensitive content across text, voice, video, images, and code, with optional fingerprinting for precision (Aurascape, 2026). The platform inspects full prompts and responses, controls retrieval and memory flow, and governs every agent tool call through the Zero-Bypass MCP Gateway, closing the exact six paths input-only controls miss.

The architecture matters because the cost of missing a path is concrete: IBM found shadow AI added about $670,000 to the average breach (IBM, 2025). Aurascape discovers shadow AI, embedded copilots, and local agents in days, then builds toward full enforcement across prompts, responses, retrieval, memory, and tool calls as policies and sensitive-data fingerprints are configured. It deploys as an additive layer alongside an existing SSE, SASE, or DLP stack, so closing the six unguarded paths does not require ripping out the control that already watches the prompt.

Path	Aurascape control
Prompts and file uploads	Inline classification of input across modalities, with policy enforced on intent and data sensitivity
Responses	Full inspection of model responses, not just the request
Retrieval, connectors, memory	Policies that prevent data flowing to the web, third-party models, or other agents
Tool calls and agent actions	Zero-Bypass MCP Gateway verifies, signs, and controls every tool call, API invocation, and data retrieval
Generated code	Classification that treats code as a data type, across coding assistants and agents

How the AI Data Protection Category Stacks Up

Every vendor in this category claims to prevent AI data leakage, but they cluster around a small number of starting points: browser monitoring, copilot oversharing, agent red-teaming, and full-path inspection. The dimensions that separate them are how many of the seven paths they inspect, whether they govern agent tool calls at execution, and whether one architecture covers both employee AI use and the agents teams build.

Platform	Paths inspected	Agent tool-call governance	Coverage scope
Aurascape	All seven, including prompts, responses, retrieval, memory, and code	Zero-Bypass MCP Gateway signs and verifies every call before execution	Employee AI use and agent development in one architecture, 20,000+ apps
Knostic	Retrieval and oversharing in Copilot and Glean	Coverage of MCP servers and IDE extensions	Need-to-know access for enterprise LLMs
Lasso Security	Build and runtime paths via red-teaming	Open-source MCP gateway, separate from commercial platform	Agents and applications teams build
Prompt Security	Employee use, homegrown apps, code assistants, agents	MCP-server risk assessment	SaaS or self-hosted deployment
WitnessAI	Prompts and responses via ML classification	Agentic extension across MCP servers	Single-tenant per customer
Varonis Atlas	Data-store access plus AI runtime guardrails	AI runtime protection via LLM gateway	DSPM foundation extended to AI, GA March 2026

Frequently Asked Questions

Why does input-only DLP miss most AI data leakage?

Input-only DLP inspects what the user submits and watches files and known channels, so it never sees model responses, retrieval from connected data, conversation memory, or agent tool calls. Five of the seven leakage paths move data the user never deliberately submitted, which is why a tool built for inputs cannot account for where the data went.

Can AI leak data even if employees never paste anything sensitive?

Yes, because retrieval, memory, and agent actions move data without a deliberate paste. Connectors surface data a user can technically reach but did not submit, memory persists earlier data across sessions, and an agent can send data to an external system on its own. Gartner attributes at least 80% of unauthorized AI transactions to internal policy violations rather than attacks (Gartner, 2025).

How does data leak through AI agents specifically?

Through tool calls, where an agent invokes tools and MCP servers to act on data rather than just read it. Aurascape research showed a single benign request triggering exfiltration of email content and secrets through an agent’s connectors, which is why governing the tool call before it executes is the decisive defense for that path.

Which leakage path triggers the strictest regulatory penalty?

Agent tool calls and prompt injection map to the EU AI Act, which sets penalties up to 35 million euros or 7% of worldwide turnover for prohibited practices, exceeding GDPR’s 20 million euro or 4% ceiling. NIST AI 600-1 Section 2.9 names both direct and indirect prompt injection as generative-AI-specific security risks, giving auditors a named control reference.

Does the prevention framework have to run in sequence?

Yes, because each control depends on the one before it. You cannot classify data in a tool you have not discovered, and you cannot govern a tool call you cannot inspect, so discovery precedes classification, which precedes input enforcement, output inspection, and agent governance in that order.

How should a regulated bank prioritize the seven paths differently than a software firm?

A bank should prioritize examiner-ready logging across responses, retrieval, and memory to satisfy GLBA, FFIEC, and NCUA, while a software firm prioritizes code classification and MCP governance to protect IP and contractual data. The seven paths are the same for both, but the order of hardening follows where each sector’s regulated data concentrates.

Does AI retain or train on the data we send it?

It depends on the tool and the plan, and that uncertainty is the compliance problem. Conversation memory and provider retention mean data entered once can persist beyond the original session, so an organization that cannot see which tools employees use also cannot answer where its data went under GDPR’s storage-limitation rules.

What is the first control to deploy if a team can only fund one?

Discovery, because every other control depends on seeing the full AI estate first. The Cloud Security Alliance found only 21% of organizations maintain a real-time inventory of active agents (Cloud Security Alliance, 2026), so most teams are enforcing policy on an estate they cannot fully see, leaving an unknown number of the seven paths open.

How Aurascape Governs the Whole Interaction, Not Just the Prompt

Aurascape was built for the exact gap this article exposes: a prevention program that guards the prompt and leaves six paths open. It inspects full prompts and responses, classifies sensitive data across text, voice, video, images, and code, controls retrieval and memory flow, and governs every agent tool call at the point of execution through the Zero-Bypass MCP Gateway.

Seeing all seven paths in one architecture is what turns AI data protection from blocking the prompt into governing the whole interaction. The platform discovers shadow AI and embedded copilots in days, scores risk in real time, and enforces policy by user role, account type, data sensitivity, and conversation context, then deploys alongside an existing SSE, SASE, or DLP stack rather than replacing it. Every deployment starts with a tailored demo for your security and data protection teams.

Aurascape closes the six leakage paths a prompt-only program leaves open, governing prompts, responses, retrieval, memory, tool calls, and generated code in one architecture.

See how Aurascape stops data leakage across every AI path →
