AI Governance · Microsoft Copilot · Compliance · EU AI Act

Microsoft Copilot Was Reading Confidential Emails for Weeks. Your AI Governance Strategy Needs to Change.

A code defect let Copilot bypass DLP policies and summarize confidential emails for weeks. The incident exposes why vendor-side AI controls aren't real governance — and what organizations must do instead.

PanelSec Team · 2026-02-22

On January 21, 2026, something went quietly wrong inside Microsoft 365.

Copilot Chat — Microsoft's AI assistant embedded across Word, Excel, Outlook, PowerPoint, and Teams — began summarizing emails that it was explicitly forbidden from accessing. Emails stored in users' Sent Items and Drafts folders, marked with confidentiality sensitivity labels, protected by Data Loss Prevention (DLP) policies configured through Microsoft Purview.

Every safeguard was in place. Copilot bypassed all of them.

For weeks, the AI assistant read and summarized confidential correspondence — the kind of emails organizations go out of their way to protect: M&A discussions, legal communications, HR matters, customer-sensitive data. The issue, tracked internally as CW1226324, wasn't discovered by any automated monitoring or alert system. It was caught because someone manually noticed that Copilot was surfacing content it shouldn't have.

Microsoft confirmed the root cause in a service advisory: a code-level defect allowed Copilot to process items in Sent Items and Drafts despite confidential labels being set. A fix began rolling out in early February, but Microsoft has not disclosed how many organizations or users were affected. The incident remains tagged as an "advisory."

This story matters far beyond Microsoft's patch cycle. It exposes a structural flaw in how organizations approach AI governance today — and that flaw is about to get much more dangerous as companies move from AI assistants to autonomous AI agents.


What Actually Happened

To understand why this incident is significant, you need to understand what should have prevented it.

Microsoft 365 offers a layered system of data protection controls. Sensitivity labels allow organizations to classify documents and emails by confidentiality level — Public, Internal, Confidential, Highly Confidential. Data Loss Prevention (DLP) policies, managed through Microsoft Purview, enforce rules about how labeled content can be accessed, shared, and processed. When properly configured, a DLP policy should prevent Copilot from reading or summarizing any content carrying a restricted sensitivity label.
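In principle, the layering is simple. The sketch below models the label-gated check that is supposed to stand between an AI assistant and a restricted email — a conceptual illustration with hypothetical names, not Microsoft's actual implementation. This is the check that, per Microsoft's advisory, the CW1226324 defect effectively skipped:

```python
from dataclasses import dataclass

# Hypothetical model of a label-gated access check; names are illustrative,
# not Microsoft's actual API.
RESTRICTED_LABELS = {"Confidential", "Highly Confidential"}

@dataclass
class Email:
    subject: str
    body: str
    sensitivity_label: str

def copilot_may_process(item: Email) -> bool:
    """DLP-style rule: deny AI processing for restricted sensitivity labels."""
    return item.sensitivity_label not in RESTRICTED_LABELS

memo = Email("Deal terms", "...", "Highly Confidential")
print(copilot_may_process(memo))  # False — processing should be denied
```

When this gate works, a restricted email never reaches the model's context. When a code path bypasses it, every downstream safeguard is moot.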

Organizations that deployed Microsoft 365 Copilot did exactly what best practices recommended. They classified their sensitive communications. They configured DLP rules. They set up the governance framework Microsoft itself prescribes.

And then a code error inside Copilot silently overrode all of it.

The AI assistant indexed emails in Sent Items and Drafts — folders that routinely contain corporate correspondence sent to external parties, legal communications, and sensitive negotiations — and made their contents available through Copilot Chat queries. Anyone in the organization using the "work tab" chat feature could inadvertently receive summaries of confidential content that policies were specifically designed to keep out of AI processing.

What makes this particularly concerning is what Microsoft's own documentation reveals about the broader limitation. Even when sensitivity labels work correctly in Office apps, Microsoft acknowledges that protected content may still be available to Copilot in other contexts, including Teams and Copilot Chat. The governance boundary is not as airtight as most administrators assume.


The Bigger Problem: Vendor-Side Controls Are Not Governance

The Copilot DLP bypass is not an isolated bug. It is a symptom of a fundamental architecture problem that affects every organization using AI tools today.

Here is the pattern:

  1. A company deploys an AI tool (Copilot, ChatGPT Enterprise, a custom agent built on GPT-4, Claude, or Gemini).
  2. The AI vendor provides built-in controls — sensitivity labels, DLP policies, access scoping, content filtering.
  3. The company configures those controls and assumes it has AI governance in place.
  4. The AI silently violates the controls. Nobody knows until the damage is done.

The core issue is that the governance controls and the AI system live inside the same vendor ecosystem. The entity being governed and the entity doing the governing are the same. There is no independent verification. No external audit trail. No system sitting outside the AI vendor's stack that confirms whether the AI actually followed the policies you configured.

This is the equivalent of asking someone to grade their own exam. It works until it doesn't — and when it fails, you have no way to know.

Microsoft's Copilot DLP bypass ran for weeks undetected. Not because the organization failed to configure protections, but because no independent system was watching whether those protections were actually enforced.


Now Multiply This Across Every AI Agent in Your Organization

The Copilot incident involved a first-party AI assistant from the world's most mature enterprise software vendor. But the AI landscape inside companies is far more complex — and far less controlled — than a single Microsoft product.

Consider what's happening across organizations right now:

Employees are using dozens of AI tools without oversight. Research shows that 76% of security teams estimate employees are using AI tools like ChatGPT and GitHub Copilot without formal approval. Workers at over 90% of companies report using personal AI tools for work tasks. Every one of these interactions is an uncontrolled data flow — PII, source code, business strategy, customer information being pasted into systems that the organization has no visibility into and no governance over.

Teams are building AI agents with direct access to sensitive systems. Companies are deploying autonomous agents built on GPT, Claude, Gemini, and open-source models. These agents are connected to CRMs, internal documentation, email systems, databases, and customer records. They send messages, make API calls, execute workflows, and process sensitive data — often configured by non-technical staff, often without IT or security teams even knowing they exist.

The incidents are already happening. This is not a theoretical risk:

  • Zoho (November 2025): An AI agent accidentally disclosed acquisition details to an external startup founder. The agent then sent an automated apology email to Zoho's CEO, Sridhar Vembu, who shared the incident publicly. Nobody broke in. The agent simply processed data and took action without anyone defining what it was and wasn't allowed to share.

  • Salesforce Agentforce (September 2025): Security researchers demonstrated a CVSS 9.4 vulnerability that allowed an attacker to exfiltrate entire CRM records — customer names, emails, phone numbers, deal information — by registering a $5 domain and executing a prompt injection attack against an Agentforce deployment.

  • Microsoft Copilot — Reprompt Attack (January 2026): Varonis researchers discovered a single-click attack that could silently exfiltrate personal data from Copilot. The attack worked because Copilot's data leak protections only applied to the initial request — simply instructing Copilot to perform each action twice caused the second attempt to bypass safeguards entirely.

  • Microsoft Copilot — EchoLeak (June 2025): Aim Security disclosed a zero-click vulnerability (CVE-2025-32711, CVSS 9.3) that allowed attackers to exfiltrate sensitive data from Microsoft 365 Copilot's context without any user interaction. The LLM was effectively turned against itself to identify and leak the most sensitive information in its reach.

Each of these incidents shares the same root cause: AI systems operating with broad data access and inadequate independent governance.


Why This Is a Governance Problem, Not a Security Problem

The instinct in most organizations is to frame AI data incidents as cybersecurity problems. Patch the vulnerability. Update the policy. Move on.

But the Copilot DLP bypass — and the broader pattern of AI governance failures — cannot be solved by patching. They are structural.

Traditional security tools weren't designed for AI. DLP, SIEM, and endpoint protection were built to monitor how humans access and move data. They understand file transfers, email attachments, and network traffic. They do not understand prompt-response interactions, context window contents, tool-use chains in agentic workflows, or the difference between an AI assistant summarizing a confidential email because a user asked and an AI assistant summarizing it because of a code bug.

Vendor-side controls create a false sense of security. When Microsoft provides sensitivity labels and DLP policies for Copilot, organizations reasonably assume those controls work. When they don't — as in the CW1226324 incident — there is no fallback. No independent system flags the violation. The gap between "policy configured" and "policy enforced" is invisible.

AI agents introduce a fundamentally new governance surface. An employee using ChatGPT to draft an email is one data flow to monitor. An autonomous agent connected to your CRM, making API calls, executing multi-step workflows, and sending communications on behalf of your organization is an entirely different governance challenge. Agents act without human-in-the-loop oversight. They chain actions together. They access data across systems. And they do all of this at a speed and scale that makes manual oversight impossible.

The deploying organization bears the liability. This is the point that matters most for European companies. Under the EU AI Act, liability rests with the deployer — the organization that puts the AI system into use — not the vendor that built it. If Microsoft Copilot summarizes confidential emails containing special category personal data (health information, legal matters, HR records), the organization deploying Copilot is responsible for the GDPR violation. You cannot tell a Data Protection Authority that your sensitivity labels were correctly configured when the AI you chose to deploy ignored them.

The EU AI Act fines are substantial: up to €35 million or 7% of global annual turnover. And the regulation requires organizations to maintain inventories of AI systems, conduct risk assessments, ensure human oversight, and demonstrate compliance through documentation and audit trails.
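For concreteness, the worst-case exposure under that top tier (which applies to the most serious violations) is simply the higher of the two figures:

```python
def max_ai_act_exposure(annual_turnover_eur: int) -> float:
    """Worst-case EU AI Act fine tier: the higher of EUR 35M
    or 7% of global annual turnover. Illustrative arithmetic only."""
    return max(35_000_000, annual_turnover_eur * 7 / 100)

print(max_ai_act_exposure(1_000_000_000))  # 70000000.0 for a EUR 1B company
```

For any company with more than €500 million in turnover, the percentage cap, not the fixed amount, is the binding figure.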

None of that is achievable if your governance controls live inside the AI vendor's own ecosystem.


What Independent AI Governance Actually Looks Like

The lesson from the Copilot incident is clear: AI governance must be an independent layer that sits outside the AI vendor's stack.

This means:

A centralized AI system register. Every AI tool and autonomous agent operating inside the organization — whether it's Microsoft Copilot, a ChatGPT subscription, a custom agent built on Claude, or an internal model deployment — must be inventoried in a single system of record. Each system should have a designated business owner, a defined purpose, a risk classification, and documented data access permissions. If you don't know what AI systems are running in your organization, you cannot govern them.
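As a sketch, a register can start as a typed record per system plus queries for the gaps. All field names here are illustrative, not any particular product's schema:

```python
from dataclasses import dataclass, field

# Hypothetical minimal AI system register; field names are illustrative.
@dataclass
class AISystemRecord:
    name: str
    owner: str                  # designated business owner (a person, not "IT")
    purpose: str
    risk_class: str             # e.g. "minimal", "limited", "high"
    data_access: list = field(default_factory=list)

class AIRegister:
    """Single system of record for every AI tool and agent in the org."""
    def __init__(self):
        self._systems = {}

    def register(self, record: AISystemRecord) -> None:
        self._systems[record.name] = record

    def unowned(self) -> list:
        # Systems with no accountable person are the blind spots.
        return [r.name for r in self._systems.values() if not r.owner]

reg = AIRegister()
reg.register(AISystemRecord("Microsoft 365 Copilot", "Jane Doe (Finance)",
                            "email and document assistance", "limited",
                            ["email", "documents", "calendar"]))
reg.register(AISystemRecord("marketing-agent", "", "campaign drafting",
                            "limited", ["CRM contacts"]))
print(reg.unowned())  # ['marketing-agent']
```

The point is not the data structure — it is that an `unowned()` query (or its equivalent) exists at all, and that someone runs it.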

Independent policy enforcement. Policies governing what data AI systems can access, what actions agents can take, and what approvals are required should be evaluated by a system that is separate from the AI tools being governed. When a DLP label fails inside Copilot, an independent policy engine should catch the violation — not rely on Copilot to police itself.
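A minimal sketch of what that separation looks like, with hypothetical event fields: a rule evaluated outside the vendor's stack that flags restricted-label access even when the vendor's own DLP verdict said "allowed":

```python
# Hypothetical event schema for an independent policy engine that sits
# outside the AI vendor's stack; field names are illustrative.
DENY_LABELS = {"Confidential", "Highly Confidential"}

def evaluate_access(event: dict) -> str:
    """Flag AI access to restricted content regardless of what the
    vendor's own DLP claimed about the same interaction."""
    if event["label"] in DENY_LABELS:
        return "VIOLATION"
    return "ALLOW"

# A Copilot-style event where the vendor-side control silently passed:
event = {"system": "copilot-chat", "action": "summarize",
         "label": "Confidential", "vendor_dlp_verdict": "allowed"}
print(evaluate_access(event))  # VIOLATION — caught by the outside layer
```

Note the deliberate redundancy: the independent engine re-evaluates the same policy the vendor was supposed to enforce, precisely so a vendor-side failure becomes a detected violation instead of a silent one.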

Continuous monitoring with an external audit trail. Every AI interaction — whether a human employee pasting content into an AI tool or an autonomous agent making an API call — should be logged in an immutable, vendor-independent audit trail. When a regulator or auditor asks "what data did your AI systems access in January?" you should be able to answer with evidence that doesn't come from the AI vendor's own logs.
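One common way to make such a log tamper-evident is hash chaining, where each entry commits to the previous entry's hash. A minimal sketch, not any specific product's implementation:

```python
import hashlib
import json

class AuditTrail:
    """Append-only log where each entry commits to the previous entry's
    hash, so after-the-fact tampering is detectable. A sketch only."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def append(self, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": self._prev, "hash": digest})
        self._prev = digest
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True

trail = AuditTrail()
trail.append({"actor": "copilot-chat", "action": "summarize", "label": "Confidential"})
trail.append({"actor": "alice@example.com", "action": "prompt", "tool": "chatgpt"})
print(trail.verify())  # True — and False the moment any past entry is edited
```

Any rewrite of a past entry breaks the chain, which is exactly the property an auditor needs when the vendor's own logs are the thing in question.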

Governance that covers both humans and agents. An employee using ChatGPT and an autonomous agent calling the GPT-4 API are operationally different but governance-equivalent. Both are AI interactions that involve data flows, policy compliance requirements, and accountability. A governance platform must treat both as first-class citizens in the same policy framework.

Pre-built compliance alignment. For European companies operating under the EU AI Act, GDPR, and NIS2, governance tooling should provide ready-made policy templates that map directly to regulatory requirements — not force organizations to build compliance frameworks from scratch.


Three Questions Every Organization Should Answer Today

Before the next Copilot-style incident, every organization — regardless of size — should be able to answer three questions:

1. What AI systems and agents are operating inside your organization?

Not just the ones IT approved. The ones employees signed up for with personal emails. The agents someone in marketing built over a weekend. The ChatGPT conversations happening on personal devices. The MCP-connected tools your developers are experimenting with. All of them.

2. What data flows to each AI system, and is it within policy?

For each AI tool and agent, can you identify what data categories it has access to? Does it process PII? Customer data? Financial information? Source code? Is that data flowing to a system that meets your regulatory requirements for data residency, processing agreements, and security standards?

3. Who is accountable for each AI system?

Under the EU AI Act, every AI system in scope needs a designated responsible party within the organization. Not the vendor. Not "IT" as a department. A specific person who owns the risk classification, policy configuration, and compliance obligations for that system.

If you cannot answer all three questions with confidence, you don't have an AI governance gap. You have a blind spot. And as the Copilot incident proved, blind spots don't announce themselves — they surface as incidents, regulatory findings, and trust failures.


The Window Is Now

The Copilot DLP bypass is not the last incident of its kind. Dr. Ilia Kolochenko, CEO at ImmuniWeb and a member of Europol's Data Protection Experts Network, warned that similar incidents will likely surge in 2026, potentially becoming the most frequent type of security incident across organizations of all sizes. The reason is straightforward: organizations are adopting AI faster than they are building the governance infrastructure to manage it, and traditional safeguards like DLP were never designed to monitor how AI systems access, interpret, and repackage sensitive data.

The companies that build independent AI governance now — before the incident, before the audit, before the regulatory inquiry — will be the ones that maintain trust with customers, demonstrate compliance to regulators, and deploy AI with confidence.

The ones that wait will learn the same lesson Microsoft's customers learned in January: configured controls are not enforced controls, and no vendor will govern their own AI on your behalf.


PanelSec is building the independent AI governance platform for European mid-market companies. We help organizations inventory, control, and audit all AI usage — by both employees and autonomous agents — purpose-built for EU AI Act, GDPR, and NIS2 compliance. [Learn more →]

PanelSec Team

Built in Europe · Hosted in Germany · 2026-02-22

Early Access

If this is your problem, let's talk.

We're onboarding design partners now. Early access includes hands-on support and direct input into the product roadmap.

Request Early Access