The Warning Was Inside the Leak

6 minute read

Last week, Anthropic accidentally published nearly 3,000 internal documents to a publicly searchable database. A default CMS setting made uploaded files public. No exploit. No sophisticated attacker. Just an unlocked door.

The internet had a field day with the irony. A company that built its brand on AI safety, failing basic information security hygiene. But most of the coverage missed the more important story — the one inside the leak.

Buried in those documents was a draft announcement for Anthropic’s next model, codenamed Mythos. In that draft, Anthropic’s own team wrote this:

Mythos is “currently far ahead of any other AI model in cyber capabilities” and “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace defenders.”

The people building the most capable AI offensive tool in history wrote, in their own words, that it will outpace defenders.

They published it by accident. But they knew it before they did.

The Irony Is Actually the Point

It’s easy to laugh at the leak. Cybersecurity stocks didn’t — CrowdStrike fell 7%, Palo Alto fell 6% on Friday. The market understood something that the coverage didn’t fully articulate:

The threat model just changed.

Not because Mythos was leaked. Because its existence was confirmed, along with an authoritative description of its capabilities from the organization that built it.

Every threat model your security team is operating under right now was calibrated against today’s AI capabilities — AI-assisted phishing, script-kiddie automation of known CVEs, social engineering at scale. That’s the threat you’ve been defending against.

Mythos isn’t that. It represents a qualitative step change: novel vulnerability discovery, multi-step reasoning across complex attack surfaces, autonomous chain-of-thought exploitation. These capabilities have existed in theory. Mythos is the first system its creators believe can execute them reliably.

The model hasn’t even launched yet.

The Defender’s Asymmetry Problem

There’s a structural problem in cybersecurity that has existed forever and is about to get dramatically worse.

Attackers need to find one way in. Defenders need to cover everything.

AI makes this asymmetry worse in a specific way: it dramatically lowers the skill floor for attackers while doing relatively little to lower the cost of comprehensive defense. A motivated actor with access to Mythos can direct it at your infrastructure and let it explore. It doesn’t sleep. It doesn’t get bored. It can test a thousand variants of an approach while your team is handling an unrelated incident.

Your blue team is still human. Limited working hours, limited tooling budgets, alert fatigue. They’re triaging a queue while an AI system on the other side does systematic, patient, creative reconnaissance.

This is what Anthropic meant by “outpace defenders.” They weren’t speculating. They were describing their own benchmark results.

What “Outpace Defenders” Looks Like in Practice

Let me be concrete.

Vulnerability discovery at scale. Today’s AI can help a developer find a buffer overflow with guidance. Tomorrow’s AI autonomously audits a codebase, identifies the vulnerability class, reasons about exploitability in the context of your specific deployment, and generates a working proof-of-concept. The gap between AI-assisted security research and AI-autonomous exploitation is closing faster than the industry wants to admit.

Social engineering that defeats human judgment. Current spear-phishing is detectable because it lacks context — plausible text, but impersonal. Advanced models with long context windows and access to your public information surface (LinkedIn, GitHub, press releases, Glassdoor) can craft targeted attacks that sound exactly like someone who understands your org’s internal dynamics. At scale. Against every person in your company simultaneously.

CI/CD pipeline infiltration. This week, Trivy — the most widely used container vulnerability scanner — was compromised for the second time in a month. TeamPCP force-pushed 75 of 76 version tags, turning trusted CI/CD references into credential thieves harvesting SSH keys, cloud tokens, and Kubernetes configs. That attack was done by humans. An AI-directed version targets the specific tools your organization is known to use, correlates them with your cloud configuration, and prioritizes by likely yield. Your build pipeline is now the attack surface.

Zero-day chain construction. The hardest part of advanced persistent threats isn’t finding a single vulnerability — it’s chaining multiple low-severity issues into a high-impact path. That’s a reasoning problem. It’s exactly what frontier AI excels at.

The Safety-Competition Paradox

Here’s what the Mythos leak actually reveals, and it’s not comfortable.

Anthropic knows their model has these capabilities. They said so in writing. They are building it anyway, because if they don’t, OpenAI will. Or a well-funded startup will. Or a nation-state with fewer scruples about deployment will.

This isn’t a criticism of Anthropic specifically — every frontier lab operates under the same constraint. The competitive dynamics of the AI race have created a situation where the organizations most committed to safety feel compelled to build and release increasingly powerful offensive tools because the alternative is ceding the frontier to someone with fewer commitments.

The result: the most capable cyber-offensive AI in history will be commercially available, probably this year. Every serious threat actor on the planet will have access to it or something like it within 12–18 months.

This is the context your blue team is operating in.

What Blue Team Looks Like When Offense Has AI

The defensive response to this moment is tractable — but it requires a different mental model than most security organizations are currently using.

You cannot patch your way to safety. Vulnerability management against a system that discovers novel attack chains faster than your team can triage them is a losing strategy. Move investment upstream: architecture reviews, threat modeling, blast radius reduction, defense-in-depth that makes exploitation economically unviable even when vulnerabilities exist.

Blue teams need AI now, not later. The only way to close the asymmetry is to put AI in the hands of defenders at the same rate attackers are acquiring it. AI-augmented threat detection, AI-assisted incident response, AI-driven adversarial simulation against your own systems. Organizations that build this capacity today will have 18 months of operational learning before the threat fully materializes.

Your build pipeline is your new perimeter. Mandatory SBOM generation, cryptographic signing of build artifacts, real-time integrity verification of CI/CD dependencies — these move from best practice to table stakes. The Trivy attack is a preview. The next generation will be AI-directed and systematic.

Assume credential exposure. When an attacker has an AI system doing reconnaissance and exploitation, the window between credential compromise and lateral movement collapses. Zero-trust architecture — not as a marketing term but as an actually implemented policy where every service authenticates every request — is the only architecture that limits blast radius when the perimeter fails.

Invest in human defenders, not just tools. The instinct when facing AI-powered offense is to buy AI-powered defense products. Necessary but not sufficient. The humans on your blue team need to understand how AI-directed attacks reason — what they look for, how they chain vulnerabilities, what operational patterns give them away. That requires deliberate training, red-team exercises with current AI capabilities, and institutional knowledge-building that takes time. Start now.

The Unlocked Door

The week started with an AI safety company leaving 3,000 documents in an unsecured database.

The week ended with those documents revealing that the same company believes it has built an AI system capable of offensive cyber operations that outpace defenders.

I don’t think this is a story about Anthropic’s hypocrisy. I think it’s a story about how hard this problem actually is. The people who understand AI capabilities best — who have thought hardest about safety, who have written internal warnings about what their own tools can do — are still subject to the oldest failure mode in information security: a misconfigured default setting.

If that’s true at Anthropic, it’s true everywhere. Including your organization.

The warning was inside the leak. The question is whether you act on it before someone else finds your unlocked door.

David Lapsley, Ph.D., is Founder and CEO of ShipKodeAI. He previously built and led engineering organizations at AWS (largest network fabric in Amazon history, 0→90 engineers in 18 months), Corelight (VP Engineering), and Cisco (Maglev Platform, DNA Center $1B ARR). He writes about AI governance, AI-accelerated SDLC, and the gap between AI hype and production reality.

Share on

X Facebook LinkedIn Bluesky

David Lapsley, Ph.D.

The Warning Was Inside the Leak

The Irony Is Actually the Point

The Defender’s Asymmetry Problem

What “Outpace Defenders” Looks Like in Practice

The Safety-Competition Paradox

What Blue Team Looks Like When Offense Has AI

The Unlocked Door

Share on

You may also enjoy

AI Agents Are Finding Vulnerabilities Faster Than You Can Patch Them

The Agentic SDLC Transition Is Happening Whether Your Team Is Ready or Not

CRITICAL: Axios npm Package Backdoored — 100M Weekly Downloads, Cross-Platform RAT

The Vibe Coding Trap: Why Architecture Matters More When AI Writes the Code