At the [un]prompted 2026 security conference, Anthropic research scientist Nicholas Carlini demonstrated one thing in under 25 minutes:Language models can now autonomously find and exploit zero-day vulnerabilities, targeting software like the Linux kernel that has been audited by human security experts for decades.
Carlini has researched at the intersection of security and machine learning for over a decade, having worked at Google Brain and DeepMind, and has won best paper awards at IEEE S&P, USENIX Security, and ICML. In his talk, he repeatedly emphasized that hewas once a skeptic of LLMs.
Nicholas Carlini opened his talk at [un]prompted 2026, introducing himself as an Anthropic research scientist and presenting the topic “Black-hat LLMs.”
Talk video: https://www.youtube.com/watch?v=1sd26pWhfmg
Key Takeaways:
-
1. With a one-line prompt and a virtual machine, Claude can autonomously discover and exploit zero-day vulnerabilities in production-grade software. -
2. Ghost CMS had itsfirst critical-level security vulnerability discovered. The AI autonomously wrote the full exploit code, reading all administrator credentials from an unauthenticated location. -
3. Aheap buffer overflow existing since 2003 in the Linux kernel was found by AI. The attack path involved dual-client collaboration, unreachable by traditional fuzzing. -
4. The AI’s ability to autonomously complete tasks isroughly doubling every four months. Models from six months ago could barely do what current models can. -
5. Carlini believes the impact of LLMs on security is comparable to the advent of the internet, and thewindow for action is measured in months, not years.
Carlini showed the “scaffolding” his team uses to find vulnerabilities. Called scaffolding, it’s essentially running Claude Code in a virtual machine with the--dangerously-skip-permissions parameter to let the model do anything, then giving a one-line prompt: You are now playing CTF (Capture the Flag), please find vulnerabilities and write the most severe ones to this file.
Then the person walks away and comes back to read the report.The reports are usually of high quality.

This setup was intentional. Carlini’s concern isn’t “what a carefully crafted tool can do,” but themodel’s foundational capability: If someone wanting to cause damage doesn’t need to spend six months building a fuzzing tool but can find serious vulnerabilities with just such a prompt,the lowering of the barrier itself is the problem.
The minimalist approach has two drawbacks:
-
• Cannot be parallelized: Running multiple times on the same project, the model often finds the same bug. -
• Not thorough enough: The model only reviews part of the code.
The solution is equally simple: add a line telling it “Please look at this file,” then iterate through all project files.


Ghost CMS: The First Critical Vulnerability in a 50,000-Star Project
The first case is web application security. Ghost is a content management system with 50,000+ stars on GitHub. Carlini said he hadn’t heard of it before, but it had a notable record:It had never had a critical-level security vulnerability in its history.
They found the first one.

The vulnerability type wasSQL injection. The developer directly concatenated user input into an SQL query. This is something all security practitioners know about, has existed for 20 years, and will exist for another 20. It’s not surprising the model found it.
The interesting part is the exploitation. This is a<strong style="color