Microsoft Unleashes 100+ AI Agents to Hunt Windows Bugs

Key Takeaways

- MDASH uses 100+ specialized AI agents that argue with each other about whether vulnerabilities are real
- The system found 16 new Windows vulnerabilities, reported on Patch Tuesday, May 12, 2026, including 4 critical remote code execution flaws
- MDASH scored 88.45% on the CyberGym benchmark, the highest result to date
What Is MDASH?
Microsoft has built a security system that sounds like a courtroom drama. Called MDASH (Multi-Model Agentic Scanning Harness), it orchestrates more than 100 specialized AI agents across multiple AI models. These agents don't just scan code. They argue with each other about what they find.
Unlike single-model approaches, MDASH runs what Microsoft describes as an ensemble of frontier and distilled models. The system is model-agnostic. When a new AI model comes out, Microsoft can test it against previous ones just by changing configuration settings.
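To make the "just change the configuration" idea concrete, here is a minimal sketch of a model-agnostic harness. Microsoft has not published MDASH's configuration format, so every name here (`MODEL_REGISTRY`, `HarnessConfig`, `resolve`, and the stub models) is hypothetical and only illustrates the general pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical: a "model" is any callable that maps a prompt to a text answer.
ModelFn = Callable[[str], str]


def frontier_stub(prompt: str) -> str:
    """Stand-in for a large frontier model (not a real API call)."""
    return "frontier: " + prompt


def distilled_stub(prompt: str) -> str:
    """Stand-in for a smaller distilled model (not a real API call)."""
    return "distilled: " + prompt


# Hypothetical registry: models are looked up by name, never hard-coded.
MODEL_REGISTRY: Dict[str, ModelFn] = {
    "frontier-a": frontier_stub,
    "distilled-b": distilled_stub,
}


@dataclass
class HarnessConfig:
    """Which registered model backs each agent role (assumed roles)."""
    auditor_model: str
    debater_model: str


def resolve(config: HarnessConfig) -> Dict[str, ModelFn]:
    """Turn model names from the config into callables, one per role."""
    return {
        "auditor": MODEL_REGISTRY[config.auditor_model],
        "debater": MODEL_REGISTRY[config.debater_model],
    }


# Comparing two setups only requires a second config object.
baseline = HarnessConfig(auditor_model="frontier-a", debater_model="distilled-b")
candidate = HarnessConfig(auditor_model="frontier-a", debater_model="frontier-a")

baseline_roles = resolve(baseline)
candidate_roles = resolve(candidate)
```

Because the agent code only ever sees the callables returned by `resolve`, swapping a model in this sketch never touches the pipeline itself.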
How AI Agents Debate Each Other
MDASH works in four stages. First, it analyzes source code and maps out potential attack surfaces. Specialized auditor agents then scan for suspicious areas in the code.
The third stage is where things get interesting. A second group of agents, which Microsoft calls "debaters," argues for and against the exploitability of each finding. Think of it as prosecution versus defense, but for code vulnerabilities. Finally, after duplicates are merged, Evidence Leader agents try to trigger the vulnerability through specific inputs.
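The debate, merge, and prove steps described above can be sketched in a few lines. This is a toy illustration, not Microsoft's implementation: the output of the first two stages is assumed to arrive as a plain list of candidate findings, the debaters are simple functions that vote, and `Finding`, `debate`, `deduplicate`, `run_pipeline`, and `try_trigger` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability produced by the (assumed) auditor stage."""
    location: str
    issue: str


# Hypothetical debater: returns True when it argues the finding is exploitable.
Debater = Callable[[Finding], bool]


def debate(finding: Finding, debaters: List[Debater]) -> bool:
    """Keep a finding only if a strict majority of debaters argue for it."""
    votes = [debater(finding) for debater in debaters]
    return sum(votes) > len(votes) / 2


def deduplicate(findings: List[Finding]) -> List[Finding]:
    """Merge identical findings while preserving their original order."""
    return list(dict.fromkeys(findings))


def run_pipeline(
    candidates: List[Finding],
    debaters: List[Debater],
    try_trigger: Callable[[Finding], bool],
) -> List[Finding]:
    """Debate each candidate, merge duplicates, then try to trigger the rest."""
    survivors = [f for f in candidates if debate(f, debaters)]
    unique = deduplicate(survivors)
    return [f for f in unique if try_trigger(f)]


# Toy inputs standing in for the mapping and auditing stages.
candidates = [
    Finding("parse_packet", "unchecked length"),
    Finding("parse_packet", "unchecked length"),  # duplicate report
    Finding("log_event", "format string"),
]
debaters = [
    lambda f: "length" in f.issue,               # argues for length bugs
    lambda f: f.location.startswith("parse"),    # argues for parser code
    lambda f: False,                             # always argues against
]
confirmed = run_pipeline(
    candidates, debaters, try_trigger=lambda f: f.location == "parse_packet"
)
```

With these toy inputs, `confirmed` holds a single `parse_packet` finding: the duplicate is merged and the `log_event` report loses the debate. A real system would presumably weigh written arguments rather than count boolean votes.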

The system also accepts plugins that let security experts feed in domain-specific knowledge. Experts can add details such as kernel calling conventions and IPC trust boundaries, which no foundation model would know on its own, to improve detection.
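One way such a plugin mechanism could look is sketched below. The actual plugin interface has not been published, so the `KnowledgePlugin` protocol, the two example plugins, the `build_context` helper, and the hint texts are all assumptions made for illustration.

```python
from typing import List, Protocol


class KnowledgePlugin(Protocol):
    """Hypothetical interface: a plugin offers hints for a given component."""
    name: str

    def hints_for(self, component: str) -> List[str]: ...


class KernelCallingConventions:
    """Illustrative plugin; the hint text is a placeholder, not Microsoft's."""
    name = "kernel-calling-conventions"

    def hints_for(self, component: str) -> List[str]:
        if component.endswith(".sys"):
            return ["Treat pointers received from user mode as untrusted."]
        return []


class IpcTrustBoundaries:
    """Illustrative plugin; the hint text is a placeholder, not Microsoft's."""
    name = "ipc-trust-boundaries"

    def hints_for(self, component: str) -> List[str]:
        if component.endswith(".dll"):
            return ["Messages from other processes cross a trust boundary."]
        return []


def build_context(component: str, plugins: List[KnowledgePlugin]) -> List[str]:
    """Collect every applicable hint, tagged with the plugin that supplied it."""
    hints: List[str] = []
    for plugin in plugins:
        hints.extend(f"[{plugin.name}] {hint}" for hint in plugin.hints_for(component))
    return hints


plugins = [KernelCallingConventions(), IpcTrustBoundaries()]
driver_context = build_context("tcpip.sys", plugins)
library_context = build_context("netlogon.dll", plugins)
```

In this sketch the collected hints would be handed to the scanning agents as extra context, so expert knowledge shapes the analysis without retraining any model.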
The Vulnerabilities Found
Microsoft classifies four of the 16 discovered vulnerabilities as critical. These include remote code execution flaws in:
- tcpip.sys kernel component
- IKEv2 service (ikeext.dll)
- netlogon.dll
- dnsapi.dll
Ten of the 16 vulnerabilities affect kernel mode. Most are accessible from the network without authentication, which makes them particularly dangerous.
Microsoft points out that its own code base is especially hard to audit. Windows, Hyper-V, and Azure are proprietary. They aren't part of public training data that AI models learn from. That makes automated scanning more challenging and, arguably, more necessary.
Benchmark Performance Comes With a Caveat
On the public CyberGym benchmark, which contains 1,507 real vulnerabilities, MDASH scored 88.45%. That's the top result on the leaderboard, roughly five percentage points ahead of the next best model.
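For a sense of scale, the percentage can be turned into an approximate task count. The sketch below assumes the score is simply the share of the 1,507 benchmark tasks that were solved, which the article does not state explicitly.

```python
TOTAL_TASKS = 1507      # benchmark size, from the article
SCORE_PERCENT = 88.45   # reported MDASH score, from the article

# Assumption: score = solved tasks / total tasks.
solved = round(TOTAL_TASKS * SCORE_PERCENT / 100)

# Sanity check: converting the count back should reproduce the reported score.
recomputed_percent = round(solved / TOTAL_TASKS * 100, 2)
```

Under that assumption, `solved` comes to 1333 tasks, and converting back gives 88.45 again, so the two published figures are consistent with each other.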

But there's a catch. Microsoft acknowledges the comparison is misleading. MDASH is an entire framework with 100+ agents working together. The models it's being compared against are individual systems. Those models would likely score higher if wrapped in a similar multi-agent framework.
Microsoft hasn't disclosed which specific AI models power MDASH. That makes independent evaluation difficult.
Why This Matters for Security Teams
Automated vulnerability detection isn't new. Static analysis tools have existed for decades. What's different here is the adversarial approach. Having AI agents argue with each other about vulnerabilities mimics how human security teams operate, with red teams finding flaws and blue teams trying to determine if they're exploitable.
The plugin architecture also matters. Security teams often have specialized knowledge about their systems that general-purpose tools miss. Being able to inject that knowledge into the scanning process could reduce false positives and catch context-specific vulnerabilities.
Frequently Asked Questions
What is Microsoft MDASH?
MDASH (Multi-Model Agentic Scanning Harness) is Microsoft's AI-powered security system that uses more than 100 specialized AI agents to automatically detect software vulnerabilities. The agents work in stages, with some scanning code and others debating whether findings are exploitable.
How many vulnerabilities has MDASH found?
MDASH discovered 16 new vulnerabilities (CVEs) in the Windows networking and authentication stack, reported on Patch Tuesday, May 12, 2026. Four of these are classified as critical, including remote code execution flaws.
What AI models does MDASH use?
Microsoft hasn't disclosed which specific AI models power MDASH. The company describes it as using an ensemble of frontier and distilled models, and the system is model-agnostic, meaning new models can be swapped in through configuration changes.
Is MDASH available for other companies to use?
Microsoft hasn't announced whether MDASH will be made available as a product or service. Currently, it appears to be an internal tool used for finding vulnerabilities in Microsoft's own products like Windows, Hyper-V, and Azure.
Source: The Decoder / Matthias Bastian
Manaal Khan
Tech & Innovation Writer