All posts
AI & Machine Learning

Microsoft Unleashes 100+ AI Agents to Hunt Windows Bugs

Manaal Khan14 May 2026 at 9:43 pm5 min read
Microsoft Unleashes 100+ AI Agents to Hunt Windows Bugs

Key Takeaways

Microsoft Unleashes 100+ AI Agents to Hunt Windows Bugs
Source: The Decoder
  • MDASH uses 100+ specialized AI agents that argue with each other about whether vulnerabilities are real
  • The system found 16 new Windows vulnerabilities on May 12, 2026, including 4 critical remote code execution flaws
  • MDASH scored 88.45% on the CyberGym benchmark, the highest result to date

What Is MDASH?

Microsoft has built a security system that sounds like a courtroom drama. Called MDASH (Multi-Model Agentic Scanning Harness), it orchestrates more than 100 specialized AI agents across multiple AI models. These agents don't just scan code. They argue with each other about what they find.

Unlike single-model approaches, MDASH runs what Microsoft describes as an ensemble of frontier and distilled models. The system is model-agnostic. When a new AI model comes out, Microsoft can test it against previous ones just by changing configuration settings.

16 new CVEs
Discovered by MDASH in Windows networking and authentication stack, reported on Patch Tuesday, May 12, 2026

How AI Agents Debate Each Other

MDASH works in four stages. First, it analyzes source code and maps out potential attack surfaces. Specialized auditor agents then scan for suspicious areas in the code.

The third stage is where things get interesting. A second group of agents, which Microsoft calls "debaters," argue for and against the exploitability of each finding. Think of it as prosecution versus defense, but for code vulnerabilities. After duplicates are merged, Evidence Leader agents try to trigger the vulnerability through specific inputs.

Microsoft's MDASH pipeline showing the four-stage process from code analysis to vulnerability verification
Microsoft's MDASH pipeline showing the four-stage process from code analysis to vulnerability verification

The system also accepts plugins that let security experts feed in domain-specific knowledge. Kernel calling conventions, IPC trust boundaries, and other technical details that no foundation model would know on its own can be added to improve detection.

The Vulnerabilities Found

Microsoft classifies four of the 16 discovered vulnerabilities as critical. These include remote code execution flaws in:

  • tcpip.sys kernel component
  • IKEv2 service (ikeext.dll)
  • netlogon.dll
  • dnsapi.dll

Ten of the 16 vulnerabilities affect kernel mode. Most are accessible from the network without authentication, which makes them particularly dangerous.

Microsoft points out that its own code base is especially hard to audit. Windows, Hyper-V, and Azure are proprietary. They aren't part of public training data that AI models learn from. That makes automated scanning more challenging and, arguably, more necessary.

Benchmark Performance Comes With a Caveat

On the public CyberGym benchmark, which contains 1,507 real vulnerabilities, MDASH scored 88.45%. That's the top result on the leaderboard, roughly five percentage points ahead of the next best model.

Line chart showing success rates by release date. A highlighted Microsoft data point reaches 88.45 percent, leading all other models on the CyberGym benchmark leaderboard.
Success rates by release date showing MDASH's performance on the CyberGym benchmark

But there's a catch. Microsoft acknowledges the comparison is misleading. MDASH is an entire framework with 100+ agents working together. The models it's being compared against are individual systems. Those models would likely score higher if wrapped in a similar multi-agent framework.

Microsoft hasn't disclosed which specific AI models power MDASH. That makes independent evaluation difficult.

Why This Matters for Security Teams

Automated vulnerability detection isn't new. Static analysis tools have existed for decades. What's different here is the adversarial approach. Having AI agents argue with each other about vulnerabilities mimics how human security teams operate, with red teams finding flaws and blue teams trying to determine if they're exploitable.

The plugin architecture also matters. Security teams often have specialized knowledge about their systems that general-purpose tools miss. Being able to inject that knowledge into the scanning process could reduce false positives and catch context-specific vulnerabilities.

ℹ️

Logicity's Take

Frequently Asked Questions

What is Microsoft MDASH?

MDASH (Multi-Model Agentic Scanning Harness) is Microsoft's AI-powered security system that uses more than 100 specialized AI agents to automatically detect software vulnerabilities. The agents work in stages, with some scanning code and others debating whether findings are exploitable.

How many vulnerabilities has MDASH found?

MDASH discovered 16 new vulnerabilities (CVEs) in the Windows networking and authentication stack, reported on Patch Tuesday, May 12, 2026. Four of these are classified as critical, including remote code execution flaws.

What AI models does MDASH use?

Microsoft hasn't disclosed which specific AI models power MDASH. The company describes it as using an ensemble of frontier and distilled models, and the system is model-agnostic, meaning new models can be swapped in through configuration changes.

Is MDASH available for other companies to use?

Microsoft hasn't announced whether MDASH will be made available as a product or service. Currently, it appears to be an internal tool used for finding vulnerabilities in Microsoft's own products like Windows, Hyper-V, and Azure.

Also Read
Russian Hackers Targeted 13,500 Signal Users in Phishing Campaign

More on AI and cybersecurity threats

ℹ️

Need Help Implementing This?

Source: The Decoder / Matthias Bastian

UK Antitrust Regulator Launches Investigation into Microsoft Software Bundling

The UK's Competition and Markets Authority (CMA) has officially launched a strategic market status investigation into Microsoft's business software ecosystem, specifically examining the bundling of products like Windows, Office, and Copilot. The inquiry aims to determine if these practices are uncompetitive and is scheduled to conclude by February 2026.

M

Manaal Khan

Tech & Innovation Writer