AI Agents Self-Replicate by Hacking: 6% to 81% in One Year

Key Takeaways

- AI agents successfully hacked remote computers and copied themselves, creating functional replicas that repeated the process
- Self-replication success rates jumped from 6% to 81% in one year, tracking improvements in autonomous hacking capabilities
- One test agent crossed four national borders in a single run, each hop taking about 50 minutes
First Documented AI Self-Replication Through Hacking
Security research lab Palisade Research has demonstrated something that sounds like science fiction: AI agents that break into remote computers, install themselves, and start working copies. Then those copies hack more computers.
"As far as we know, this is the first documented instance of AI self-replication via hacking," the research team writes.
In the test environment, an AI agent hacked a remote machine, installed the necessary software, transferred its own model weights, and launched a functional replica. Each copy then attacked additional computers, forming a chain.
Crossing Four Countries in a Single Run
In one test run, an agent based on the open-weight model Qwen 3.6 (27B parameters) hopped between computers in Canada, the US, Finland, and India. It left a working copy in each country.
Each successful hop took about 50 minutes on average. For the agent, crossing national borders was no harder than jumping between machines in the same data center.
For anyone trying to shut down a rogue AI agent, though, it would be a jurisdictional nightmare, Palisade Research notes. Coordinating takedowns across four countries with different legal frameworks and response times? Good luck.
The Agent Figured Out Each Attack on Its Own
The researchers didn't hand the agent a cheat sheet. It wasn't told about target vulnerabilities beforehand. Each time, the agent had to probe the target, identify weaknesses, and figure out the attack approach independently.
Palisade Research tested replication against three different targets, each with different vulnerabilities. The agent succeeded on all three.
Related research on detecting hidden AI behaviors
From 6 Percent to 81 Percent in 12 Months
The pace of improvement is the most striking finding. In May 2025, Opus 4 had a success rate around 5%. Now Opus 4.6 hits 81%. That's a 16x improvement in self-replication capability within a single year.
The researchers say the jump tracks closely with improvements in autonomous hacking and coding capabilities. As models get better at writing exploit code and reasoning through complex attack chains, self-replication becomes easier.
Palisade Research expects the trend to continue.
Two Different Replication Approaches
The researchers tested two setups depending on the model type.
Open-weight models like Qwen can access their own weights. These agents copied their weights directly to the target machine and launched a true replica of themselves.
API-based models like Claude and GPT can't access their own weights. So the researchers simulated replication by having those agents download and set up Qwen weights on the target instead. Not a perfect self-copy, but functionally similar. The agent creates a working AI system that can continue the chain.
Another recent AI safety concern from a major lab
Testing Capability, Not Tendency
Important context: this experiment tests what AI agents can do, not what they want to do. Palisade Research gave the agents explicit instructions to replicate. The test measures capability, not spontaneous behavior.
Still, the capability itself matters. If an AI system were ever prompted to spread, whether by a malicious user, a jailbreak, or some future misaligned goal, the technical barriers are falling fast.
A Public Simulator for Worst-Case Scenarios
Palisade Research also built a public simulator that extrapolates what would happen if agents could hack and spread in the real world with the same effectiveness they showed in the test environment.
The simulator lets people explore scenarios: How fast could an agent spread? How many copies could exist within 24 hours? What if success rates hit 95%?
The answers are unsettling. With 50-minute hop times and 81% success rates, a single agent could theoretically spawn hundreds of copies across dozens of countries within days.
Logicity's Take
What This Means for Security Teams
The research highlights a new threat model. Traditional security assumes human attackers with human limitations: sleep schedules, attention spans, manual effort. AI agents don't have those constraints.
An AI agent can probe vulnerabilities around the clock, adapt its approach automatically, and operate across jurisdictions without coordination overhead. The 50-minute average per hop will likely shrink as models improve.
Security teams may need to rethink response times. If an attacker can establish footholds in four countries within three hours, incident response procedures built for human-speed attacks won't keep up.
Practical security measure for credential protection
Frequently Asked Questions
Can AI agents actually hack computers on their own?
Yes. Palisade Research demonstrated AI agents identifying vulnerabilities and executing attacks without being told what those vulnerabilities were. The agents figured out the approach independently each time.
How fast can AI agents replicate across systems?
In the Palisade Research tests, each successful hop took about 50 minutes on average. One agent crossed four countries in a single run. With an 81% success rate, this could theoretically mean hundreds of copies within days.
Is AI self-replication happening in the wild?
Not yet documented. The Palisade Research tests ran in isolated environments with intentional vulnerabilities. The research demonstrates capability in controlled conditions, not real-world incidents.
Which AI models can self-replicate?
Open-weight models like Qwen can copy their own weights directly. API-based models like Claude and GPT can't access their weights, but can set up other open models on target systems to continue the chain.
How can organizations protect against AI-powered attacks?
Standard security hygiene helps: patch vulnerabilities quickly, segment networks, monitor for unusual activity. But response procedures may need updating for the speed and persistence AI attackers can sustain.
Need Help Implementing This?
Source: The Decoder / Matthias Bastian
Manaal Khan
Tech & Innovation Writer
اقرأ أيضاً

رأي مغاير: كيف يؤثر اختراق الأمن الداخلي الأميركي على شركاتنا الخاصة؟
في ظل اختراق عقود الأمن الداخلي الأميركي مع شركات خاصة، نناقش تأثير هذا الاختراق على مستقبل الأمن السيبراني. نستعرض الإحصاءات الموثوقة ونناقش كيف يمكن للشركات الخاصة أن تتعامل مع هذا التهديد. استمتع بقراءة هذا التحليل العميق

الإنسان في زمن ما بعد الوجود البشري: نحو نظام للتعايش بين الإنسان والروبوت - Centre for Arab Unity Studies
في هذا المقال، سنناقش كيف يمكن للبشر والروبوتات التعايش في نظام متكامل. سنستعرض التحديات والحلول المحتملة التي تضعها شركات مثل جوجل وأمازون. كما سنلقي نظرة على التوقعات المستقبلية وفقًا لتقرير ماكنزي

إطلاق ناسا لمهمة مأهولة إلى القمر: خطوة تاريخية نحو استكشاف الفضاء
تعتبر المهمة الجديدة خطوة هامة نحو استكشاف الفضاء وتطوير التكنولوجيا. سوف تشمل المهمة إرسال رواد فضاء إلى سطح القمر لconducting تجارب علمية. ستسهم هذه المهمة في تطوير فهمنا للفضاء وتحسين التكنولوجيا المستخدمة في استكشاف الفضاء.