Oppo Open-Sources X-OmniClaw, an On-Device Android AI Agent

Huma Shazia · 17 May 2026 at 1:38 pm · 4 min read

Key Takeaways

X-OmniClaw runs directly on Android devices, calling cloud models only for complex reasoning
The agent combines camera, screen, and voice into a single perception pipeline for task execution
Photo galleries get processed during idle time into searchable text-based memory stored locally

On-Device vs Cloud: A Different Approach

Oppo's Multi-X team has released X-OmniClaw, an open-source AI agent for Android that handles tasks across apps using your phone's camera, screen, and voice. The key difference from existing solutions: it runs on the physical device itself.

In the technical report, Oppo draws a clear line between X-OmniClaw and cloud phone platforms like RedFinger, Alibaba's Wuying, and Tencent Cloud Phone. Those services run agents inside virtualized Android instances in data centers, so they can't access the sensors, cameras, or private data on the user's own phone.

X-OmniClaw takes the opposite route. The core logic for perception, control, and app interaction lives on the phone. A cloud language model only gets called as "fuel" for higher-level reasoning when needed, according to the report. The specific local models aren't named, but the documentation lists components like an on-device grounding model and OCR for detecting tappable UI elements.
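The split between on-device work and cloud reasoning can be pictured with a small routing sketch. This is an illustration only, not Oppo's code: the `Step` type, the `needs_reasoning` flag, and both handler functions are hypothetical names, and the report does not say how the real agent decides when to call the cloud model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    description: str
    needs_reasoning: bool  # hypothetical flag; the real decision rule is not documented


def run_step(step: Step,
             on_device: Callable[[str], str],
             cloud_llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (where the step ran, result). Only reasoning steps leave the device."""
    if step.needs_reasoning:
        return "cloud", cloud_llm(step.description)
    return "device", on_device(step.description)


# Stand-ins for the real components (assumptions, for illustration only).
def fake_on_device(description: str) -> str:
    return f"done locally: {description}"


def fake_cloud_llm(description: str) -> str:
    return f"plan for: {description}"


steps: List[Step] = [
    Step("tap the search box", needs_reasoning=False),
    Step("decide which listing matches the product", needs_reasoning=True),
]
trace = [run_step(s, fake_on_device, fake_cloud_llm) for s in steps]
for where, result in trace:
    print(where, "->", result)
```

In this sketch the UI action stays on the device and only the planning step is sent out, which mirrors the division of labor the report describes.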

Three Perception Channels, One Pipeline

The agent bundles camera, screen, and voice into a single processing pipeline. A vision-language model interprets the scene and the user's request before triggering any action.

The perception stack combines text, voice, camera, and screen signals, aligns them in time, and passes a structured intent to the language model for execution.

In one demo, a user asks "How much does this cost on Taobao?" while pointing the camera at a product. The system rephrases that internally to "price of Evian spray on Taobao" and then hands the structured intent off for execution.
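A rough sketch of how time-aligned signals could be turned into a structured intent for the demo above. Everything here is assumed for illustration: the `Signal` type, the two-second window, and the rule-based `rewrite` function, which stands in for the vision-language model and handles only this one phrasing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Signal:
    channel: str      # "voice", "camera", "screen", or "text"
    timestamp: float  # seconds since the session started
    content: str


def nearest(signals: List[Signal], channel: str, t: float, window: float) -> Optional[Signal]:
    """Return the signal on `channel` closest in time to t, if one lies within `window` seconds."""
    candidates = [s for s in signals
                  if s.channel == channel and abs(s.timestamp - t) <= window]
    return min(candidates, key=lambda s: abs(s.timestamp - t)) if candidates else None


def rewrite(utterance: str, referent: str) -> str:
    """Toy rule: resolve "this" in "How much does this cost on <app>?" to the seen object."""
    prefix = "how much does this cost on "
    text = utterance.strip().rstrip("?")
    if text.lower().startswith(prefix):
        app = text[len(prefix):]
        return f"price of {referent} on {app}"
    return utterance


def build_intent(signals: List[Signal], window: float = 2.0) -> Dict[str, Optional[str]]:
    voice = next(s for s in signals if s.channel == "voice")
    frame = nearest(signals, "camera", voice.timestamp, window)
    referent = frame.content if frame else None
    query = rewrite(voice.content, referent) if referent else voice.content
    return {"utterance": voice.content, "referent": referent, "query": query}


signals = [
    Signal("camera", 8.0, "desk"),           # too old to match the question
    Signal("camera", 10.0, "Evian spray"),   # frame captured just before the question
    Signal("voice", 10.2, "How much does this cost on Taobao?"),
]
intent = build_intent(signals)
print(intent["query"])
```

The point of the alignment step is that "this" is resolved against the camera frame closest to the moment the user spoke, not against whatever the camera saw earlier.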

The user points the camera at a bottle and asks "How much does this cost?" The agent opens Taobao, scrolls through results, and reads out prices and sales figures.

Photo Gallery as Searchable Memory

For long-term memory, X-OmniClaw condenses local data into semantic entries. During idle time, gallery photos get processed into compact descriptions of objects, scenes, and events. These get stored in a Markdown file.

The memory module crunches gallery photos during idle time into a Markdown file called "image-memory.md," filtering out sensitive content before saving.

The system filters sensitive content before saving. This creates a searchable text-based memory of your photos without requiring cloud processing.
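As a minimal sketch of the idea, the snippet below writes photo descriptions into Markdown, skips entries flagged as sensitive, and runs a keyword search over the result. The entry format, the `SENSITIVE_TERMS` list, and the file names are assumptions; the article does not describe the real file layout or filter, and in practice the descriptions would come from a vision model, not hard-coded strings.

```python
from typing import Dict, List

# Hypothetical keyword filter; the real sensitivity check is not documented.
SENSITIVE_TERMS = {"passport", "credit card", "id card"}


def is_sensitive(description: str) -> bool:
    lowered = description.lower()
    return any(term in lowered for term in SENSITIVE_TERMS)


def to_markdown(entries: List[Dict[str, str]]) -> str:
    """Render non-sensitive entries as a Markdown bullet list, one photo per line."""
    lines = ["# Image memory", ""]
    for entry in entries:
        if is_sensitive(entry["description"]):
            continue  # filtered out before anything is saved
        lines.append(f"- {entry['file']}: {entry['description']}")
    return "\n".join(lines) + "\n"


def search(markdown: str, keyword: str) -> List[str]:
    """Return file names whose memory line mentions the keyword."""
    hits = []
    for line in markdown.splitlines():
        if line.startswith("- ") and keyword.lower() in line.lower():
            hits.append(line[2:].split(":", 1)[0])
    return hits


entries = [
    {"file": "IMG_001.jpg", "description": "green parrot on a balcony railing"},
    {"file": "IMG_002.jpg", "description": "photo of a passport on a desk"},
    {"file": "IMG_003.jpg", "description": "two parrots eating seeds"},
]
memory = to_markdown(entries)
print(memory)
print(search(memory, "parrot"))
```

Because the memory is plain text, a request like "make a parrot album" reduces to a text search over short descriptions and never needs to re-read the images themselves.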

From a voice request for a parrot album, the agent searches its condensed gallery memory for matching photos and hands them off to CapCut.

Learning by Cloning User Behavior

X-OmniClaw learns from how you use apps. Instead of replaying tap paths, it clones an app page's structure and learns to replicate your actions autonomously.

Show the agent the path to a deeply nested discount page once. Next time, it can navigate there on its own. This approach means the agent adapts to individual usage patterns rather than relying on generic app navigation.
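One way to picture "show it once, then navigate on its own" is to store a demonstrated route as element labels matched against page structure, so that replay does not depend on screen coordinates. The sketch below is a simplified interpretation and not Oppo's method: the `APP` dictionary, page names, labels, and the voice command are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for cloned page structures: page name -> {element label: page it opens}.
APP: Dict[str, Dict[str, str]] = {
    "home": {"Me": "profile", "Search": "search"},
    "profile": {"Coupons": "coupons", "Settings": "settings"},
    "coupons": {"Member deals": "discounts"},
    "discounts": {},
}

Route = List[Tuple[str, str]]  # (page the user was on, label the user tapped)


def replay(app: Dict[str, Dict[str, str]], start: str, route: Route) -> str:
    """Follow a recorded route by looking up labels in the page structure, not by coordinates."""
    page = start
    for expected_page, label in route:
        if page != expected_page or label not in app[page]:
            raise LookupError(f"cannot find '{label}' on page '{page}'")
        page = app[page][label]
    return page


# The user demonstrates the path once; it is stored under a voice command.
routes: Dict[str, Route] = {
    "open my discounts": [("home", "Me"), ("profile", "Coupons"), ("coupons", "Member deals")],
}

print(replay(APP, "home", routes["open my discounts"]))
```

Matching on labels means the route still works if a button moves on screen, and it fails loudly with a `LookupError` if the app's structure has changed since the demonstration.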

Show the agent the path to a deeply nested Meituan discount page once. Next time, a voice command gets you there, with no public deeplink needed.

Demo Capabilities

In demos, Oppo showed X-OmniClaw handling several tasks:

Comparing prices of products captured on camera across e-commerce apps
Acting as a floating assistant ("ScreenAvatar") to work through practice problems in sequence
Creating photo albums from a user's gallery based on voice requests

As a "ScreenAvatar," X-OmniClaw works through practice problems in sequence, tapping correct answers on its own.

Why Open Source Matters Here

The open-source release means developers can inspect, modify, and build on X-OmniClaw's architecture. For privacy-conscious users, the on-device approach addresses concerns about sending personal data, photos, and screen content to cloud servers for processing.

The tradeoff is clear: cloud-based agents can tap into more powerful models, while on-device agents keep data local but face compute constraints. X-OmniClaw's hybrid approach, using cloud models only for complex reasoning, attempts to balance both.

Frequently Asked Questions

Does X-OmniClaw send my data to the cloud?

Not for core processing. Perception, control, and app interaction run on the device, and a cloud language model is called only for higher-level reasoning. Those calls do leave the phone, and the report as summarized here doesn't specify exactly what data they carry.

What phones can run X-OmniClaw?

The technical report doesn't specify hardware requirements. Since it's open-source for Android, compatibility will likely depend on the on-device models and processing power needed.

How is X-OmniClaw different from Google Assistant or Siri?

X-OmniClaw is designed as an autonomous agent that can navigate apps, learn from your behavior, and complete multi-step tasks. Traditional assistants handle voice commands but don't typically learn workflows or operate across apps autonomously.

Is X-OmniClaw available to download now?

Oppo has open-sourced the project, but the technical report doesn't detail consumer availability. Developers can access the code, though end-user apps may come later.

Source: The Decoder / Jonathan Kemper
