Oppo Open-Sources X-OmniClaw, an On-Device Android AI Agent

Key Takeaways

- X-OmniClaw runs directly on Android devices, calling cloud models only for complex reasoning
- The agent combines camera, screen, and voice into a single perception pipeline for task execution
- Photo galleries get processed during idle time into searchable text-based memory stored locally

On-Device vs Cloud: A Different Approach
Oppo's Multi-X team has released X-OmniClaw, an open-source AI agent for Android that handles tasks across apps using your phone's camera, screen, and voice. The key difference from existing solutions: it runs on the physical device itself.
In the technical report, Oppo draws a clear line between X-OmniClaw and cloud phone platforms like RedFinger, Alibaba's Wuying, and Tencent Cloud Phone. Those services run agents inside virtualized Android instances in data centers, so the agents can't access a phone's local sensors, cameras, or private data.
X-OmniClaw takes the opposite route. The core logic for perception, control, and app interaction lives on the phone. According to the report, a cloud language model is called only as "fuel" for higher-level reasoning when needed. The specific local models aren't named, but the documentation lists components such as an on-device grounding model and OCR for detecting tappable UI elements.
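
The split between local handling and cloud reasoning can be pictured with a small routing sketch. This is an illustration under stated assumptions, not X-OmniClaw's code: the `Step` type, the `needs_reasoning` flag, and both handler functions are hypothetical, and the report as summarized here does not say how the agent decides when to call the cloud model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    # Hypothetical flag: how the real agent decides this is not described.
    needs_reasoning: bool


def run_step(
    step: Step,
    local_handler: Callable[[str], str],
    cloud_llm: Callable[[str], str],
) -> str:
    """Handle a step on-device unless it is flagged for higher-level reasoning."""
    if step.needs_reasoning:
        return cloud_llm(step.description)
    return local_handler(step.description)


# Stand-ins for an on-device component and a cloud language model.
def fake_local(text: str) -> str:
    return f"local:{text}"


def fake_cloud(text: str) -> str:
    return f"cloud:{text}"


steps = [
    Step("tap the search box", needs_reasoning=False),
    Step("plan a price comparison across apps", needs_reasoning=True),
]
results = [run_step(s, fake_local, fake_cloud) for s in steps]
print(results)
```

Only the second step reaches the cloud stand-in; everything else stays in the local handler.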

Three Perception Channels, One Pipeline
The agent bundles camera, screen, and voice into a single processing pipeline. A vision-language model interprets the scene and the user's request before triggering any action.
The perception stack takes typed text alongside the voice, camera, and screen signals, aligns them in time, and passes a structured intent to the language model for execution.

In one demo, a user asks "How much does this cost on Taobao?" while pointing the camera at a product. The system rephrases that internally to "price of Evian spray on Taobao" and then hands the structured intent off for execution.
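
To make the idea of aligning signals in time concrete, here is a minimal sketch. It assumes each channel emits timestamped events and that the voice utterance serves as the anchor; the `Event` type, the 1.5-second window, and the intent fields are all hypothetical and are not taken from the report.

```python
from dataclasses import dataclass


@dataclass
class Event:
    channel: str   # "voice", "camera", "screen", or "text"
    t: float       # timestamp in seconds
    payload: str


def align(events, anchor_channel="voice", window=1.5):
    """Return the anchor event and, per other channel, the event closest
    to it in time, ignoring anything more than `window` seconds away."""
    anchor = next(e for e in events if e.channel == anchor_channel)
    nearest = {}
    for e in events:
        if e.channel == anchor_channel:
            continue
        gap = abs(e.t - anchor.t)
        if gap > window:
            continue
        best = nearest.get(e.channel)
        if best is None or gap < abs(best.t - anchor.t):
            nearest[e.channel] = e
    return anchor, nearest


def build_intent(anchor, nearest):
    """Bundle the aligned signals into one structured record."""
    camera = nearest.get("camera")
    screen = nearest.get("screen")
    return {
        "utterance": anchor.payload,
        "object": camera.payload if camera else None,
        "screen": screen.payload if screen else None,
    }


events = [
    Event("camera", 9.0, "water bottle"),
    Event("camera", 10.2, "Evian spray"),
    Event("voice", 10.0, "How much does this cost on Taobao?"),
    Event("screen", 4.0, "home screen"),
]
anchor, nearest = align(events)
intent = build_intent(anchor, nearest)
print(intent)
```

In this sketch the camera frame closest to the spoken question supplies the object, and the stale screen event is dropped. Turning the record into a query such as "price of Evian spray on Taobao" would be the language model's job.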

Photo Gallery as Searchable Memory
For long-term memory, X-OmniClaw condenses local data into semantic entries. During idle time, gallery photos get processed into compact descriptions of objects, scenes, and events. These get stored in a Markdown file.

The system filters sensitive content before saving. This creates a searchable text-based memory of your photos without requiring cloud processing.
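
A text-based photo memory of this kind could look roughly like the sketch below. It assumes captions already exist (the captioning model is out of scope), uses a simple keyword list as a stand-in for the sensitive-content filter, and picks an arbitrary Markdown line format; none of these details come from the report.

```python
import tempfile
from pathlib import Path

# Hypothetical filter list; the real filtering method is not described.
SENSITIVE_TERMS = {"passport", "credit card", "id card"}


def is_sensitive(description: str) -> bool:
    text = description.lower()
    return any(term in text for term in SENSITIVE_TERMS)


def append_memory(memory_file: Path, photo_name: str, description: str) -> bool:
    """Append one entry to the Markdown memory; skip sensitive content."""
    if is_sensitive(description):
        return False
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(f"- **{photo_name}**: {description}\n")
    return True


def search_memory(memory_file: Path, keyword: str) -> list:
    """Return every memory line that contains the keyword (case-insensitive)."""
    if not memory_file.exists():
        return []
    lines = memory_file.read_text(encoding="utf-8").splitlines()
    return [line for line in lines if keyword.lower() in line.lower()]


# Captions stand in for the output of an on-device captioning model.
captions = {
    "IMG_001.jpg": "Two people at a beach at sunset",
    "IMG_002.jpg": "Photo of a passport on a desk",
    "IMG_003.jpg": "Birthday cake with candles at a family dinner",
}
memory_path = Path(tempfile.mkdtemp()) / "gallery_memory.md"
for name, caption in captions.items():
    append_memory(memory_path, name, caption)

hits = search_memory(memory_path, "beach")
print(hits)
```

Because the memory is plain text, a lookup is an ordinary string search over a local file, and the filtered passport photo never reaches it.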

Learning by Cloning User Behavior
X-OmniClaw learns from how you use apps. Instead of replaying tap paths, it clones an app page's structure and learns to replicate your actions autonomously.

Show the agent the path to a deeply nested discount page once. Next time, it can navigate there on its own. This approach means the agent adapts to individual usage patterns rather than relying on generic app navigation.
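
One possible reading of "show it once, then navigate alone" is to record which labeled element was chosen on each page, rather than where the finger landed, and to look that label up again on replay. The sketch below illustrates that idea only. The simulated `APP` graph, `RecordedStep`, and both functions are hypothetical, and the report's actual cloning mechanism may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedStep:
    page: str          # page the user was on
    target_label: str  # element the user chose, by label rather than coordinates


# Hypothetical simulated app: page -> {element label: page it leads to}.
APP = {
    "home": {"Me": "profile", "Search": "search"},
    "profile": {"Wallet": "wallet", "Settings": "settings"},
    "wallet": {"Coupons": "discounts"},
}


def record(demo_labels, app, start="home"):
    """Turn one user demonstration into a list of page/label steps."""
    steps, page = [], start
    for label in demo_labels:
        steps.append(RecordedStep(page, label))
        page = app[page][label]
    return steps


def replay(steps, app, start="home"):
    """Follow the recorded steps by looking up each label on the current page."""
    page = start
    for step in steps:
        if page != step.page:
            raise RuntimeError(f"expected page {step.page!r}, got {page!r}")
        elements = app.get(page, {})
        if step.target_label not in elements:
            raise LookupError(f"{step.target_label!r} not found on {page!r}")
        page = elements[step.target_label]
    return page


steps = record(["Me", "Wallet", "Coupons"], APP)
final_page = replay(steps, APP)
print(final_page)
```

Matching on labels means the replay does not depend on where an element sits on screen, though it still fails if an element is renamed or removed.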

Demo Capabilities
In demos, Oppo showed X-OmniClaw handling several tasks:
- Comparing prices of products captured on camera across e-commerce apps
- Acting as a floating assistant ("ScreenAvatar") to work through practice problems in sequence
- Creating photo albums from a user's gallery based on voice requests

Why Open Source Matters Here
The open-source release means developers can inspect, modify, and build on X-OmniClaw's architecture. For privacy-conscious users, the on-device approach addresses concerns about sending personal data, photos, and screen content to cloud servers for processing.
The tradeoff is clear: cloud-based agents can tap into more powerful models, while on-device agents keep data local but face compute constraints. X-OmniClaw's hybrid approach, using cloud models only for complex reasoning, attempts to balance both.
Frequently Asked Questions
Does X-OmniClaw send my data to the cloud?
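Not for core processing. Perception, control, and app interaction happen on-device, and the agent doesn't run inside a cloud-hosted phone that would need your sensors or private data routed to a data center. A cloud language model is still called for complex reasoning tasks, so those requests do leave the device.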
What phones can run X-OmniClaw?
The technical report doesn't specify hardware requirements. Since it's open-source for Android, compatibility will likely depend on the on-device models and processing power needed.
How is X-OmniClaw different from Google Assistant or Siri?
X-OmniClaw is designed as an autonomous agent that can navigate apps, learn from your behavior, and complete multi-step tasks. Traditional assistants handle voice commands but don't typically learn workflows or operate across apps autonomously.
Is X-OmniClaw available to download now?
Oppo has open-sourced the project, but the technical report doesn't detail consumer availability. Developers can access the code, though end-user apps may come later.
Source: The Decoder / Jonathan Kemper
Huma Shazia
Senior AI & Tech Writer