Gemini 3.5 Flash can now see and control your screen

Huma ShaziaJune 25, 2026 at 3:01 PM4 min read

Key Takeaways

Google has built Computer Use directly into Gemini 3.5 Flash, eliminating the need for a separate model
The model scores 78.4 on OSWorld, beating GPT-5.4 mini (72.1) but trailing Anthropic's Opus 4.8 (83.4)
Enterprise safeguards include user confirmation for sensitive actions and automatic task termination on prompt injection detection

Google has integrated Computer Use directly into Gemini 3.5 Flash, allowing the model to see, understand, and operate screens across browsers, desktops, and mobile devices. The capability, previously available only as a separate Gemini 2.5 model, now ships as a native feature.

This is Google catching up to Anthropic, which launched Computer Use in Claude back in October 2024. The difference: Google is embedding the capability into its flagship lightweight model rather than keeping it siloed. For developers building agents that need to automate software testing, office workflows, or cross-platform tasks, that integration matters.

How does Gemini 3.5 Flash perform on benchmarks?

On OSWorld, the standard benchmark for evaluating AI computer control, Gemini 3.5 Flash scores 78.4. That is a significant jump from Gemini 3 Flash, which scored 65.1. It also beats OpenAI's GPT-5.4 mini at 72.1.

The competition remains tight at the top. OpenAI's GPT-5.5 edges ahead with 78.7, and Anthropic's Opus 4.8 leads the field at 83.4. Claude's Sonnet 4.6 matches Gemini 3.5 Flash exactly at 78.4. Google's own Gemini 3.1 Pro sits at 76.2, below the new Flash model.

Model	OSWorld Score
Anthropic Opus 4.8	83.4
OpenAI GPT-5.5	78.7
Gemini 3.5 Flash	78.4
Claude Sonnet 4.6	78.4
Gemini 3.1 Pro	76.2
OpenAI GPT-5.4 mini	72.1
Gemini 3 Flash	65.1

What can developers build with this?

Computer Use combined with Gemini's existing tools, including function calls, Search, and Maps, opens a clear path to autonomous agents. Google positions the feature for software testing and office automation, but the applications extend further. Any repetitive screen-based workflow becomes a candidate for automation.

The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. Google has also published a Browserbase demo and a GitHub reference implementation for developers who want working code rather than documentation.

How is Google addressing security risks?

Letting an AI model control your screen introduces obvious attack surfaces. Prompt injection, where malicious instructions hidden in web pages or documents hijack the model's actions, is the primary concern.

Google's defense has three layers. First, adversarial training bakes resistance into the model itself. Second, an optional enterprise safeguard requires user confirmation before the model executes sensitive or irreversible actions. Third, another optional safeguard automatically stops tasks when it detects indirect prompt injections.

Google also recommends sandboxing, human oversight, and strict access controls in its best practices documentation. The recommendations acknowledge something Google does not say outright: no security measure is foolproof when you give an AI model the keys to your computer.

Why does native integration matter?

Bundling Computer Use into Gemini 3.5 Flash rather than offering it as a separate model simplifies deployment. Developers do not need to manage model switching or route requests to different endpoints based on whether a task requires screen control.

It also signals where Google sees the market heading. Standalone chat models are table stakes. The value shifts to agents that can act, not just respond. Integrating agentic capabilities into the base model reflects that bet.

ℹ️

Logicity's Take

Google's integration of Computer Use into Gemini 3.5 Flash is less about leading on benchmarks, where Anthropic still holds a clear edge, and more about distribution. By shipping the capability in its most widely deployed model, Google makes screen control accessible to every developer already on the Gemini API. The real test is whether enterprise customers trust any AI model with production system access, regardless of benchmark scores.

Frequently Asked Questions

What is Gemini 3.5 Flash Computer Use?

Computer Use is a feature that lets Gemini 3.5 Flash see, understand, and interact with computer screens, browsers, and mobile devices. It enables autonomous task execution across different environments.

How does Gemini 3.5 Flash compare to Claude on OSWorld?

Gemini 3.5 Flash scores 78.4 on OSWorld, matching Claude Sonnet 4.6 but trailing Anthropic's Opus 4.8, which leads at 83.4.

Is Gemini Computer Use safe from prompt injection attacks?

Google uses adversarial training and offers two optional enterprise safeguards: user confirmation for sensitive actions and automatic task termination when prompt injections are detected. Google still recommends sandboxing and human oversight.

How can developers access Gemini 3.5 Flash Computer Use?

The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. Google has also published a Browserbase demo and a GitHub reference implementation.

What tasks can Gemini Computer Use automate?

Google highlights software testing and office automation. The feature can handle any repetitive screen-based workflow across browser, mobile, and desktop environments.

ℹ️

Need Help Implementing This?

Building AI agents that can safely control screens requires careful architecture. If you are evaluating Gemini Computer Use for enterprise automation, Logicity can connect you with implementation partners who specialize in agentic AI deployments.

Source: The Decoder / Matthias Bastian

Also Read

Alan raises €480M at €5.5B valuation as Prosus bets on AI health

Startups & Innovation·4 min

Gemini 3.5 Flash can now see and control your screen

Key Takeaways

How does Gemini 3.5 Flash perform on benchmarks?

What can developers build with this?

How is Google addressing security risks?

Why does native integration matter?

Logicity's Take

Frequently Asked Questions

Need Help Implementing This?

Related Articles

ChatGPT in Corporate Communications: A $0 AI Detector Test

Bezos AI Lab Gets $10B: What Project Prometheus Means

Kimi K2.6 Open-Weight AI: 300 Agents at a Fraction of the Cost

AI Vendor Lock-In Risk: Anthropic Suspensions Hit Fintech

Also Read

Alan raises €480M at €5.5B valuation as Prosus bets on AI health

Qualcomm claims 6x HBM efficiency with new HBC architecture

Kunal Shah to lead WhatsApp as Meta invests $900M in CRED