ZDNET’s Key Takeaways
Google has introduced an AI model that can interact directly with website user interfaces. The release puts Google alongside OpenAI and Anthropic, both of which have been exploring ways to let AI navigate and act within online environments. Google acknowledged that the model still has limitations, including a tendency to hallucinate or misinterpret data.
Google DeepMind has launched a public preview of the model, which runs within a web browser and navigates pages much as a person would. Built on Gemini 2.5 Pro, the model, called Gemini 2.5 Computer Use, can perform basic online actions such as clicking, typing, scrolling, and interacting with embedded page elements.
Users direct the model with natural language prompts. For instance, one could type, “Open Wikipedia, search for ‘Atlantis,’ and summarize the history of the myth in Western thought.” The model then locates the appropriate URL, captures screen data from the requested site, analyzes the visual layout of the user interface, and performs the task step by step. Throughout the process, it shows its reasoning and actions in a text box so that users can follow its decisions in real time. If a request involves a sensitive operation, such as initiating a transaction or entering personal data, the system asks the user for confirmation before proceeding.
The debut of Gemini 2.5 Computer Use follows recent developments by competitors in the AI browsing space. Both OpenAI and Anthropic have released similar models capable of limited forms of web navigation. Earlier, Google had introduced an experimental tool, Project Mariner, a Chrome extension that allowed small-scale user interaction on websites through automated AI-driven actions.
How It Works
The Gemini 2.5 Computer Use model works in an iterative loop, keeping a structured record of its recent actions within a given web interface. The model refers to that record to interpret the current state of the page and decide what to do next. As the record grows, the model has more context to draw on for later steps. Google shared demo videos, sped up to three times normal speed, that show the system updating entries in a customer relationship management platform and reorganizing items on Google’s now-discontinued Jamboard collaborative workspace.
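To make the loop concrete, here is a minimal Python sketch of an observe, decide, act cycle that keeps a short history of recent steps. It is an illustration only, not Google's implementation or the Gemini API: `FakeBrowser`, `scripted_model`, `run_agent_loop`, and the `history_window` parameter are all hypothetical names, and a plain string stands in for a real screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str           # hypothetical action names: "type", "click", "done"
    argument: str = ""


@dataclass
class Step:
    screenshot: str     # a string stands in for real image data
    action: Action


class FakeBrowser:
    """Toy environment (assumption): the 'screen' is a text description."""

    def __init__(self) -> None:
        self.screen = "blank page"

    def screenshot(self) -> str:
        return self.screen

    def execute(self, action: Action) -> None:
        self.screen = f"after {action.name}({action.argument})"


def scripted_model(goal: str, screenshot: str, recent: List[Step]) -> Action:
    """Stand-in for the model: picks the next action from the current screen."""
    if screenshot == "blank page":
        return Action("type", "Atlantis")
    if screenshot.startswith("after type"):
        return Action("click", "search")
    return Action("done")


def run_agent_loop(
    goal: str,
    browser: FakeBrowser,
    propose_action: Callable[[str, str, List[Step]], Action],
    max_steps: int = 10,
    history_window: int = 3,
) -> List[Step]:
    history: List[Step] = []
    for _ in range(max_steps):
        shot = browser.screenshot()                      # observe
        recent = history[-history_window:]               # recent steps only
        action = propose_action(goal, shot, recent)      # decide
        history.append(Step(shot, action))
        if action.name == "done":
            break
        browser.execute(action)                          # act
    return history


history = run_agent_loop("search for Atlantis", FakeBrowser(), scripted_model)
print([step.action.name for step in history])
```

In this sketch the caller, not the model, executes each action and feeds the new screen state back in, which keeps every step visible to the surrounding program.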
In an official blog post released on Tuesday, Google explained that the new model achieved higher levels of accuracy and lower latency than similar offerings from Anthropic and OpenAI. The company also cited benchmarks showcasing the model’s performance advantages across multiple standardized evaluation frameworks designed for web and mobile control, including the Online-Mind2Web testing environment. This particular benchmark measures how reliably AI agents can perform autonomous browsing and interaction tasks within diverse web conditions.
How to Try It
Although the model is intended primarily for desktop web browsers, Google reported that it also shows strong potential for mobile use. It is available through the Gemini API in Google AI Studio and through the Vertex AI platform. A limited demo version is also hosted on Browserbase, where developers and enthusiasts can try it firsthand.
Safety Considerations
Recognizing the risks inherent in AI-driven automation, Google equipped this model with a suite of safety mechanisms designed to limit misuse and prevent unwanted actions. Developers can configure custom safeguards that restrict the model from performing activities such as bypassing CAPTCHA systems, accessing sensitive information, or interfering with protected devices like medical systems. The safety features may also be set to require explicit user confirmation before any potentially consequential operation is carried out. These protections reflect Google’s ongoing commitment to responsible and ethical AI deployment.
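As a rough illustration of how such safeguards could sit in front of an agent's actions, the Python sketch below blocks some actions outright and asks for confirmation on others. The action names, the `BLOCKED_ACTIONS` and `SENSITIVE_ACTIONS` sets, and the `gate_action` function are assumptions made for this example; they are not part of Google's API.

```python
from typing import Callable

# Hypothetical action names a developer has chosen to block outright.
BLOCKED_ACTIONS = {"solve_captcha"}

# Hypothetical action names that need explicit user confirmation.
SENSITIVE_ACTIONS = {"purchase", "submit_personal_data"}


def gate_action(action_name: str, confirm: Callable[[str], bool]) -> str:
    """Return "blocked", "declined", or "allowed" for a proposed action."""
    if action_name in BLOCKED_ACTIONS:
        return "blocked"                 # never executed, even if confirmed
    if action_name in SENSITIVE_ACTIONS and not confirm(action_name):
        return "declined"                # the user said no
    return "allowed"


# Example: a confirmation callback that always refuses.
print(gate_action("click", lambda name: False))
print(gate_action("purchase", lambda name: False))
```

Passing the confirmation step in as a callback keeps the policy separate from the user interface, so the same gate could be wired to a dialog box, a command-line prompt, or an automated test.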
Finally, in the detailed system documentation accompanying the release, Google emphasized transparency about the model’s inherent limitations. As a derivative of Gemini 2.5 Pro, Computer Use is still susceptible to common shortcomings of large foundation models, including hallucinations, incomplete causal reasoning, weaknesses in complex logical deduction, and difficulty handling counterfactual scenarios. These issues are not unique to Google’s system but represent broad challenges across the AI field. In fact, earlier this week Anthropic released research indicating that advanced AI systems often display incorrect ethical reasoning: when confronted with benign data in test conditions, some models flagged it as unethical or illegal behavior. Such findings highlight the intricate task of aligning machine intelligence with human interpretive standards, a dilemma that companies like Google continue striving to address while broadening the practical capabilities of AI in real-world digital contexts.
Source: https://www.zdnet.com/article/this-new-google-gemini-model-scrolls-the-internet-just-like-you-do-how-it-works/