Google has begun offering an early look at its latest addition to the Gemini ecosystem, a model that brings a new capability to machine interaction with the web. Formally known as Gemini 2.5 Computer Use, it is designed to operate within a standard internet browser, giving AI agents the ability to navigate, interpret, and act within interfaces built for human users rather than automated systems. In doing so, Google moves a step closer to AI agents that can perform complex digital tasks with humanlike intuition and dexterity.
At its core, Gemini 2.5 Computer Use combines visual comprehension with advanced reasoning to understand both the structure and the intent of the content it encounters on screen. When a user issues a command, such as requesting that a form be completed and submitted, the model does not rely on external APIs or code-level integrations. Instead, it visually perceives the elements on the page, interprets what is required, and carries out the sequence of interactions a person would perform: moving the cursor, entering text, clicking buttons, or confirming actions. This method lets it function in environments that lack formal programmatic connections, making it valuable for scenarios like testing user interfaces or navigating websites that are otherwise closed to automated access.
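The loop described above (screenshot in, UI action out, repeated until the task is done) can be sketched in a few lines. This is a minimal illustration only: `FakeBrowser` and `plan_next_action` are stand-ins for a real browser controller and for a call to the model, and every name here is an assumption rather than Google's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "done"
    x: int = 0
    y: int = 0
    text: str = ""

@dataclass
class FakeBrowser:
    """Records the interactions an agent performs instead of driving a real page."""
    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b"<pixels>"         # a real agent would capture the rendered page here

    def execute(self, action: Action) -> None:
        self.log.append((action.kind, action.x, action.y, action.text))

def plan_next_action(task: str, screenshot: bytes, step: int) -> Action:
    """Stand-in for the model: maps (task, current screenshot) to the next UI action.
    Here it just replays a fixed fill-and-submit script for illustration."""
    script = [
        Action("click", x=120, y=300),      # focus the form field
        Action("type", text="Jane Doe"),    # enter text, as a person would
        Action("click", x=120, y=360),      # press the submit button
        Action("done"),                     # signal task completion
    ]
    return script[step]

def run_agent(task: str, browser: FakeBrowser, max_steps: int = 10) -> None:
    """Observe-decide-act loop: screenshot the page, ask for an action, execute it."""
    for step in range(max_steps):
        action = plan_next_action(task, browser.screenshot(), step)
        if action.kind == "done":
            break
        browser.execute(action)

browser = FakeBrowser()
run_agent("fill in and submit the form", browser)
```

After the run, `browser.log` holds the click/type/click sequence, mirroring how the model acts through the same interface a person would use rather than through a site-specific API.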
Earlier iterations of similar technology have already been quietly integrated into other Google research efforts, such as AI Mode and Project Mariner, an experimental platform that explores autonomous AI-driven task execution inside browsers. In trials, for example, these systems demonstrated their potential by performing tasks like automatically adding grocery items to an online shopping cart based on a given list of ingredients—an early glimpse of how true digital agency might soon evolve.
The timing of Google’s announcement is particularly significant, as it follows closely on the heels of OpenAI’s recent unveiling of new ChatGPT applications during its annual Dev Day. That event renewed industry attention toward OpenAI’s own ChatGPT Agent functionality, which likewise seeks to empower AI models to carry out real-world tasks autonomously. However, while OpenAI and Google appear to be converging on similar goals, each is adopting a distinct design philosophy. Anthropic, another key player in this space, introduced a comparable capability within its Claude AI system last year, making the competitive landscape around “computer use” AI increasingly dynamic.
To demonstrate Gemini 2.5’s potential, Google released several short demonstration videos showcasing the model performing various browser-based actions. The footage, which the company notes has been accelerated to three times its actual speed, illustrates the model’s ability to manage interactive web elements efficiently and accurately. According to Google, performance evaluations indicate that Gemini 2.5 Computer Use surpasses several leading competitors on a range of standardized web and mobile benchmarks.
There are, however, important boundaries defining its current scope. Unlike rival tools that simulate full computer environments or integrate at the operating-system level, Google’s approach is deliberately restricted to the confines of a web browser. The company emphasizes that the model is not yet optimized for operating-system-wide control, meaning it cannot, for example, directly manipulate files or system settings. Instead, the present implementation supports a curated set of thirteen core actions—among them, opening and closing browser tabs, typing text into fields, and performing drag-and-drop operations. This functional restraint reflects Google’s cautious and methodical approach to expanding AI autonomy while maintaining security and oversight.
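A whitelisted action vocabulary of this kind is straightforward to enforce in code. The sketch below shows one way such a constrained set might be modeled; only tab handling, typing, and drag-and-drop come from the article, and the remaining action names (and the set's exact contents) are illustrative guesses, not Google's actual thirteen actions.

```python
from enum import Enum

class BrowserAction(Enum):
    """A constrained, browser-only action vocabulary. OPEN_TAB, CLOSE_TAB,
    TYPE_TEXT, and DRAG_AND_DROP are named in the article; the rest are
    plausible placeholders."""
    OPEN_TAB = "open_tab"
    CLOSE_TAB = "close_tab"
    TYPE_TEXT = "type_text"
    DRAG_AND_DROP = "drag_and_drop"
    CLICK = "click"
    SCROLL = "scroll"
    NAVIGATE = "navigate"

def validate(action_name: str) -> BrowserAction:
    """Reject anything outside the allowed set: the restraint the article
    describes is enforced by whitelisting, not by trusting the model.
    Note there is no file or OS-level action to request in the first place."""
    try:
        return BrowserAction(action_name)
    except ValueError:
        raise PermissionError(f"action {action_name!r} is not in the allowed set")
```

Because operating-system actions simply do not exist in the vocabulary, a request like `validate("write_file")` fails before anything is executed, which matches the browser-only boundary Google describes.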
Developers eager to experiment with Gemini 2.5 Computer Use can access the model through Google AI Studio as well as Vertex AI, the company’s enterprise-grade machine-learning platform. For those who simply wish to observe it in operation, a hands-on demonstration is available on Browserbase, an interactive playground where the model executes commands like “Play a game of 2048” or “Browse Hacker News for trending debates.” By offering both professional development access and a more public-facing preview, Google positions Gemini 2.5 as a bridge between cutting-edge research and real-world usability, marking a pivotal step toward AI systems capable of engaging meaningfully with the human digital landscape.
Source: https://www.theverge.com/news/795463/google-computer-use-gemini-ai-model-agents