According to media reports, Google plans to preview "Project Jarvis," its large action model, in December. People familiar with the matter say the project is designed to help users with tasks such as "collecting research, purchasing products, or booking flights."
Jarvis will reportedly be powered by a future version of Google's Gemini model and is designed specifically to run in the Chrome browser. The tool aims to help users "automate routine web tasks" by taking screenshots, interpreting what is on the page, and then clicking buttons or entering text on the user's behalf. In its current form, it reportedly pauses for a few seconds between actions.
Google is not alone: nearly every major AI company is developing models with comparable capabilities. Microsoft's Copilot Vision can discuss the web pages users are viewing; Apple Intelligence is expected to gain on-screen awareness across multiple apps next year; Anthropic recently released a beta version of Claude that can operate a computer; and OpenAI is reportedly working on similar features.
The reports add that Google's plans to showcase Jarvis could still change, and that the tool is expected to go first to a small group of testers to help the company find and fix bugs.