Microsoft Unveils Fara-7B: Compact AI Controls Computers Like a Human, On-Device
Microsoft's Fara-7B visually interprets screens to autonomously complete complex computer tasks, bringing privacy benefits to on-device AI.
November 27, 2025

Microsoft has unveiled a significant advance in agentic artificial intelligence with Fara-7B, a compact and efficient model designed to operate computers in a human-like manner. This 7-billion-parameter model represents a shift from conversational AI to "action models" that directly interact with graphical user interfaces to perform tasks, a category known as Computer Use Agents (CUAs). Fara-7B's ability to run on-device promises lower latency, enhanced privacy, and a more intuitive way for users to automate complex digital workflows, signaling a new direction for personal computing and AI-driven automation.[1][2][3][4]
At its core, Fara-7B functions by visually interpreting a computer screen through screenshots, much as a person would.[1][2][3] Unlike traditional agentic systems that often chain multiple models for sub-tasks such as optical character recognition (OCR), HTML parsing, and action planning, Fara-7B integrates these capabilities into a single, streamlined architecture.[1] Given a user's goal in text and an image of the current screen, the model generates a "thinking" block to reason through its next step, followed by a "tool call" that translates into a concrete action, such as clicking specific coordinates, typing text, or scrolling.[1][2] This end-to-end approach allows it to autonomously navigate websites, fill out forms, and complete multi-step tasks without access to an application's underlying code or accessibility tree, making its interaction modality remarkably similar to that of a human user.[2][3] The model is built on Qwen-2.5-VL-7B, a powerful open-weight vision-language model developed by Alibaba Cloud that is noted for its strong performance on grounding tasks and its ability to handle long contexts.[1][2][3][5]
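To make that perceive-reason-act loop concrete, the sketch below shows one way such an agent could be wired together in Python. The query_model and execute helpers, the tool names, and the use of the pyautogui library are illustrative assumptions rather than Fara-7B's published interface; they simply mirror the screenshot-in, tool-call-out pattern described above.

```python
# Minimal sketch of a screenshot-in, tool-call-out agent loop.
# The model call and the tool-call schema are assumptions for illustration,
# not Fara-7B's actual API.
import pyautogui  # pip install pyautogui

def query_model(goal: str, screenshot) -> dict:
    """Hypothetical stand-in for the vision-language model.

    Expected to return something like:
      {"thinking": "...", "tool": "click", "args": {"x": 412, "y": 300}}
    """
    raise NotImplementedError("replace with a real call to the deployed model")

def execute(tool: str, args: dict) -> None:
    # Translate the model's tool call into a concrete GUI action.
    if tool == "click":
        pyautogui.click(args["x"], args["y"])
    elif tool == "type":
        pyautogui.write(args["text"], interval=0.02)
    elif tool == "scroll":
        pyautogui.scroll(args["amount"])
    elif tool == "done":
        print("Task reported complete:", args.get("summary", ""))
    else:
        raise ValueError(f"unknown tool: {tool}")

def run_agent(goal: str, max_steps: int = 25) -> None:
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()   # perceive: pixels only, no DOM
        step = query_model(goal, screenshot)  # reason: "thinking" block + tool call
        print("thinking:", step["thinking"])
        execute(step["tool"], step["args"])   # act: click / type / scroll
        if step["tool"] == "done":
            break

if __name__ == "__main__":
    run_agent("Find the cheapest direct flight from Seattle to Boston next Friday")
```

The key design point the loop illustrates is that the agent's only input is pixels and its only output is a small vocabulary of GUI actions, which is what lets a single model replace a pipeline of OCR, parsing, and planning components.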
A key innovation behind Fara-7B lies in its training methodology, which addresses the critical bottleneck of data scarcity in the CUA domain.[6][4] The model was trained on 145,000 synthetic trajectories, comprising over one million steps, generated by a novel pipeline called FaraGen.[3][6] This system, an evolution of Microsoft's prior work on AgentInstruct and the Magentic-One framework, automates the creation of high-quality training data.[2][3] FaraGen works by having a multi-agent system propose tasks on real web pages, attempt to solve them, and then apply multiple verifiers to filter for successful trajectories.[3][6] Generating verified, multi-step web task data at a cost of roughly $1 per trajectory allowed Microsoft to distill the problem-solving abilities of a larger multi-agent system into the single, efficient Fara-7B model.[3][6]
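The propose-solve-verify pattern attributed to FaraGen can be summarized in a short sketch. All of the function names below (propose_tasks, attempt_task, verify) are hypothetical placeholders for the LLM-driven components described in Microsoft's report; the only structural claim carried over from the article is that trajectories are kept for training only if verifiers judge them successful.

```python
# Illustrative sketch of a propose / solve / verify data pipeline.
# Function bodies are placeholders, not Microsoft's actual FaraGen code.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)  # (screenshot, thought, action) tuples
    success: bool = False

def propose_tasks(seed_urls: list[str]) -> list[str]:
    """Stand-in for an LLM that proposes realistic tasks for live web pages."""
    raise NotImplementedError

def attempt_task(task: str) -> Trajectory:
    """Stand-in for a multi-agent solver that tries to complete the task in a browser."""
    raise NotImplementedError

def verify(traj: Trajectory) -> bool:
    """Stand-in for independent verifiers that check the steps and the end state."""
    raise NotImplementedError

def generate_dataset(seed_urls: list[str]) -> list[Trajectory]:
    kept = []
    for task in propose_tasks(seed_urls):
        traj = attempt_task(task)
        # Only verified, successful trajectories become training data.
        if verify(traj):
            traj.success = True
            kept.append(traj)
    return kept
```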
Despite its relatively small size, Fara-7B delivers performance competitive with, and in some cases superior to, much larger and more resource-intensive models.[2][3] On the WebVoyager benchmark, Fara-7B achieved a task success rate of 73.5%, outperforming larger systems such as the GPT-4o-based Set-of-Marks agent.[3] The model is also remarkably efficient, completing tasks in an average of about 16 steps, significantly fewer than comparable models, which can take upwards of 41 steps.[2][3] This efficiency makes it not only faster but also more cost-effective at inference time.[2] To further validate its real-world capabilities, Microsoft also introduced a new benchmark, WebTailBench, covering underrepresented tasks such as finding job postings and comparing prices, where Fara-7B has shown strong performance.[3][6][7] The model has been released under a permissive MIT license and is available on platforms like Hugging Face and Azure AI Foundry, encouraging community exploration and development.[1][2][6]
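For readers who want to experiment locally, a minimal loading sketch follows. The repository id "microsoft/Fara-7B" and the use of Hugging Face's generic image-text-to-text classes are assumptions based on the model being a Qwen-2.5-VL derivative; the official model card is the authoritative reference for exact usage.

```python
# Minimal sketch of pulling the open weights for local experimentation.
# The repo id and model classes below are assumptions; consult the model card.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "microsoft/Fara-7B"  # assumed Hugging Face repository id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",    # place weights on GPU if available
    torch_dtype="auto",   # use the checkpoint's native precision
)
```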
The release of Fara-7B marks a pivotal moment for the AI industry, signaling a move towards more capable and accessible agentic AI. By creating a small, powerful model that can run locally on devices like Copilot+ PCs, Microsoft is paving the way for AI that acts as a true assistant, capable of executing complex tasks directly on a user's behalf.[1][2][3] This on-device approach addresses significant privacy concerns by keeping user data local.[2][4] The model is also designed with safety in mind, trained to refuse or halt tasks that involve illegal activities, financial transactions, or the generation of harmful content.[1] As these "action models" mature, they hold the potential to automate a wide range of everyday digital chores, from booking travel and managing online accounts to conducting research and filling out endless forms, fundamentally changing how we interact with our computers.[1][2]