Microsoft Research announced the release of Fara-7B, its first agentic Small Language Model (SLM) specifically designed for computer use. While traditional chatbots generate text responses, Computer Use Agent (CUA) models like Fara-7B take action. They interact with computer interfaces, such as web pages, using simulated mouse and keyboard inputs to complete tasks on behalf of a user.
Fara-7B stands out because of its ultra-compact size, featuring only 7 billion parameters. This scale allows the CUA model to run directly on devices like Copilot+ PCs powered by Windows 11. Running models locally significantly reduces latency and improves privacy, as user data remains on the personal device. Microsoft released Fara-7B as an open-weight, experimental model under an MIT license to invite community feedback and experimentation.
Interacting with the digital world visually
Fara-7B operates using visual perception, simulating how a human interacts with a screen. The model visually processes a webpage screenshot and directly predicts coordinates for actions like scrolling, typing, or clicking. Crucially, it does not rely on separate models to parse the screen or access auxiliary information like accessibility trees. It uses the same visual modalities as a human to interact with the computer.
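The screenshot-in, action-out cycle described above can be sketched as a simple loop. This is a minimal illustration, not Microsoft's implementation: `run_episode`, the `observe`/`predict`/`act` callables, the dictionary action format, and the `"done"` stop signal are all hypothetical names chosen for the example.

```python
from typing import Callable

Action = dict  # hypothetical shape, e.g. {"name": "click", "x": 10, "y": 20}


def run_episode(
    task: str,
    observe: Callable[[], bytes],                  # returns a screenshot as raw bytes
    predict: Callable[[str, bytes, list], Action],  # stands in for the model
    act: Callable[[Action], None],                  # performs the mouse/keyboard input
    max_steps: int = 20,
) -> list:
    """Repeat: look at the screen, choose one action, perform it."""
    history: list = []
    for _ in range(max_steps):
        screenshot = observe()
        # The model sees only the task, the pixels, and its own past actions;
        # no accessibility tree or separate screen parser is consulted.
        action = predict(task, screenshot, history)
        history.append(action)
        if action["name"] == "done":  # assumed stop signal
            break
        act(action)
    return history
```

In practice `predict` would wrap a call to the model and `act` would drive a browser; here they are left as parameters so the loop itself stays independent of any particular tooling.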
To train Fara-7B, Microsoft developed a novel synthetic data generation pipeline. This system, built on the Magentic-One framework, bypasses expensive manual annotation by using a multi-agent system to propose, solve, and verify tasks on real web pages. The resulting training dataset contained 145,000 trajectories and 1 million steps, covering various task types, websites, and difficulty levels. The behavior of this multi-agent system was then distilled into the single, smaller Fara-7B model through supervised fine-tuning.
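The propose, solve, and verify stages can be pictured as a filter that keeps only trajectories a verifier accepts. The sketch below is an assumption about the general shape of such a pipeline, not the Magentic-One API: `Trajectory`, `build_dataset`, and the three callables are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)


def build_dataset(
    urls: Iterable[str],
    propose: Callable[[str], str],         # agent that invents a task for a page
    solve: Callable[[str], Trajectory],    # agent(s) that attempt the task
    verify: Callable[[Trajectory], bool],  # agent that judges the attempt
) -> list:
    """Keep only trajectories that pass verification."""
    kept = []
    for url in urls:
        task = propose(url)
        trajectory = solve(task)
        if verify(trajectory):
            kept.append(trajectory)
    return kept
```

The verified trajectories would then serve as supervised fine-tuning examples for the smaller model. For scale, 1 million steps across 145,000 trajectories works out to roughly 7 steps per trajectory on average.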
The model uses the Qwen2.5-VL-7B model as its base due to Qwen's capability in grounding tasks and long context support. During execution, Fara-7B outputs a reasoning message, followed by a tool call for standard actions like click(x,y) or type(), or macro-actions like web_search() and visit_url().
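To make the "reasoning message, then tool call" pattern concrete, here is a small parser for one model turn. The article does not specify the wire format, so the `TOOL_CALL:` marker, the JSON layout, and the `Step` and `parse_step` names are assumptions made purely for illustration; only the four action names come from the text above.

```python
import json
from dataclasses import dataclass
from typing import Any

# Action names mentioned in the article; the set itself is illustrative.
ALLOWED_ACTIONS = {"click", "type", "web_search", "visit_url"}


@dataclass
class Step:
    reasoning: str
    action: str
    args: dict[str, Any]


def parse_step(raw: str) -> Step:
    """Split one turn into free-text reasoning and a JSON tool call.

    Assumed (hypothetical) layout: reasoning text, then 'TOOL_CALL:'
    followed by a JSON object with 'name' and 'arguments' keys.
    """
    reasoning, marker, call = raw.partition("TOOL_CALL:")
    if not marker:
        raise ValueError("no tool call found")
    payload = json.loads(call)
    name = payload["name"]
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {name}")
    return Step(reasoning.strip(), name, payload.get("arguments", {}))
```

For example, a turn reading `The search box is near the top.` followed by `TOOL_CALL: {"name": "click", "arguments": {"x": 412, "y": 88}}` would parse into a `click` step with those coordinates, while an unrecognized action name would be rejected before anything is executed.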
Performance and accessibility
Fara-7B demonstrates strong performance against larger, more resource-intensive systems. On the WebVoyager benchmark, Fara-7B achieved a task success rate of 73.5%, outperforming larger models like the GPT-4o-based Set-Of-Marks (SoM) Agent and the OpenAI computer-use-preview model. Fara-7B also showed strong results on Microsoft's new WebTailBench, a set of evaluations covering real-world, underrepresented tasks like finding job postings and comparing prices.
Despite its compact size, Fara-7B completes tasks with significantly fewer steps than comparable models. For instance, on the WebVoyager benchmark, Fara-7B averaged 16 steps per task compared to approximately 41 steps for the UI-TARS-1.5-7B model, making it computationally more efficient. The model is now available on Microsoft Foundry and Hugging Face for public use. Microsoft is also sharing a quantized version optimized to run directly on Copilot+ PCs with Windows 11, utilizing NPU hardware acceleration for turnkey experimentation.
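The step counts quoted above translate into a relative saving with a line of arithmetic. The snippet below uses only the two figures from the article (the 41 is itself approximate) and assumes, as a simplification, that each step costs about the same.

```python
fara_steps = 16      # average steps per task reported for Fara-7B
ui_tars_steps = 41   # approximate average reported for UI-TARS-1.5-7B

reduction = 1 - fara_steps / ui_tars_steps  # fraction of steps saved
ratio = ui_tars_steps / fara_steps          # how many times more steps

print(f"{reduction:.0%} fewer steps, about {ratio:.1f}x as many for UI-TARS")
```

That is roughly 61% fewer steps per task, though actual compute savings would also depend on how many tokens each step consumes.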
Designing for safety and control
Agents capable of operating computers pose safety challenges distinct from chat-only models, including the risk of misuse, misbehavior, and unintended real-world consequences. Microsoft built transparency and user control into Fara-7B's design:
Data Collection: Fara-7B only processes browser screenshots, instructions, and action history necessary for the task, relying solely on what is visually on the screen.
User Oversight: All actions are logged and auditable. Fara-7B is intended to run in a sandboxed environment, allowing users full oversight and the ability to intervene or halt actions instantly.
Misuse Mitigation: The model was trained on safety data and underwent Microsoft’s rigorous red teaming process. It achieved a high refusal rate of 82% on the WebTailBench-Refusals dataset, which consists of tasks involving harmful content or risky actions.
Critical Points: Fara-7B's training specifically mandates that it must recognize and stop at "Critical Points"—situations requiring personal data, user consent, or an irreversible transaction (like sending an email or completing a purchase). At these points, the agent must inform the user it cannot proceed without explicit consent.
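The logging and Critical Point behaviors above describe what the model is trained to do. A harness running the agent could add a similar check of its own; the sketch below shows that external, rule-based variant, which is a different mechanism from the model's trained behavior. `GuardedExecutor` and the action names in `IRREVERSIBLE` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical names for actions a harness might treat as irreversible.
IRREVERSIBLE = {"send_email", "submit_payment"}


@dataclass
class GuardedExecutor:
    log: list = field(default_factory=list)  # audit trail of every request

    def run(self, action: str, consent: bool = False) -> str:
        """Record the action; halt irreversible ones without explicit consent."""
        if action in IRREVERSIBLE and not consent:
            self.log.append((action, "halted"))
            return "halted"
        self.log.append((action, "executed"))
        return "executed"
```

With this wrapper, a `click` runs immediately, `submit_payment` is halted until it is called again with `consent=True`, and every attempt, halted or not, lands in `log` for later review.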
By making Fara-7B open-weight, Microsoft aims to accelerate community experimentation with CUA technology for automating routine web tasks such as shopping, booking reservations, and managing accounts.