Microsoft AI Introduces Magentic-UI: An Open-Source Agent Prototype That Works with People to Complete Complex Tasks Requiring Multi-Step Planning and Browser Use

Modern web usage spans many digital interactions, from filling out forms and managing accounts to running data queries and navigating complex dashboards. Although the web is deeply intertwined with productivity and workflows, many of these actions still demand repetitive human input. This is especially true in environments that require detailed instructions or decisions that go beyond simple searches. AI agents have emerged to support task automation, but many prioritize full autonomy. That often sidelines user control and leads to outcomes that differ from what the user expects. The next leap in productivity-enhancing AI involves agents designed not to replace users but to work with them, blending automation with continuous, real-time human input for more accurate and trustworthy results.

The main challenge in deploying AI agents for web-based tasks is the lack of visibility and opportunities to intervene. Users often cannot see which steps the agent is planning, how it intends to execute them, or when it is going off track. In scenarios involving complex decisions, such as entering payment information, interpreting dynamic content, or running scripts, users need mechanisms to step in and redirect the process. Without these capabilities, a system may make irreversible mistakes or act in ways that are inconsistent with the user’s goals. This highlights an important limitation of current AI automation: the absence of structured human-in-the-loop design, in which users dynamically direct and supervise agent behavior rather than merely watching it act.

Previous solutions approached web automation through rule-based scripts or general-purpose AI agents powered by language models. These systems interpret user commands and try to execute them autonomously. However, they often carry out plans without surfacing intermediate decisions or allowing meaningful user feedback. A few offer command-line-style interactions, which are inaccessible to ordinary users and rarely include layered safety mechanisms. Furthermore, minimal support for task reuse or learning across sessions limits their long-term value. These systems also tend to lack adaptability when the context changes or when errors must be corrected collaboratively.

Microsoft researchers introduced Magentic-UI, an open-source prototype that emphasizes collaborative human-AI interaction for web-based tasks. Unlike earlier systems designed for full autonomy, the tool supports real-time co-planning, shared execution, and step-by-step user oversight. Magentic-UI is built on Microsoft’s AutoGen framework and is tightly integrated with Azure AI Foundry Labs. It is a direct evolution of the previously introduced Magentic-One system. With this release, Microsoft Research aims to address fundamental questions about human oversight, safety mechanisms, and learning in agentic systems by giving researchers and developers an experimental platform.

Magentic-UI includes four core interactive features: co-planning, co-tasking, action guards, and plan learning. Co-planning lets users view and adjust the steps the agent proposes before execution begins, giving them full control over what the AI will do. Co-tasking provides real-time visibility during execution, allowing users to pause, edit, or take over specific actions. Action guards are customizable confirmations for high-risk activities, such as closing a browser tab or clicking the submit button on a form, which can have unintended consequences. Plan learning allows Magentic-UI to remember and refine steps for future tasks, so it improves with experience over time. These features are supported by a modular team of agents: the Orchestrator leads planning and decision-making, WebSurfer handles browser interactions, Coder executes code in a sandbox, and FileSurfer interprets files and data.
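The action guard idea can be illustrated with a minimal sketch. This is not Magentic-UI’s actual API: the `Action` class, the `HIGH_RISK_KINDS` set, and the `guarded_execute` function are hypothetical names invented here, and the assumption is simply that high-risk actions are confirmed by the user before they run.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds treated as high-risk in this sketch.
HIGH_RISK_KINDS = {"close_tab", "submit_form"}


@dataclass
class Action:
    kind: str         # e.g. "click", "submit_form", "close_tab"
    description: str  # human-readable summary shown to the user


def guarded_execute(
    action: Action,
    run: Callable[[Action], str],
    ask_user: Callable[[Action], bool],
) -> str:
    """Run the action, asking for confirmation first if it is high-risk."""
    if action.kind in HIGH_RISK_KINDS and not ask_user(action):
        # The user declined, so the action is never executed.
        return "blocked by user"
    return run(action)
```

In this sketch, low-risk actions run immediately, while a declined high-risk action is skipped entirely, so the user keeps the final say over steps that may be hard to undo.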

Technically, when a user submits a request, the Orchestrator agent generates a step-by-step plan. Users can modify it through a graphical interface by editing, deleting, or regenerating steps. Once the plan is finalized, it is delegated among the specialized agents. Each agent reports back after performing its task, and the Orchestrator decides whether to continue, retry, or request user feedback. All actions are visible in the interface, and the user can stop execution at any time. This architecture not only ensures transparency but also allows for adaptive task flows. For example, if a step fails because of a broken link, the Orchestrator can dynamically adjust the plan with the user’s consent.
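The plan, approve, execute, and report cycle described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the real Magentic-UI orchestrator: `run_plan`, `approve_plan`, and the agent callables are hypothetical, and each agent is modeled as a function that returns `True` on success.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (agent name, instruction)


def run_plan(
    plan: List[Step],
    agents: Dict[str, Callable[[str], bool]],
    approve_plan: Callable[[List[Step]], List[Step]],
    max_retries: int = 1,
) -> List[str]:
    """Execute a user-approved plan step by step and return a status log."""
    log: List[str] = []
    # Co-planning: the user may edit or delete steps before anything runs.
    plan = approve_plan(plan)
    for agent_name, instruction in plan:
        attempts = 0
        while True:
            ok = agents[agent_name](instruction)
            attempts += 1
            if ok:
                log.append(f"{agent_name}: ok")
                break
            if attempts > max_retries:
                # Stop and hand control back to the user instead of guessing.
                log.append(f"{agent_name}: needs user input")
                return log
    return log
```

The design choice worth noting is that a step that keeps failing halts the run and returns control to the user, which mirrors the article’s point that plan changes happen with user consent.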

In a controlled evaluation on the GAIA benchmark, which includes complex tasks such as browsing the web and interpreting documents, Magentic-UI’s performance was rigorously tested. GAIA consists of 162 tasks that require multimodal understanding. When operating autonomously, Magentic-UI successfully completed 30.3% of the tasks. When supported by a simulated user with access to additional task information, its success rate jumped to 51.9%, a relative improvement of 71%. Another configuration, using a smarter simulated user, improved the rate to 42.6%. Interestingly, Magentic-UI asked for help in only 10% of the enhanced tasks and asked for the final answer in 18% of cases. In those cases, the system requested help an average of 1.1 times. This shows how minimal but well-timed human intervention can significantly improve task completion without high oversight costs.
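The 71% figure is a relative gain, not a difference in percentage points. A short check using only the two completion rates quoted above makes the distinction explicit (the function name is chosen here for illustration):

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


autonomous = 30.3     # % of tasks completed without help (from the article)
with_sim_user = 51.9  # % completed with a simulated user holding extra info

gain = relative_improvement(autonomous, with_sim_user)
points = with_sim_user - autonomous

print(f"{points:.1f} percentage points, {gain:.1f}% relative improvement")
```

The absolute gain is 21.6 percentage points, and dividing that by the 30.3% baseline gives roughly 71%.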

Magentic-UI also has a “Saved Plans” gallery that displays strategies reused from past tasks. Retrieving a plan from this gallery is about three times faster than generating a new one. A predictive mechanism surfaces these plans as the user types, streamlining repeated tasks such as flight searches or form submissions. The safety mechanisms are robust. Every browser or code action runs inside a Docker container, ensuring that no user credentials are exposed. Users can define allow lists for website access, and every action can be gated behind an approval prompt. A red-team evaluation further tested the system against phishing attacks and prompt injections; in those tests it either sought user clarification or blocked execution, reinforcing its layered defense model.
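As a rough illustration of an allow list for website access, the sketch below checks a URL’s host against a set of permitted domains. The article does not describe how Magentic-UI matches domains, so the subdomain rule and the `is_allowed` helper are assumptions made for this example.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allow_list: set[str]) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allow_list)


# Hypothetical allow list defined by the user.
allowed = {"example.com", "contoso.com"}

print(is_allowed("https://www.example.com/flights", allowed))
print(is_allowed("https://example.com.evil.test/login", allowed))
```

Matching on the parsed hostname rather than on a substring of the URL means a lookalike host such as `example.com.evil.test` is rejected, which is the kind of case a phishing test would probe.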

Several key takeaways from the research on Magentic-UI:

With simple human input, Magentic-UI improves task completion by 71% (from 30.3% to 51.9%).
It requests user help in only 10% of the enhanced tasks, averaging 1.1 help requests per task.
It has a co-planning UI that gives users complete control before execution.
It executes tasks through four modular agents: Orchestrator, WebSurfer, Coder, and FileSurfer.
It stores and reuses plans, making repeated tasks up to 3 times faster.
All actions are sandboxed in Docker containers; user credentials are never exposed.
It passed a red-team evaluation against phishing and injection threats.
It supports fully user-configurable “action guards” for high-risk steps.
It is fully open source and integrated with Azure AI Foundry Labs.

In short, Magentic-UI addresses a long-standing problem in AI automation: the lack of transparency and controllability. Instead of replacing users, it keeps them at the center of the process. Even with minimal help, the system performs well and learns to improve over time. Its modular design, robust safeguards, and detailed interaction model lay a solid foundation for future intelligent assistants.

See the technical details and the GitHub page. All credit for this research goes to the researchers on the project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that offers in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.