UX Design Principles for Semi-Autonomous Multi-Agent Systems — Victor Dibia, Microsoft

Introduction and Background 00:01

  • Victor Dibia introduces himself as a Principal Research Software Engineer at Microsoft Research, focusing on human-AI experiences and multi-agent systems.
  • He has worked on projects such as GitHub Copilot and led development of Autogen, a framework for building multi-agent workflows, and Autogen Studio, its low-code companion tool.
  • Shares prior experience as a machine learning engineer at Cloudera and as a human-computer interaction researcher at IBM Research.
  • Describes early experiments building agentic workflows prior to ChatGPT, including a tool that used OpenAI's Davinci and Codex models for data summarization and visualization, with error rates decreasing as models improved.
  • Highlights the transition from single-agent, fixed workflows to exploring the design of multi-agent, self-organizing applications like Autogen.

Motivation for Blender LM and Multi-Agent Systems 05:16

  • Inspiration to create Blender LM stemmed from the complexity of learning Blender, which requires deep understanding and extensive time investment.
  • Explains the differences between fixed deterministic workflows (good for well-defined problems) and autonomous exploratory systems (better for complex, dynamic environments).
  • Identifies three key characteristics of semi-autonomous, multi-agent systems: autonomy, the ability to take actions with side-effects, and capability to handle long, complex, evolving tasks.

Blender LM: Demo and Architecture 07:50

  • Demonstrates Blender LM, a web-based interface connected to Blender over a websocket to execute real-time 3D tasks via natural language (a minimal bridge sketch follows this list).
  • Users can trigger actions like clearing the scene or creating objects (e.g., "two balls with a shiny glossy silver finish") and watch plans and execution steps streamed live.
  • The system uses a planning agent to break down high-level instructions and a verification agent to check task progress, streaming activity and visualizations to the user interface.
  • Application exhibits autonomous decision-making and feedback loops to ensure tasks are being completed as intended.
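Based on this description, a minimal sketch of the web-UI-to-Blender bridge might look like the following. The ws://localhost:8765 endpoint, the JSON message schema, and the assumption that a Blender-side add-on executes received Python against bpy are illustrative, not details confirmed in the talk.

```python
# Hypothetical client side of the websocket bridge described above.
import asyncio
import json

import websockets  # pip install websockets


async def run_step(code: str) -> None:
    async with websockets.connect("ws://localhost:8765") as ws:
        # Send one executable step produced by the planning agent.
        await ws.send(json.dumps({"type": "execute", "code": code}))
        # Stream status events back until the Blender side reports completion.
        async for raw in ws:
            event = json.loads(raw)
            print(f"[{event['type']}] {event.get('message', '')}")
            if event["type"] in ("done", "error"):
                break


# Example step, mirroring the spoken demo: add a sphere to the scene.
step = "import bpy\nbpy.ops.mesh.primitive_uv_sphere_add(location=(0, 0, 1))"
asyncio.run(run_step(step))
```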

Building Semi-Autonomous Multi-Agent Systems 10:10

  • Recommends first defining the goal and baseline function before introducing agents or AI.
  • Stresses the importance of building robust, testable tools (task-specific and general-purpose) as agents are only as capable as their tools.
  • Outlines the need for a thorough evaluation testbed: start with code in Jupyter notebooks, then develop interactive UIs, and add automated test suites.
  • Describes integrating various agent types: a base agent loop for action planning, a verifier agent for progress monitoring, and a planning agent for structured task decomposition (see the loop sketch after this list).
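The agent roles above can be sketched framework-agnostically as a plan-act-verify loop. The llm() helper, the tool registry, and the prompts below are hypothetical stand-ins rather than Autogen's actual API.

```python
def llm(prompt: str) -> str:
    """Placeholder for a model call (e.g. an OpenAI or Autogen client)."""
    raise NotImplementedError


TOOLS = {
    # Robust, individually testable tools the agents can call.
    "clear_scene": lambda: "scene cleared",
    "add_sphere": lambda: "sphere added",
}


def plan(task: str) -> list[str]:
    # Planning agent: decompose the high-level instruction into short steps.
    outline = llm(f"Break this Blender task into short steps:\n{task}")
    return [line.strip() for line in outline.splitlines() if line.strip()]


def verify(task: str, history: list[str]) -> bool:
    # Verifier agent: check whether the original goal has been met.
    verdict = llm(f"Task: {task}\nActions so far: {history}\nDone? yes/no")
    return verdict.strip().lower().startswith("yes")


def run(task: str, max_iters: int = 10) -> list[str]:
    # Base agent loop: pick a tool for each step, act, then verify progress.
    history: list[str] = []
    for step in plan(task)[:max_iters]:
        tool_name = llm(f"Pick one tool from {list(TOOLS)} for: {step}")
        result = TOOLS.get(tool_name.strip(), lambda: "unknown tool")()
        history.append(f"{step} -> {result}")
        if verify(task, history):  # feedback loop: stop once the goal is met
            break
    return history
```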

UX Design Principles for Multi-Agent Systems 13:18

  • Capability Discovery: Clearly communicate to users what the agent/system can do reliably, including proactive suggestions based on user context.
  • Observability and Provenance: Stream all activity logs, expose debugging tools, and provide real-time traces so users understand agent actions.
  • Interruptibility: Design systems to allow for pausing, rolling back, and resuming agents at any time, enabling intervention when agents go astray or use excessive resources.
  • Cost-Aware Delegation: Equip systems to estimate the cost and risk of operations and to hand decision authority back to the user before anything risky or expensive is executed (a sketch of this and interruptibility follows this list).
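Two of these principles, interruptibility and cost-aware delegation, can be sketched as a control loop wrapped around step execution. RunControl, estimate_cost(), and the dollar budget are illustrative assumptions, not part of Blender LM.

```python
import threading
import time


class RunControl:
    """Shared flags the UI can flip while the agent run is in progress."""

    def __init__(self) -> None:
        self.paused = threading.Event()
        self.stopped = threading.Event()


def estimate_cost(step: str) -> float:
    # Hypothetical estimator: predicted cost in dollars (tokens, renders, API calls).
    return 1.50 if "render" in step else 0.02


def execute(step: str) -> str:
    return f"executed: {step}"  # stand-in for the real tool call


def run_steps(steps: list[str], control: RunControl, budget: float = 0.50) -> None:
    for step in steps:
        while control.paused.is_set() and not control.stopped.is_set():
            time.sleep(0.1)            # interruptibility: idle until resumed
        if control.stopped.is_set():   # user hit "stop": abandon the run
            print("Run stopped by user.")
            return
        cost = estimate_cost(step)
        if cost > budget:              # cost-aware delegation: ask before proceeding
            answer = input(f"'{step}' may cost ${cost:.2f}. Proceed? [y/N] ")
            if answer.strip().lower() != "y":
                continue
        print(execute(step))           # also emit as a trace event for observability
```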

Key Takeaways and Practical Guidance 15:48

  • Multi-agent systems should only be used when they offer a clear advantage, given increased complexity and higher error surfaces.
  • Most tasks do not benefit from a multi-agent approach; evaluate ROI carefully before choosing this architecture.
  • Offers a five-step framework for assessing whether a multi-agent system is suitable, considering factors such as task planning needs, multi-persona breakdown, context management, and the need for adaptive solutions.
  • Advocates for evaluation-driven design: start with the task definition, establish metrics, build a non-agent baseline, and iteratively measure the ROI of agent enhancements (a small evaluation harness is sketched after this list).
  • Emphasizes the importance of task-specific evaluation over generic academic benchmarks.
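A minimal evaluation harness in the spirit of this advice might compare a non-agent baseline against the agent system on a task-specific metric. The tasks, the score() metric, and both run_* functions below are placeholders to be replaced with your own.

```python
# Minimal sketch of the evaluation-driven loop: define tasks and a metric,
# score a non-agent baseline, then score the agent system and compare ROI.
from statistics import mean
from typing import Callable

TASKS = [
    "clear the scene",
    "create two balls with a shiny glossy silver finish",
]


def score(output: str, task: str) -> float:
    """Task-specific metric (e.g. did the scene end up in the right state)."""
    return 1.0 if task.split()[0] in output else 0.0


def evaluate(system: Callable[[str], str]) -> float:
    return mean(score(system(task), task) for task in TASKS)


def run_baseline(task: str) -> str:
    return f"fixed-workflow result for: {task}"  # deterministic, no agents


def run_agent_system(task: str) -> str:
    return f"agent result for: {task}"           # multi-agent pipeline


baseline = evaluate(run_baseline)
agents = evaluate(run_agent_system)
print(f"baseline={baseline:.2f} agents={agents:.2f} gain={agents - baseline:+.2f}")
```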

Final Recommendations and Further Resources 19:06

  • Reinforces four main UX design principles: enable capability discovery, provide user-facing traces, ensure agent interruptibility, and implement cost/risk evaluation.
  • Advises against building full systems from scratch for demos, suggesting the use of frameworks to save effort.
  • Recommends further reading and notes that the Blender LM code and additional insights, including material from an upcoming book, are available.