Victor Dibia introduces himself as a Principal Research Software Engineer at Microsoft Research, focusing on human-AI experiences and multi-agent systems.
He has worked on projects such as GitHub Copilot and led development of AutoGen and AutoGen Studio, a framework and a low-code tool, respectively, for building multi-agent workflows.
Shares experience from working at Cloudera as a machine learning engineer and at IBM Research on human-computer interaction.
Describes early experiments building agentic workflows prior to ChatGPT, including a tool that used OpenAI's Davinci and Codex models for data summarization and visualization, with error rates decreasing as models improved.
Highlights the transition from single-agent, fixed workflows to exploring the design of multi-agent, self-organizing applications like Autogen.
Motivation for Blender LM and Multi-Agent Systems 05:16
Inspiration to create Blender LM stemmed from the complexity of learning Blender, which requires deep understanding and extensive time investment.
Explains the differences between fixed deterministic workflows (good for well-defined problems) and autonomous exploratory systems (better for complex, dynamic environments).
Identifies three key characteristics of semi-autonomous, multi-agent systems: autonomy, the ability to take actions with side effects, and the capability to handle long, complex, evolving tasks.
Demonstrates Blender LM, a web-based interface connected to Blender over a WebSocket to execute real-time 3D tasks via natural language.
Users can trigger actions like clearing the scene or creating objects (e.g., "two balls with a shiny glossy silver finish") and watch plans and execution steps streamed live.
System uses a planning agent to break down high-level instructions, a verification agent to check task progress, and streams activity and visualizations to the user interface.
Application exhibits autonomous decision-making and feedback loops to ensure tasks are being completed as intended.
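The plan, execute, and verify loop described above can be sketched as follows. This is a minimal illustration, not Victor Dibia's actual code: the real Blender LM sends commands to Blender over a WebSocket and uses LLM-backed agents, whereas here stub functions stand in so the control flow is runnable on its own.

```python
# Hedged sketch of Blender LM's plan -> execute -> verify loop.
# All function bodies are illustrative stubs; the real system uses
# LLM agents and a WebSocket connection to Blender.
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    done: bool = False

def plan(instruction: str) -> list[Step]:
    # Planning agent: break a high-level instruction into concrete steps.
    # (A real planner would call an LLM; this stub just splits on "and".)
    return [Step(part.strip()) for part in instruction.split(" and ")]

def execute(step: Step) -> None:
    # Executor: in Blender LM this would send Python code to Blender
    # over the WebSocket; here we simply mark the step complete.
    step.done = True

def verify(steps: list[Step]) -> bool:
    # Verification agent: check that every planned step actually completed.
    return all(s.done for s in steps)

def run(instruction: str, on_event=print) -> bool:
    steps = plan(instruction)
    for step in steps:
        on_event(f"executing: {step.description}")  # streamed live to the UI
        execute(step)
    ok = verify(steps)
    on_event("verified" if ok else "verification failed")
    return ok
```

The feedback loop is the key design point: verification runs against the planned steps, so the system can detect when execution drifted from the plan rather than assuming success.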
Building Semi-Autonomous Multi-Agent Systems 10:10
Recommends first defining the goal and baseline function before introducing agents or AI.
Stresses the importance of building robust, testable tools (task-specific and general-purpose) as agents are only as capable as their tools.
Outlines the need for a thorough evaluation testbed: start with code in Jupyter notebooks, then develop interactive UIs, and add automated test suites.
Describes integrating various agent types: a base agent loop for action planning, a verifier agent for progress monitoring, and a planning agent for structured task decomposition.
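The point about robust, testable tools can be made concrete with a small example. The function name and behavior below are illustrative assumptions, not from the talk: the idea is that each tool is a typed, validating function that can be unit-tested in a notebook before any agent is allowed to call it.

```python
# Illustrative task-specific tool (names are hypothetical, not from the talk).
# In Blender, creation would go through the bpy API; here we return a spec dict
# so the tool's validation logic can be tested in isolation.
def create_sphere(radius: float, material: str = "default") -> dict:
    """Create a sphere specification for the scene.

    Validates inputs eagerly so an agent passing bad arguments fails
    loudly and testably, rather than corrupting the scene.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    return {"type": "sphere", "radius": radius, "material": material}
```

Because the tool is a plain function with explicit validation, it slots into the recommended progression: test it in a Jupyter notebook first, then behind an interactive UI, then in an automated test suite.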
UX Design Principles for Multi-Agent Systems 13:18
Capability Discovery: Clearly communicate to users what the agent/system can do reliably, including proactive suggestions based on user context.
Observability and Provenance: Stream all activity logs, expose debugging tools, and provide real-time traces so users understand agent actions.
Interruptibility: Design systems to allow for pausing, rolling back, and resuming agents at any time, enabling intervention when agents go astray or use excessive resources.
Cost-Aware Delegation: Equip systems to estimate and prevent risky or costly operations, delegating decision authority back to users as needed.
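Two of these principles, interruptibility and cost-aware delegation, can be sketched in a single loop. This is my own illustrative construction under the principles described, not code from the talk: a stop event lets the user pause at any step, and a budget check hands the decision back to the user before a costly action runs.

```python
# Illustrative sketch of interruptibility + cost-aware delegation
# (my construction; names and structure are not from the talk).
import threading

def run_agent(steps, cost_of, budget, approve, stop: threading.Event, log=print):
    """Run steps until done, the user pauses, or the budget forces delegation.

    cost_of: estimates the cost of a step before running it.
    approve: callback that asks the user to approve an over-budget step.
    stop:    event the user can set at any time to pause the agent.
    """
    spent = 0.0
    completed = []
    for step in steps:
        if stop.is_set():  # interruptibility: check before every action
            log(f"paused before: {step}")
            break
        cost = cost_of(step)
        if spent + cost > budget and not approve(step, cost):
            # cost-aware delegation: user declined, skip the risky step
            log(f"skipped (over budget, user declined): {step}")
            continue
        spent += cost
        completed.append(step)
        log(f"done: {step} (spent={spent})")  # observability: stream each action
    return completed, spent
```

Checking the stop event before every step (rather than only at the start) is what makes the agent pausable mid-task, and estimating cost before acting is what lets the system delegate instead of apologizing after the fact.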
Multi-agent systems should only be used when they offer a clear advantage, given increased complexity and higher error surfaces.
Most tasks do not benefit from a multi-agent approach; evaluate ROI carefully before choosing this architecture.
Offers a five-step framework to assess whether multi-agent systems are suitable, considering factors such as task planning, multi-persona breakdown, context management, and the need for adaptive solutions.
Advocates for evaluation-driven design: start with the task definition, establish metrics, build a non-agent baseline, and iteratively measure the ROI of agent enhancements.
Emphasizes the importance of task-specific evaluation over generic academic benchmarks.
Reinforces four main UX design principles: enable task discovery, provide user-facing traces, ensure agent interruptibility, and implement cost/risk evaluation.
Advises against building full systems from scratch for demos, suggesting the use of frameworks to save effort.
Recommends further reading and shares that code and additional insights on Blender LM, including material from an upcoming book, are available.