Summary

Tao (Hik), co-founder and Chief Product Officer of Manus AI, introduces himself as a long-time coder with 28 years of experience, but a newcomer to AI.
His initial goal was to create a product that could influence users 24 hours a day, and he believes Manus AI can achieve this by year's end.
Currently, the most active user consumes about two hours of GPU time daily.

Manus derives its name from the MIT motto "mens et manus" (mind and hand), emphasizing the fusion of intelligence and action.
Unlike other AI products, Manus focuses on giving AI "hands" (the ability to interact with the world and take actions) rather than just providing a smart "brain."

Internally, Manus assisted with global expansion tasks, such as searching for and recommending office locations and accommodations in Tokyo for 40 staff members.
Using a prompt, Manus autonomously planned and executed web searches, producing an interactive map and detailed office/accommodation reports within 24 minutes.
An additional demo shows Manus analyzing a photo of an empty room, identifying its style, browsing furniture websites, and composing a room design with direct purchase links.
Manus acts as a general agent capable of solving a wide variety of tasks autonomously.

Manus was inspired by the code editor "Cursor," particularly how non-coders used it to accomplish tasks without caring about code details.
The founders saw an opportunity to create a system that automates the "right panel" of Cursor, focusing on outcomes rather than process.
They wanted Manus to operate in the cloud, so users could delegate tasks and disengage until completion.

Each Manus agent is assigned a virtual machine with full computer capabilities (file system, terminal, VS Code, a real Chromium browser).
Users can upload large volumes of data (like hundreds of PDFs), and Manus processes and structures them automatically.
Manus is designed for consumers, with pre-integrated access to private databases and APIs for user convenience.
A "personal logic system" allows users to teach Manus personalized workflows and preferences, which Manus remembers and applies automatically.
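One way to picture the "personal logic system" described above is as a small store of user-taught notes that is prepended to every new task. The sketch below is illustrative only: the class name `PreferenceStore`, its methods, and the prompt layout are assumptions, not Manus's actual design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreferenceStore:
    """Hypothetical store of preferences a user has taught the agent."""
    notes: List[str] = field(default_factory=list)

    def teach(self, note: str) -> None:
        # Remember a preference once; ignore blanks and exact duplicates.
        note = note.strip()
        if note and note not in self.notes:
            self.notes.append(note)

    def apply(self, task: str) -> str:
        # Prepend remembered preferences so the model sees them with the task.
        if not self.notes:
            return task
        remembered = "\n".join(f"- {n}" for n in self.notes)
        return f"User preferences:\n{remembered}\n\nTask: {task}"


store = PreferenceStore()
store.teach("Deliver reports as spreadsheets")
store.teach("Deliver reports as spreadsheets")  # duplicate, stored once
prompt = store.apply("Summarize Q1 sales")
```

Keeping the preferences as plain context, rather than as coded rules, matches the summary's point that the model, not a fixed workflow, decides how to use them.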

Manus advocates for minimal hardcoded workflows and maximal reliance on the intelligence of the underlying AI models.
There are zero predefined workflows; Manus depends on providing context and allowing the model to reason and act.
This approach aims to unlock more emergent and flexible capabilities compared to conventional multi-agent systems with rigid roles.
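The "context in, model decides" idea can be sketched as a single loop with no scripted steps: the model sees the task and all prior observations, then chooses the next tool itself. Everything named here (`TOOLS`, `stub_model`, `run_agent`) is hypothetical, and `stub_model` is a stand-in for a real model call so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical tools; the names and behavior are illustrative only.
def search(query: str) -> str:
    return f"results for {query!r}"


def write_file(text: str) -> str:
    return f"saved {len(text)} chars"


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "write_file": write_file}


@dataclass
class Context:
    task: str
    # Each event is (tool name, argument, observation).
    events: List[Tuple[str, str, str]] = field(default_factory=list)


def stub_model(ctx: Context) -> Tuple[str, str]:
    """Stand-in for an LLM call: returns (tool, argument) or ("done", answer)."""
    if not ctx.events:
        return "search", ctx.task
    if len(ctx.events) == 1:
        return "write_file", ctx.events[-1][2]
    return "done", f"finished after {len(ctx.events)} steps"


def run_agent(task: str, model=stub_model, max_steps: int = 10) -> Tuple[str, Context]:
    ctx = Context(task)
    for _ in range(max_steps):
        tool, arg = model(ctx)            # the model picks the next action
        if tool == "done":
            return arg, ctx
        observation = TOOLS[tool](arg)    # execute the chosen tool
        ctx.events.append((tool, arg, observation))
    return "step limit reached", ctx


answer, ctx = run_agent("offices in Tokyo")
```

The loop itself contains no task-specific logic; the order of tool calls comes entirely from what the model returns, which is the contrast with role-based multi-agent pipelines drawn above.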

Manus relies on Anthropic's Claude models for their capability in long-horizon planning and agentic “loops.”
Most competing models could only manage a few steps before ending prematurely; Claude handled extended, multi-step tasks required by Manus.
Effective tool usage and function calling are critical for Manus's agent, with custom mechanisms (like "CoT injection," i.e. chain-of-thought injection) boosting performance before native model support was available.
The scale of Manus's usage and commitment shows in its spending: $1 million on Claude in 14 days.

When Manus browses the web, it provides the foundation model with three types of context: text from the page viewport, a screenshot, and a screenshot with bounding boxes to guide interaction.
The approach blends vision and text processing for effective web interaction.
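The three kinds of browser context listed above can be pictured as one observation object that is turned into a mixed text-and-image message. This is a sketch under assumptions: the type names, the message layout, and the idea of resolving a chosen box index to a click point are all hypothetical, not Manus's documented format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BoundingBox:
    index: int      # number drawn on the annotated screenshot
    x: int          # left edge in pixels
    y: int          # top edge in pixels
    width: int
    height: int


@dataclass
class BrowserObservation:
    viewport_text: str                # text visible in the page viewport
    screenshot_png: bytes             # plain screenshot
    annotated_screenshot_png: bytes   # same screenshot with boxes drawn on it
    boxes: List[BoundingBox]


def build_model_input(obs: BrowserObservation) -> List[Dict[str, object]]:
    # One text part and two image parts, mirroring the three context types.
    return [
        {"type": "text", "text": obs.viewport_text},
        {"type": "image", "data": obs.screenshot_png},
        {"type": "image", "data": obs.annotated_screenshot_png},
    ]


def click_point(obs: BrowserObservation, index: int) -> Tuple[int, int]:
    # Map a box index chosen by the model to the center of that box.
    for box in obs.boxes:
        if box.index == index:
            return box.x + box.width // 2, box.y + box.height // 2
    raise KeyError(f"no bounding box with index {index}")


obs = BrowserObservation(
    viewport_text="Search  Sign in",
    screenshot_png=b"<png bytes>",
    annotated_screenshot_png=b"<png bytes with boxes>",
    boxes=[BoundingBox(index=1, x=10, y=20, width=100, height=40)],
)
parts = build_model_input(obs)
target = click_point(obs, 1)
```

Numbered boxes give the model a compact way to name a target ("click box 1") instead of estimating pixel coordinates from the image.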

Because foundation models evolve rapidly, the team sees speed of innovation and a flexible agent framework as Manus's competitive edge, rather than reliance on any single technology or workflow.
Emergent capabilities, like deep research use cases, arise naturally from Manus’s structure with minimal manual engineering.

Manus will remain a cloud-based service, with no plans for a local, Docker-based version.
The focus is on reclaiming users' attention, allowing tasks to run remotely so users can disengage.
Future plans include expanding Manus's capabilities with virtual environments beyond Linux (e.g., Windows, Android), but keeping everything in the cloud.

Spotlight on Manus