The new ChatGPT agent excels at multi-turn conversations and maintaining tasks over long periods.
A major focus is on improving agent memory and personalization, with future goals including proactive agent actions.
This episode features the OpenAI team behind the agent, discussing the leap in capabilities from unifying prior tools (deep research and operator) into a seamless system.
The agent combines text browsing, a GUI (visual) browser, terminal access (for code/data tasks), and shared tool state within a virtual computer.
It allows fluid transitions between reading text, interacting with web pages visually, executing code, and manipulating files or APIs (e.g., GitHub, Google Drive).
Users benefit from an environment similar to a real computer, which offers flexibility and supports complex task execution.
Deep research and operator, previously separate products, were merged because their strengths are complementary: deep research offers efficient text reading and web interaction, while operator offers advanced GUI handling.
Additional tools like a terminal and image generation were integrated, resulting in a powerful multi-functional agent.
Shared state among tools enables seamless switching and complex workflows.
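The shared-state idea can be illustrated with a small sketch. It is not OpenAI's implementation: the class names (SharedState, TextBrowser, VisualBrowser, Terminal), their methods, and the example URL are all hypothetical. The sketch only shows how several tools can hand work to one another when they read and write the same state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedState:
    """Hypothetical state visible to every tool in the virtual computer."""
    files: Dict[str, str] = field(default_factory=dict)
    current_url: Optional[str] = None


class TextBrowser:
    """Stand-in for a text browser; it performs no real network access."""

    def __init__(self, state: SharedState) -> None:
        self.state = state

    def save_page(self, url: str, text: str, path: str) -> None:
        # Record the current location and store the page text as a shared file.
        self.state.current_url = url
        self.state.files[path] = text


class VisualBrowser:
    """Stand-in for a GUI browser that resumes where the text browser stopped."""

    def __init__(self, state: SharedState) -> None:
        self.state = state

    def describe(self) -> str:
        return f"GUI browser opened at {self.state.current_url}"


class Terminal:
    """Stand-in for a terminal that works on files other tools produced."""

    def __init__(self, state: SharedState) -> None:
        self.state = state

    def word_count(self, path: str) -> int:
        return len(self.state.files[path].split())


# All three tools are built around the same state object.
state = SharedState()
text_browser = TextBrowser(state)
visual_browser = VisualBrowser(state)
terminal = Terminal(state)

text_browser.save_page(
    "https://example.com/report",      # placeholder URL
    "revenue grew in every quarter",   # placeholder page text
    "report.txt",
)
gui_summary = visual_browser.describe()
words = terminal.word_count("report.txt")
```

Because every tool holds a reference to the same object, a page saved by one tool is immediately available to the others without copying data between them.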
The agent was trained for tasks such as generating detailed research reports, booking flights, making purchases, creating slide decks, and conducting data analysis.
The design is intentionally open-ended to discover unexpected user use cases.
Both consumer and business users are targeted; early users have applied it to data organization, online shopping, coding, and synthesizing emerging research.
Example: The agent estimated OpenAI’s valuation, created a financial model and projections, assembled a spreadsheet, and generated slides—completing the task in about 28 minutes.
The agent’s access to a general virtual computer enables it to address a vast array of human-computer interaction tasks.
The team foresees new paradigms for interacting with virtual assistants and is focused on making the model adept at as many computer-based tasks as possible.
There’s considerable excitement about both expanding technical capabilities and exploring new user experiences with agents.