Introduction and Demonstration 00:00
- ChatGPT Agent is a new feature combining OpenAI's Operator, Deep Research, and ChatGPT.
- The tool integrates research, data retrieval, and web browsing capabilities.
- Demonstrations include searching for a dog-friendly campsite with a hot tub and organizing vegetarian recipes by protein efficiency.
- The agent interface is streamlined, showing reasoning and actions, with the ability to view all intermediate steps taken.
- Supports tasks such as navigating websites and creating spreadsheets.
Technical Details and System Overview 02:40
- The ChatGPT Agent operates as a unified agentic system, combining web interaction, deep information synthesis, and chatbot fluency.
- Uses its own virtual computer, similar to Manus, an existing agentic tool.
- Available to users with Pro, Plus, and Team accounts, not requiring the highest-tier subscription.
- Positioned as a competitor to similar agentic systems and compared to the new Grok 4 AI model.
Benchmarks and Performance 03:36
- On the broad "Humanity's Last Exam" benchmark, ChatGPT Agent (browser+computer+terminal) scored 41.6%, outperforming Deep Research alone (26%) and OpenAI o3 with Python+browsing (24.9%).
- Grok 4 Heavy scored higher at 44.4%; Grok 4 scored 38.6%.
- The ChatGPT Agent's performance increases significantly when tool use is enabled.
- On FrontierMath, ChatGPT Agent scored 27.4%, higher than o4-mini with Python (19.3%) and o3 with Python (10.3%).
- On economically important tasks, ChatGPT Agent wins against humans in roughly 30% or more of cases.
- On DSBench (data science tasks), ChatGPT Agent outperforms humans in both data modeling (89.9% vs. 64.1%) and other data analysis tasks.
- On SpreadsheetBench, humans achieved 71.3%, while the agent with Excel access scored 45.5%, showing room for improvement.
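To make the Humanity's Last Exam comparison easier to read, the quoted scores can be turned into percentage-point gaps relative to ChatGPT Agent. The following is a minimal sketch: the numbers are copied from the bullets above, while the names `HLE_SCORES`, `AGENT`, and `gaps_vs` are hypothetical and exist only for this illustration.

```python
# Scores (percent) exactly as quoted in the summary; nothing is re-measured here.
HLE_SCORES = {
    "ChatGPT Agent": 41.6,   # browser + computer + terminal
    "Deep Research": 26.0,
    "o3": 24.9,              # with Python + browsing
    "Grok 4 Heavy": 44.4,
    "Grok 4": 38.6,
}

AGENT = "ChatGPT Agent"  # hypothetical constant naming the baseline system


def gaps_vs(baseline: str, scores: dict) -> dict:
    """Return each other system's score minus the baseline's, in percentage points."""
    base = scores[baseline]
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    # Print systems from furthest behind to furthest ahead of the baseline.
    for name, gap in sorted(gaps_vs(AGENT, HLE_SCORES).items(), key=lambda kv: kv[1]):
        print(f"{name}: {gap:+.1f} points vs. {AGENT}")
```

Read this way, the agent leads Deep Research and o3 by more than 15 points each, sits 3 points ahead of Grok 4, and trails Grok 4 Heavy by under 3 points.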
Risks, Limitations, and Final Thoughts 07:32
- This is the first release that lets ChatGPT take actions on the web directly, which introduces new security risks.
- There is potential for malicious actors to exploit agents and extract sensitive data; users should be wary of sharing personal information.
- The shift towards agent-mediated internet interaction means users may become more distant from raw web content, raising concerns about transparency and trust in AI filters.
- The presenter expresses mixed emotions, recognizing both the convenience of automation and the potential issues with relying on AI agents for web interaction.