SUMM

ChatGPT Agent is a new feature combining OpenAI's Operator, Deep Research, and ChatGPT.
The tool integrates research, data retrieval, and web browsing capabilities.
Demonstrations include searching for a dog-friendly campsite with a hot tub and organizing vegetarian recipes by protein efficiency.
The agent interface is streamlined, showing reasoning and actions, with the ability to view all intermediate steps taken.
It supports tasks such as navigating websites and creating spreadsheets.

The ChatGPT Agent operates as a unified agentic system, combining web interaction, deep information synthesis, and chatbot fluency.
It uses its own virtual computer, similar to Manus, an existing agentic tool.
Available to users with Pro, Plus, and Team accounts, not requiring the highest-tier subscription.
Positioned as a competitor to similar agentic systems and compared to the new Grok 4 AI model.

On the comprehensive "Humanity's Last Exam" benchmark, ChatGPT Agent (browser+computer+terminal) scored 41.6%, outperforming Deep Research alone (26%) and OpenAI o3 with Python+browsing (24.9%).
Grok 4 Heavy scored higher at 44.4%; Grok 4 scored 38.6%.
The ChatGPT Agent's performance increases significantly when tool use is enabled.
On FrontierMath, ChatGPT Agent scored 27.4%, higher than o4-mini with Python (19.3%) and o3 with Python (10.3%).
On economically important tasks, ChatGPT Agent's output beats human work roughly 30% or more of the time.
On DSBench (data science tasks), ChatGPT Agent outperforms humans in both data modeling (89.9% vs. 64.1%) and other data analysis tasks.
On SpreadsheetBench, humans achieved 71.3%, while the agent with Excel access scored 45.5%, showing room for improvement.
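
The gaps between these scores are easier to compare as percentage-point differences. The following is a minimal sketch that only restates the numbers quoted in this summary; the helper name gap and the comparison labels are made up for illustration, and the scores may come from different test setups, so the differences are indicative at best.

```python
def gap(agent_score, other_score):
    """Percentage-point difference (agent minus other), rounded to one decimal."""
    return round(agent_score - other_score, 1)

# (label, ChatGPT Agent score, comparison score), all in percent,
# copied from the figures quoted in the summary.
comparisons = [
    ("Humanity's Last Exam vs Deep Research", 41.6, 26.0),
    ("Humanity's Last Exam vs o3 (Python+browsing)", 41.6, 24.9),
    ("Humanity's Last Exam vs Grok 4 Heavy", 41.6, 44.4),
    ("Humanity's Last Exam vs Grok 4", 41.6, 38.6),
    ("FrontierMath vs o4-mini (Python)", 27.4, 19.3),
    ("FrontierMath vs o3 (Python)", 27.4, 10.3),
    ("DSBench data modeling vs humans", 89.9, 64.1),
    ("SpreadsheetBench vs humans", 45.5, 71.3),
]

for label, agent, other in comparisons:
    # A positive value means the agent scored higher; negative means lower.
    print(f"{label}: {gap(agent, other):+.1f} points")
```

A positive result means the agent leads; a negative one, as in the Grok 4 Heavy and SpreadsheetBench rows, means it trails.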

This is the first release that lets ChatGPT take actions on the web directly, which introduces new security risks.
There is potential for malicious actors to exploit agents and extract sensitive data; users should be wary of sharing personal information.
The shift towards agent-mediated internet interaction means users may become more distant from raw web content, raising concerns about transparency and trust in AI filters.
The presenter expresses mixed emotions, recognizing both the convenience of automation and the potential issues with relying on AI agents for web interaction.

ChatGPT Agent is wild...