ChatGPT Agent is wild...

Introduction and Demonstration 00:00

  • ChatGPT Agent is a new feature that combines OpenAI's Operator, Deep Research, and ChatGPT into a single tool.
  • The tool integrates research, data retrieval, and web browsing capabilities.
  • Demonstrations include searching for a dog-friendly campsite with a hot tub and organizing vegetarian recipes by protein efficiency.
  • The agent interface is streamlined: it shows the agent's reasoning and actions as it works, and every intermediate step can be reviewed.
  • It supports tasks such as navigating websites and creating spreadsheets.

Technical Details and System Overview 02:40

  • The ChatGPT Agent operates as a unified agentic system, combining web interaction, deep information synthesis, and chatbot fluency.
  • It runs on its own virtual computer, similar to Manus, an existing agentic tool.
  • Available to Pro, Plus, and Team subscribers, so it does not require the highest-tier plan.
  • Positioned as a competitor to other agentic systems, and compared against the newly released Grok 4 model.

Benchmarks and Performance 03:36

  • On Humanity's Last Exam (HLE), a comprehensive benchmark, ChatGPT Agent (browser + computer + terminal) scored 41.6%, outperforming Deep Research alone (26%) and OpenAI o3 with Python and browsing (24.9%).
  • Grok 4 Heavy scored higher, at 44.4%; Grok 4 scored 38.6%.
  • The ChatGPT Agent's performance increases significantly when tool use is enabled.
  • On FrontierMath, ChatGPT Agent scored 27.4%, ahead of o4-mini with Python (19.3%) and o3 with Python (10.3%).
  • On a set of economically important tasks, ChatGPT Agent beats humans in roughly 30% or more of comparisons.
  • On DSBench (data science tasks), ChatGPT Agent outperforms humans in data modeling (89.9% vs. 64.1%) as well as in data analysis tasks.
  • On SpreadsheetBench, humans scored 71.3% while the agent with Excel access scored 45.5%, leaving clear room for improvement.
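To make the percentage-point gaps above concrete, here is a minimal Python sketch that tabulates the scores exactly as reported in the video (not independently verified) and computes differences between systems. The names `REPORTED`, `point_gap`, and `ranked` are hypothetical, chosen only for illustration.

```python
# Scores (percent) as summarized in the video; not independently verified.
REPORTED = {
    "Humanity's Last Exam": {
        "ChatGPT Agent": 41.6,
        "Deep Research": 26.0,
        "o3 + Python + browsing": 24.9,
        "Grok 4 Heavy": 44.4,
        "Grok 4": 38.6,
    },
    "FrontierMath": {
        "ChatGPT Agent": 27.4,
        "o4-mini + Python": 19.3,
        "o3 + Python": 10.3,
    },
    "SpreadsheetBench": {
        "Human": 71.3,
        "ChatGPT Agent + Excel": 45.5,
    },
}


def point_gap(scores: dict[str, float], a: str, b: str) -> float:
    """Return a's score minus b's score, in percentage points (rounded to 0.1)."""
    return round(scores[a] - scores[b], 1)


def ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (system, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for benchmark, scores in REPORTED.items():
        print(benchmark)
        for name, score in ranked(scores):
            print(f"  {score:5.1f}%  {name}")
    hle = REPORTED["Humanity's Last Exam"]
    print("Agent vs. Deep Research (pts):", point_gap(hle, "ChatGPT Agent", "Deep Research"))
    sb = REPORTED["SpreadsheetBench"]
    print("Human vs. agent on SpreadsheetBench (pts):", point_gap(sb, "Human", "ChatGPT Agent + Excel"))
```

Rounding to one decimal keeps the output in the same precision as the reported figures and hides floating-point noise such as 71.3 - 45.5 evaluating to 25.799999....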

Risks, Limitations, and Final Thoughts 07:32

  • This release is the first to let ChatGPT take actions on the web directly, which introduces new security risks.
  • Malicious actors could potentially exploit agents to extract sensitive data, so users should be wary of sharing personal information with them.
  • As agents increasingly mediate how people use the internet, users may grow more distant from raw web content, raising concerns about transparency and about trusting AI to filter what they see.
  • The presenter expresses mixed emotions, recognizing both the convenience of automation and the potential issues with relying on AI agents for web interaction.