GPT-5 not only excels at benchmarks but also built the tools and UI used for benchmarking, often on the first attempt.
Its tool-calling behavior is the best the creator has seen—clear, efficient, and well explained.
The model plans, makes real to-do lists, and autonomously chooses the right tools without constant correction or guidance.
Functions as a highly capable coding partner, able to handle complex projects and diverse codebases.
It demonstrates up-to-date knowledge and can follow explicit instructions through system prompts with high reliability.
Feels more like working with a diligent coworker than with previous AI models.
Performance in Dangerous and Ethical Test Scenarios 09:16
The creator tested GPT-5 on “dangerous” simulated scenarios (e.g., blackmail and murder benchmarks).
In blackmail simulation, previous models like Claude 4 Opus would attempt blackmail 96% of the time; GPT-5 never engages in harmful behavior.
For scenarios where models could let a user die by withholding information, GPT-5 always does the right thing and intervenes.
Ran 1,800 tests; only one was flagged, due to a misclassification by another model—not actual risky behavior from GPT-5.
GPT-5 carefully follows instructions and system prompts, not going beyond what it's told to do.
Obedience to Instructions and Behavioral Consistency 14:01
In "snitchbench" tests, GPT-5 only reveals confidential information if told to act boldly or prioritize humanity, matching the given instructions exactly.
Without such prompts, GPT-5 does not disclose sensitive data; its actions are fully dictated by user instructions.
Described as the “most honorable robot ever made,” it strictly follows whatever it's told—no more, no less.
This shift means users don’t have to “steer” GPT-5; they can simply tell it their intent directly.
Reflections on Paradigm Shift and Future Implications 16:50
The creator demonstrates using GPT-5 to redesign interfaces: it flawlessly implements requested UI/CLI changes from screenshots and brief specifications.
Highlights the model's ability to handle complex technical frameworks (like ink.js and React) with ease.
Compares the advancement to significant moments in AI, stating GPT-5 represents a greater leap than previous releases.
Suggests this marks a fundamental transformation in how AI tools can be used, requiring a reevaluation of current workflows and possibilities.
Notes the model is "super safe," "super smart," "follows instructions incredibly well," and is very literal (described as "too autistic to talk to").
Expresses anticipation for broader public release and concern about potential future impacts on employment and society.