Everything you need to know about GPT-5 (+ mini and nano)

Introduction & Sponsor 00:00

  • GPT-5 has been released, accompanied by additional information about its benchmarks, pricing, model options, supported tools, and implications for developers.
  • The presenter references a prior video made with early access to the model; they are now paying for their own usage.
  • Daytona is featured as the sponsor, offering inexpensive, stateful AI infrastructure with robust SDKs for deploying and managing AI agents.

Livestream and First Impressions 02:30

  • The official GPT-5 launch livestream was criticized for poor presentation and confusing data visuals.
  • Despite the underwhelming presentation, the presenter considers the model itself a significant leap, not just a minor improvement.

Pricing & Model Cost Comparison 03:40

  • GPT-5 pricing is $1.25 per million input tokens and $10 per million output tokens.
  • Earlier OpenAI reasoning models launched at higher rates per million tokens: o3 at $10 in/$40 out and o1 at $15 in/$60 out.
  • Claude Opus and Gemini models are priced significantly higher than GPT-5.
  • Real-world costs can differ from token prices, as token generation efficiency varies by model.
  • Benchmark tests show GPT-5 is much cheaper per run compared to Grok 4 and offers better efficiency.
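
The gap between list price and real-world cost comes down to simple arithmetic, sketched below. The GPT-5 rates are the ones quoted above; "model B", its rates, and all token counts are made-up illustrative values, not measurements from any benchmark.

```python
def run_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one run given per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# GPT-5 rates from the summary: $1.25 in / $10 out per million tokens.
# The token counts are hypothetical.
gpt5 = run_cost(10_000, 2_000, 1.25, 10.0)

# Hypothetical "model B": cheaper per token, but assumed to emit 4x the output.
model_b = run_cost(10_000, 8_000, 1.00, 8.0)

print(f"GPT-5:   ${gpt5:.4f}")
print(f"Model B: ${model_b:.4f}")
```

Under these assumed numbers the model with the lower per-token price ends up costing more per run, because output volume dominates the bill.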

Other Model Options: Mini and Nano 05:46

  • GPT-5 Mini is priced at $0.25 per million input tokens and $2 per million output tokens, cheaper and smarter than Gemini 2.5 Flash.
  • GPT-5 Nano is $0.05 per million in and $0.40 per million out, making it attractive for cost-sensitive tasks.
  • Token caching offers a 90% input cost discount for reused tokens.
  • Bulk pricing options are available, reducing large-scale operation costs further.
  • Compared to Gemini 2.5 Pro and Flash, GPT-5 offerings are more competitive, especially at larger context sizes.
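
As a rough illustration of how the tier prices and the 90% caching discount combine, here is a minimal sketch. The prices come from the bullets above; the dictionary keys, the function name, and the assumption that the discount applies only to the cached portion of the input are illustrative choices, not OpenAI's billing logic.

```python
# Per-million-token prices (USD) as quoted in the summary.
PRICES = {
    "gpt-5": {"input": 1.25, "output": 10.00},
    "gpt-5-mini": {"input": 0.25, "output": 2.00},
    "gpt-5-nano": {"input": 0.05, "output": 0.40},
}
CACHE_DISCOUNT = 0.90  # 90% off reused (cached) input tokens, per the summary


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one request.

    Assumes cached tokens are a subset of the input tokens and that only
    they receive the discount (an assumption made for this sketch).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    cached_rate = price["input"] * (1 - CACHE_DISCOUNT)
    total = (fresh_tokens * price["input"]
             + cached_tokens * cached_rate
             + output_tokens * price["output"])
    return total / 1_000_000


# Hypothetical workload: 50k input tokens (40k of them cached), 1k output.
for name in PRICES:
    print(name, round(request_cost(name, 50_000, 1_000, cached_tokens=40_000), 6))
```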

Context Length & Cutoff Date 07:11

  • GPT-5 supports a 400,000-token context window and can output up to 128,000 tokens per request.
  • The official data cutoff is September 30, 2024.
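
A small budgeting helper shows how these two limits interact. It assumes the 400,000-token window is shared between input and output, which the summary does not spell out, and it takes token counts as plain integers rather than running a tokenizer. The function name is hypothetical.

```python
CONTEXT_WINDOW = 400_000  # total window quoted in the summary
MAX_OUTPUT = 128_000      # per-request output cap quoted in the summary


def max_output_budget(input_tokens: int) -> int:
    """Largest output allowance that still fits, assuming a shared window."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - input_tokens
    # Never exceed the output cap, and never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


print(max_output_budget(100_000))
print(max_output_budget(350_000))
```

Under that assumption, prompts of up to 272,000 tokens leave the full output cap available; longer prompts eat into it.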

Access, User Experience, & Features 07:49

  • GPT-5 is available on T3 Chat under the standard tier, with Mini and Nano available in the free tier and the reasoning (thinking) model in premium.
  • New interface features, like an option to skip the longer "thinking" phase for a quicker answer, are being introduced on chatgpt.com.

Unified System & Model Routing 09:47

  • GPT-5 uses a unified system that routes queries to different models (smart/fast, reasoning, etc.) based on complexity, tool use, and prompt intent.
  • This dynamic routing is based on real-time data, including user preferences and actions.
  • The system resembles mixture-of-experts designs but operates with higher-level model routing.
  • The presenter notes OpenAI's leadership in this routing and recommendation system innovation.
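
To make the idea of model-level routing concrete, here is a toy router. OpenAI has not published its routing logic in this form: the signals, the 50-word threshold, the "think hard" trigger phrase, and the "fast"/"reasoning" labels are all hypothetical stand-ins for the complexity, tool-use, and intent signals mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    needs_tools: bool = False
    wants_fast_answer: bool = False  # e.g. the user chose to skip "thinking"


def route(query: Query) -> str:
    """Pick a hypothetical backend label for a query using simple heuristics."""
    # An explicit user preference overrides every other signal.
    if query.wants_fast_answer:
        return "fast"
    # Crude stand-ins for "complexity" and "prompt intent".
    looks_complex = len(query.text.split()) > 50
    asks_for_depth = "think hard" in query.text.lower()
    if query.needs_tools or looks_complex or asks_for_depth:
        return "reasoning"
    return "fast"


print(route(Query("What is the capital of France?")))
print(route(Query("Think hard about whether this proof is valid.")))
```

The difference from a mixture-of-experts layer is visible in the shape of the code: the decision is made once per query and selects a whole model, rather than being made per token inside a single network.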

Benchmarks & Real-World Performance 11:38

  • On Skatebench and other independent benchmarks, GPT-5 and GPT-5 Mini scored exceptionally well, outperforming previous mini models.
  • GPT-5 combines high performance with significantly lower costs compared to Grok 4 and OpenAI's earlier o3 and o1 models.
  • The model generates fewer tokens on average, contributing to cost efficiency.
  • Benchmarks confirm superior instruction-following abilities.

Model Tiers & Replacements for Older Models 12:38

  • OpenAI provided guidance on which GPT-5 variants should replace existing models (e.g., GPT-4o, o3) for different use cases.
  • GPT-5 Main replaces GPT-4o, while the Mini and Nano variants replace the earlier mini and nano models.
  • OpenAI describes improved data filtering and safety in the new models, with less regurgitation of content and more summarization and abstraction.

Safety, Alignment, and Refusals 14:41

  • GPT-5 is trained with advanced safety techniques focused on safe completions instead of binary refusals, especially for dual-use prompts (e.g., biology, cybersecurity).
  • In agentic alignment tests (e.g., blackmail avoidance, lethal intent avoidance), GPT-5 took zero harmful actions, outperforming previous models.
  • Disallowed content and sycophancy (overly agreeable or enabling responses) have been significantly reduced, addressing past controversies with GPT-4o.
  • A new instruction hierarchy prioritizes system prompts over developer prompts, and developer prompts over user prompts, for safety and reliability.
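
The instruction hierarchy can be pictured as a conflict-resolution rule. The sketch below assumes the order system > developer > user and models each instruction as a simple key/value setting; both the data model and the function name are hypothetical, and the real behavior is learned during training rather than implemented as a lookup.

```python
# Lower number means higher priority (assumed order: system > developer > user).
PRIORITY = {"system": 0, "developer": 1, "user": 2}


def resolve(instructions):
    """Return the effective settings from (role, key, value) triples.

    When two roles set the same key, the higher-priority role wins.
    Ties within one role keep the earliest instruction.
    """
    effective = {}
    for role, key, value in instructions:
        rank = PRIORITY[role]
        if key not in effective or rank < effective[key][0]:
            effective[key] = (rank, value)
    return {key: value for key, (_, value) in effective.items()}


settings = resolve([
    ("user", "reveal_system_prompt", True),     # user tries to override
    ("system", "reveal_system_prompt", False),  # system-level rule wins
    ("developer", "tone", "formal"),
    ("user", "language", "French"),             # no conflict, so it applies
])
print(settings)
```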

Hallucinations, Deception, & Reliability 19:45

  • GPT-5 significantly reduces hallucinated or incorrect information: in "thinking" mode, under 5% of responses contain a major incorrect claim, versus over 20% for o3 and GPT-4o.
  • Deception rates are also much lower: 10x lower in missing image tests, 6x lower in broken tool testing, and 2-3x lower in code deception.
  • Health advice accuracy and support for other languages have also improved.

Real-World Hacker & Security Community Feedback 21:16

  • Security professionals are impressed with GPT-5's capabilities, especially in reverse engineering and finding obscure information.
  • On tough real-world questions, the model shows significant progress over previous AIs.

Long Context Reasoning and Benchmarks 22:02

  • GPT-5 occupies top positions in long-context reasoning benchmarks, and users can control token usage and cost by choosing minimal, low, medium, or high reasoning effort.
  • The model scales effort and intelligence according to the needs and cost constraints of the user.
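
In practice, the effort level is a per-request setting. The sketch below only builds a request payload and sends nothing. The four effort names come from the summary; the payload shape is modeled on the `reasoning={"effort": ...}` parameter of OpenAI's Responses API and should be checked against the current API reference. The function name and defaults are hypothetical.

```python
EFFORT_LEVELS = ("minimal", "low", "medium", "high")


def build_request(prompt: str, effort: str = "medium",
                  model: str = "gpt-5") -> dict:
    """Build a request payload with an explicit reasoning-effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {
        "model": model,
        "input": prompt,
        # Lower effort should mean fewer reasoning tokens and a lower cost.
        "reasoning": {"effort": effort},
    }


cheap = build_request("Summarize this changelog.", effort="minimal")
thorough = build_request("Find the race condition in this code.", effort="high")
print(cheap)
print(thorough)
```

A reasonable pattern is to default to a lower effort for routine tasks and raise it only for requests where the extra reasoning tokens are worth paying for.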

Coding, UI Generation, & SVG Tasks 24:40

  • GPT-5's ability to generate UIs from screenshots is still developing; results are functional but do not faithfully reproduce the source design.
  • For SVG generation tasks (like "Pelican riding a bike" benchmarks), GPT-5, Mini, and Nano produce competent results, surpassing many other models.

Final Impressions & Recommendations 25:55

  • GPT-5 is described as a groundbreaking model that is cost-effective, highly competent, and reliable for a range of tasks.
  • The presenter notes that they no longer feel the need to constantly switch models, as GPT-5 consistently delivers strong results.
  • Public access is available through platforms like T3 Chat, ChatGPT, and Cursor, with limited free access during launch week.