GPT 5 - What They Didn't Say

Presentation Style and Initial Impressions 00:00

  • The GPT-5 launch presentation by OpenAI was perceived as overly staged and awkward, failing to capture the natural style of previous live streams.
  • Minor details like Sam Altman’s footwear and presentation mishaps (e.g., benchmark slides with numerical inconsistencies) were highlighted and criticized by the community.
  • The critique adds that the slides could have benefited from being checked by OpenAI’s own models.

GPT-5 System Architecture and Routing 02:08

  • GPT-5 is better described as a system than as a single model: a model router assigns each prompt to the best-fit sub-model based on context and complexity.
  • There is a distinction between "reasoning" and "non-reasoning" models within GPT-5, optimizing both speed and cost for different query types.
  • The router’s main purpose is to cut costs, which matters especially given OpenAI’s large user base (around 700 million users).
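
The routing idea above can be illustrated with a toy sketch. OpenAI has not published how its router decides, so everything here is an assumption made for illustration: the keyword hints, the length threshold, and the labels "reasoning" and "non-reasoning" are hypothetical, not real API model names.

```python
from dataclasses import dataclass

# Hypothetical signals; the real router's inputs are not public.
REASONING_HINTS = ("prove", "debug", "step by step", "think hard", "derive")


@dataclass(frozen=True)
class Route:
    model: str   # illustrative label, not an actual model identifier
    reason: str  # why this route was chosen, useful for logging


def route_prompt(prompt: str, long_prompt_chars: int = 2000) -> Route:
    """Pick a sub-model for a prompt using simple, assumed heuristics."""
    text = prompt.lower()
    # Explicit cues that the user wants careful, multi-step work.
    for hint in REASONING_HINTS:
        if hint in text:
            return Route("reasoning", f"matched hint: {hint!r}")
    # Very long prompts are treated as complex (assumed threshold).
    if len(prompt) > long_prompt_chars:
        return Route("reasoning", "long prompt")
    # Everything else takes the cheaper, faster path.
    return Route("non-reasoning", "default fast path")


if __name__ == "__main__":
    for p in ("What is the capital of France?",
              "Prove that the square root of 2 is irrational."):
        print(p, "->", route_prompt(p))
```

The point of the sketch is the cost structure: if most prompts fall through to the default path, most traffic is served by the cheaper model.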

Agentic Features and Tool Use 05:09

  • GPT-5 can perform agentic operations such as testing its own code outputs and feeding the results back in, which suggests an internal tool-use (agentic) loop that improves results, particularly for coding and math tasks.
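
A generate, test, and feed-back loop of the kind described above might look roughly like the following. This is a minimal sketch, not OpenAI's implementation: `fake_model` is a hypothetical stand-in for a model call, and `run_candidate` and `agentic_loop` are names invented for this example.

```python
from typing import Callable, Optional, Tuple


def run_candidate(source: str, test: str) -> Optional[str]:
    """Run candidate code and a test; return an error message, or None on success.

    Note: exec() on untrusted code is unsafe; a real system would sandbox this.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        exec(test, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def agentic_loop(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    test: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Ask for code, test it, and feed failures back until the test passes."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        source = generate(task, feedback)
        feedback = run_candidate(source, test)
        if feedback is None:
            return source, round_no
    raise RuntimeError(f"no passing candidate after {max_rounds} rounds: {feedback}")


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical model: the first attempt is buggy, the retry is fixed."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    code, rounds = agentic_loop(fake_model, "write add(a, b)", "assert add(2, 3) == 5")
    print(f"passed after {rounds} round(s):\n{code}")
```

Each failed round costs another model call, which is one reason faster and cheaper models matter for this style of tool use.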

Focus Areas and Post-Training 06:02

  • Creative writing, health advice, and code generation are emphasized as core improvement areas in this release.
  • OpenAI’s post-training workforce has expanded, with large teams specializing in specific verticals like health, code, and creative expression.
  • OpenAI is commended for improving health outputs despite legal-risk concerns, in recognition of the worldwide demand for health advice and second opinions.

Benchmarking, Evaluation, and Comparisons 07:53

  • There is skepticism about the benchmarks used by OpenAI, noting some benchmarks are saturated and some results are selectively reported (e.g., omitting difficult test instances).
  • GPT-5’s benchmark performance is good but not leading; it lags behind some competitors (such as Grok 4) in areas like the ARC challenge.
  • Previous incidents are mentioned in which OpenAI’s reported results were later clarified because of pre-training data contamination.

Model Speed, Precision, and Cost 10:22

  • The system consists of several variants: the main GPT-5 system (possibly with two or more internal models), GPT-5 Mini, GPT-5 Nano, and others.
  • The models are notably faster and cheaper, possibly because of lower-precision training (e.g., FP4) and more efficient compute usage, although this is speculative.
  • Faster, cheaper models are seen as a significant advantage, especially for tasks like coding and running agentic tools.

Pricing and Context Window 13:08

  • Main model pricing: $1.25 per million input tokens and $10 per million output tokens; OpenAI’s own costs are kept down by routing most queries to the cheaper, non-reasoning model.
  • Context window supports up to 400,000 tokens in and 128,000 tokens out, enabling large-scale tasks like rewriting novels or reviewing large documents.
  • The knowledge cutoff is October of the previous year, indicating pre-training finished well in advance of release.
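
To make the pricing concrete, here is a small cost estimator using the figures quoted above ($1.25 and $10 per million tokens, 400,000 tokens in, 128,000 out). The function name and the limit checks are assumptions for illustration; actual billing may differ (for example, through cached-input discounts).

```python
# Figures as quoted in these notes; treat them as illustrative, not authoritative.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00
MAX_INPUT_TOKENS = 400_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError(f"input exceeds {MAX_INPUT_TOKENS} tokens")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError(f"output exceeds {MAX_OUTPUT_TOKENS} tokens")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # A request that fills both quoted limits.
    print(f"${estimate_cost_usd(400_000, 128_000):.2f}")
```

Under these figures, a request that uses the full quoted window in both directions costs 400,000 × $1.25/1M + 128,000 × $10/1M = $0.50 + $1.28 = $1.78, so output tokens dominate the bill even though there are fewer of them.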

Model Limitations and Variants 15:14

  • GPT-5 does not support audio inputs or real-time capabilities at launch, though these may come in future versions.
  • Mini and Nano variants offer even lower prices and similar context windows but with earlier knowledge cutoffs and no audio or real-time support.
  • The pricing structure undercuts most competition (e.g., Claude Opus, Sonnet, Gemini 2.5 Pro) and is particularly aggressive with Mini and Nano variants.

Market Position and User Impact 17:08

  • GPT-5 is positioned to replace much of the GPT-4 family due to better performance and lower costs.
  • Some early users, such as the CEO of Cursor, called it the best coding model so far, though this may be influenced by pricing advantages.
  • There is user concern about the model router’s impact on reliability and consistency, especially for users wanting always-on "reasoning" mode.
  • The release is noted as less impressive than GPT-4’s debut, sparking debate on whether cost or raw performance is now more important for users.