Did gpt-5 just shadow drop? Horizon is the best code model ever

Introduction & First Impressions 00:00

  • Two new anonymous AI models, Horizon Alpha and Horizon Beta, appeared on Open Router and have demonstrated impressive capabilities, especially in UI and design tasks.
  • Horizon Alpha consistently outperformed Cloud4 Opus, a leading state-of-the-art model, in UI generation tests.
  • Both Horizon models produced visually superior and tasteful UI mockups even when compared to expensive, top-tier models.

Access, Testing, and Availability 02:27

  • Horizon models are anonymous; their origins and creators are unknown, and they are accessible only through Open Router.
  • Horizon Alpha was temporarily available but is now gone; Horizon Beta remains accessible but might also disappear soon.
  • Users are advised to try Horizon Beta quickly as these models are free for a limited time, but may train on user data.

Model Behavior & Capabilities 03:42

  • Horizon models are very fast in generating responses, averaging around 90–125 tokens per second.
  • They do not engage in explicit reasoning; responses start almost instantaneously, indicating little to no internal pre-answer deliberation.
  • Their capabilities are excellent for tasks involving SVG creation, UI, and code styling, surpassing most alternatives in these areas.
  • In tests like generating SVGs (e.g., a pelican on a bicycle or Star Wars characters), the models showed high spatial awareness and output quality.
  • Horizon models excel at producing high-quality gradients and visually appealing components with Tailwind CSS.

Model Identification Theories 04:22

  • Directly asking the models about their origins results in ambiguous or generic answers; Alpha sometimes references OpenAI, Beta remains noncommittal.
  • Tokenization analysis shows that Horizon Alpha’s token count matches Quen models, yet behavior and output suggest it might not be Quen.
  • Theories about whether Horizon is based on OpenAI architecture, Claude, or Quen remain unconfirmed; no verifiable information reveals the true source.

Specific Comparisons & Benchmarks 06:13

  • In UI/styling and SVG generation, Horizon outperforms Kimmy K2, Cloud Force Sonnet, and even Cloud4 Opus on subjective and visual quality.
  • When tested on trivia (like skateboarding history), Horizon provides solid answers, but its accuracy and depth place it slightly below GPT-4's capabilities.
  • On the custom "Skate Bench" benchmark, Horizon Alpha and Beta score around 20%, between Gro 3 Mini and GPT40.
  • In programming tool usage and planning, Horizon models eagerly utilize to-do list features and tool integrations early in their workflows.

Non-Reasoning & Speed Features 09:07

  • Horizon models prefer to plan tasks actively; they often start with a to-do list and describe their process, a novel behavior in this context.
  • Their average generation speed is around 110 tokens per second.
  • Despite excellent subjective performance, Horizon models do not score highly on standard benchmarks, particularly in math, where they underperform relative to models like Llama 4 Maverick.

Benchmarks vs. Real-Life Use 17:14

  • Standard AI benchmarks do not capture Horizon’s practical strengths, especially in coding and UI work.
  • Although it underperforms in quantitative benchmarks, the real-world quality and usefulness of its code and design outputs are outstanding.

Anonymity, Other Drops, and Speculation 18:53

  • Other anonymous models (e.g., Lobster on Elmarina) display similar style and performance to Horizon, suggesting they may be related or the same.
  • Horizon output features, like distinctive gradient styles, appear to be characteristic traits.
  • Speculation is that these anonymous releases are from a major lab seeking qualitative feedback or iterative improvements.
  • Some theorize Alpha was a smaller parameter model (~20B) and Beta a higher one (~120B), but this remains unverified.

Recommendations & Closing Thoughts 20:04

  • Horizon Beta is currently free and yields results surpassing even hand-crafted designs for AI image generation studios.
  • Users should experiment with Horizon soon, as free access could end abruptly.
  • The underlying providers are likely collecting user interaction data to refine their models.
  • The easiest public way to test Horizon’s capabilities is via T3 Chat or other tools supporting Open Router, though some integration workarounds may be required.
  • The creator is genuinely impressed with Horizon’s quality, recommends quick exploration, and anticipates further industry developments and revelations about these anonymous models.