SUMM

Two new anonymous AI models, Horizon Alpha and Horizon Beta, appeared on OpenRouter and have demonstrated impressive capabilities, especially in UI and design tasks.
Horizon Alpha consistently outperformed Claude 4 Opus, a leading state-of-the-art model, in UI generation tests.
Both Horizon models produced visually superior and tasteful UI mockups even when compared to expensive, top-tier models.

Horizon models are anonymous; their origins and creators are unknown, and they are accessible only through OpenRouter.
Horizon Alpha was temporarily available but is now gone; Horizon Beta remains accessible but might also disappear soon.
Users are advised to try Horizon Beta quickly: the models are free for a limited time, though the provider may train on user data.

Horizon models are very fast in generating responses, averaging around 90–125 tokens per second.
They do not engage in explicit reasoning; responses start almost instantaneously, indicating little to no internal pre-answer deliberation.
Their capabilities are excellent for tasks involving SVG creation, UI, and code styling, surpassing most alternatives in these areas.
In tests like generating SVGs (e.g., a pelican on a bicycle or Star Wars characters), the models showed high spatial awareness and output quality.
Horizon models excel at producing high-quality gradients and visually appealing components with Tailwind CSS.
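
The speed and "instant start" observations above come down to two numbers: time to first token and decode throughput. A minimal sketch of how they could be computed from the arrival timestamps of a streamed response follows; the function and field names are hypothetical, and the timestamps are assumed to be collected by whatever client is streaming the output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StreamStats:
    time_to_first_token: float  # seconds between sending the request and the first token
    tokens_per_second: float    # decode throughput measured after the first token


def stream_stats(request_time: float, token_times: list[float]) -> StreamStats:
    """Compute latency and throughput from per-token arrival timestamps (seconds)."""
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    ttft = token_times[0] - request_time
    decode_duration = token_times[-1] - token_times[0]
    if decode_duration <= 0:
        raise ValueError("token timestamps must be increasing")
    # The first token marks the start of decoding, so only the tokens
    # that arrive after it count toward throughput.
    tps = (len(token_times) - 1) / decode_duration
    return StreamStats(time_to_first_token=ttft, tokens_per_second=tps)
```

A very small time to first token combined with steady throughput is consistent with a model that emits no hidden reasoning before answering, although it does not prove it.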

Directly asking the models about their origins results in ambiguous or generic answers; Alpha sometimes references OpenAI, Beta remains noncommittal.
Tokenization analysis shows that Horizon Alpha’s token count matches Qwen models, yet behavior and output suggest it might not be Qwen.
Theories about whether Horizon is based on OpenAI architecture, Claude, or Qwen remain unconfirmed; no verifiable information reveals the true source.
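
The token-count comparison mentioned above can be sketched roughly as follows: send the same probe prompts to the anonymous model, record the prompt token counts it reports, and check which known tokenizers produce identical counts. The helper name, the tokenizer labels, and all numbers below are illustrative placeholders, not measurements from the video.

```python
from __future__ import annotations


def matching_tokenizers(
    observed: dict[str, int],
    candidates: dict[str, dict[str, int]],
) -> list[str]:
    """Return the candidate tokenizers whose counts equal the observed count on every probe."""
    matches = []
    for name, counts in candidates.items():
        if all(counts.get(prompt) == n for prompt, n in observed.items()):
            matches.append(name)
    return matches


# Placeholder data: token counts reported by the anonymous model per probe prompt.
observed = {"probe-1": 12, "probe-2": 37, "probe-3": 8}

# Placeholder data: counts the same prompts produce under known tokenizers.
candidates = {
    "tokenizer-A": {"probe-1": 12, "probe-2": 37, "probe-3": 8},
    "tokenizer-B": {"probe-1": 11, "probe-2": 37, "probe-3": 9},
}

print(matching_tokenizers(observed, candidates))
```

A match only narrows the field. Tokenizers can be reused across unrelated models, which is why the behavioral evidence still matters.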

In UI/styling and SVG generation, Horizon outperforms Kimi K2, Claude 4 Sonnet, and even Claude 4 Opus on subjective and visual quality.
When tested on trivia (like skateboarding history), Horizon provides solid answers, but its accuracy and depth place it slightly below GPT-4's capabilities.
On the custom "SkateBench" benchmark, Horizon Alpha and Beta score around 20%, between Grok 3 Mini and GPT-4o.
In programming tool usage and planning, Horizon models eagerly utilize to-do list features and tool integrations early in their workflows.

Horizon models prefer to plan tasks actively; they often start with a to-do list and describe their process, a novel behavior in this context.
Their average generation speed is around 110 tokens per second.
Despite excellent subjective performance, Horizon models do not score highly on standard benchmarks, particularly in math, where they underperform relative to models like Llama 4 Maverick.

Standard AI benchmarks do not capture Horizon’s practical strengths, especially in coding and UI work.
Although it underperforms in quantitative benchmarks, the real-world quality and usefulness of its code and design outputs are outstanding.

Other anonymous models (e.g., Lobster on LMArena) display similar style and performance to Horizon, suggesting they may be related or the same.
Horizon output features, like distinctive gradient styles, appear to be characteristic traits.
Speculation is that these anonymous releases are from a major lab seeking qualitative feedback or iterative improvements.
Some theorize Alpha was a smaller parameter model (~20B) and Beta a higher one (~120B), but this remains unverified.

Horizon Beta is currently free and produces designs for an AI image generation studio UI that surpass even hand-crafted ones.
Users should experiment with Horizon soon, as free access could end abruptly.
The underlying providers are likely collecting user interaction data to refine their models.
The easiest public way to test Horizon’s capabilities is via T3 Chat or other tools supporting OpenRouter, though some integration workarounds may be required.
The creator is genuinely impressed with Horizon’s quality, recommends quick exploration, and anticipates further industry developments and revelations about these anonymous models.

Did gpt-5 just shadow drop? Horizon is the best code model ever