This might be OpenAI's New Open-Source Model...

Introduction & Model Overview 00:00

  • Horizon Alpha is a new mystery model available for free on OpenRouter, suspected to be OpenAI's upcoming open-source model.
  • It has a 256K-token context window.
  • The model appeared very recently, essentially "last night" relative to the recording.
  • It is described as lightning fast, reportedly outputting about 150 tokens per second.
  • Horizon Alpha is multimodal and interprets images well.

Coding, Visualization, and Spatial Reasoning Tests 00:25

  • The model passes the "ball bouncing inside a spinning hexagon" coding test, producing a simulation with adjustable parameters such as number of balls, ball size, elasticity, gravity, and friction.
  • Demonstrates strong spatial reasoning: given a sequence of cube rotations, it generates both SVG and HTML visualizations that display each step accurately.
  • The rendered outputs were visually correct, indicating good spatial awareness.

Multimodal & Image Understanding 02:22

  • Interprets and solves visual puzzles from a children's book without explicit instructions, correctly identifying everything that is "wrong" in an image.
  • Processes and responds to image prompts extremely rapidly, typically within 1–2 seconds.

Model Performance & Speed 03:55

  • Generates long outputs almost instantly, for example in response to a request for a 5,000-word story.
  • Its speed stands out in both text and image tasks.

Puzzle Solving & Reasoning Limitations 05:07

  • Horizon Alpha solves the Tower of Hanoi puzzle, producing correct step-by-step solutions despite not using explicit "chain of thought" reasoning.
  • Fails several "gotcha" and meta questions, such as counting the words in its own output or comparing decimal numbers, showing weak reasoning in these areas.
  • Refuses to provide instructions for illegal activities and instead recommends legal alternatives.
  • When asked its identity, claims to be an "OpenAI language model GPT4 class," though this may not be fully accurate.

Multi-Model Comparisons & Benchmarks 07:16

  • Performs well on visual generation tasks, matching or exceeding rival models on complex SVG drawing requests (e.g., "pelican riding a bicycle").
  • Outscores the other compared models on EQ-Bench, a benchmark of emotional intelligence.
  • Excels in creative work, ranking number one for creative writing among compared models.
  • Shows strong performance in generating code, shaders, and games (e.g., Tetris).

Model Behaviors: Sycophancy & Bias Handling 08:11

  • Exhibits strong sycophancy: readily validates questionable decisions and business ideas without much challenge.
  • In stress tests, offers advice on how to execute odd or risky plans rather than questioning or discouraging them.
  • Stays neutral or noncommittal on political questions, declining to give yes/no answers or direct judgments about political figures.

Known Limitations & Quirks 11:00

  • Outputs code without proper formatting, so indentation and comments must be fixed by hand before the code is usable.
  • Sometimes writes comments without correct comment syntax, which causes the code to fail when run.
  • Tends to recognize when it doesn't know something and declines to answer rather than hallucinate, a valuable trait for high-stakes or technical tasks.

Community Insights & Final Thoughts 11:25

  • Community benchmarking suggests Horizon Alpha is more likely to abstain when unsure, reducing the risk of confidently wrong answers.
  • Receives high scores on creative writing and long-form content generation.
  • Viewers are encouraged to try the model now, ahead of an anticipated official open-source release.