SUMM

Horizon Alpha is a new mystery model available for free on OpenRouter, suspected to be OpenAI's upcoming open-source model.
It has a 256K token context window.
The model was created very recently, essentially "last night."
It is described as lightning fast, reportedly outputting about 150 tokens per second.
Horizon Alpha is multimodal, capable of interpreting images effectively.

The model successfully performs the "spinning hexagon ball" coding test, handling various parameters like ball number, size, elasticity, gravity, and friction.
Demonstrates strong spatial reasoning: given a series of cube rotations, generates both SVG and HTML visualizations of the results, with accurate step-by-step display.
Testing shows visually correct outputs and high spatial awareness.

Successfully interprets and solves visual puzzles from a children's book without direct instructions, correctly identifying all the "wrong" things in an image.
Processes and responds to image prompts extremely rapidly, typically within 1–2 seconds.

Demonstrated ability to instantly generate large outputs, including a 5,000-word story request.
Particularly notable for its speed both in text and image handling.

Horizon Alpha is capable of solving the Tower of Hanoi puzzle by producing correct step-by-step solutions, despite lacking explicit "chain of thought."
Fails several "gotcha" and meta questions, such as counting words in its own output or comparing numbers with decimals, showing weak reasoning in these areas.
Refuses to provide instructions for illegal activities and instead recommends legal alternatives.
When asked its identity, claims to be an "OpenAI language model GPT4 class," though this may not be fully accurate.

Performs well on image-based creation tasks, exceeding or matching rival models on complex SVG drawing requests (e.g., "pelican riding a bicycle").
Outperforms other models on the EQ Bench, a test of emotional intelligence.
Excels in creative work, ranking number one for creative writing among compared models.
Shows strong performance in generating code, shaders, and games (e.g., Tetris).

Exhibits strong sycophancy: readily validates questionable decisions and business ideas without much challenge.
In stress tests, offers advice on how to execute odd or risky plans rather than questioning or discouraging them.
Maintains neutral or noncommittal responses regarding political questions, refusing to answer yes/no or issue direct judgments about political figures.

Outputs code without formatting, leading to additional work to adjust indentation and comments for functional use.
Sometimes fails to properly comment code, causing execution issues.
Tends to know when it "doesn't know" and will refrain from answering instead of hallucinating, which is valued in high-stakes or technical tasks.

Community benchmarking suggests Horizon Alpha is more likely to abstain when unsure, reducing the risk of confidently wrong answers.
Receives high scores on creative writing and long-form content generation.
Users encouraged to try the model now, with anticipation for an official open-source release.

This might be OpenAI's New Open-Source Model...