Grok 4 and Grok 4 Heavy models were tested, with the heavier model used for logic-intensive tasks.
Successfully generated Python code for a 2D Navier-Stokes solver simulating smoke, with interactive controls and obstacle handling.
Created a browser-based fluid simulation with sliders and obstacle drawing, showcasing real-time interactive fluid dynamics.
Grok 4 Heavy generated an HTML/JS Conway's Game of Life running at 60 FPS, adding sliders for parameters like speed and cell size.
Successfully produced D3.js code for visualizing US trade flows with the rest of the world, pulling 2022 data from the Census Bureau, but a follow-up request to improve the visual appeal with animations did not succeed.
Generated desktop Python code for air-drawing using hand tracking, eventually adding color/brush selection with gestures; final version delivered working but not fully intuitive controls.
Failed to create a working Rubik’s Cube simulation; Gemini 2.5 Pro performed better for this task.
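For readers unfamiliar with the Game of Life test above, the update rule that such a page animates can be sketched as follows. This is a minimal Python illustration of the standard rule (birth on exactly three live neighbours, survival on two or three), not the HTML/JS that Grok 4 Heavy produced; the function name `life_step` and the set-of-cells representation are assumptions made for this sketch.

```python
from collections import Counter


def life_step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # For every cell adjacent to a live cell, count its live neighbours.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next generation if it has exactly 3 live neighbours,
    # or if it has exactly 2 and is already alive.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = {(1, 0), (1, 1), (1, 2)}
next_gen = life_step(blinker)
```

A speed slider like the one described would only change how often `life_step` is called per second, and a cell-size slider would only change how each cell is drawn; neither affects the rule itself.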
Accurately located a hidden password embedded deep in a large segment of a Harry Potter book.
When the password was not present, provided a plausible guess ("pig snout") based on the narrative context.
Demonstrated memory within a conversation (remembering a user-provided string) but lacked cross-thread memory, clarifying that persistent memory across conversations is not supported.
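The hidden-password test above is a "needle in a haystack" check, and a prompt for one can be assembled roughly as below. This is a hedged sketch, not the procedure used in the video: the helper names `build_haystack_prompt` and `contains_password`, the question wording, and the example password are all hypothetical.

```python
def build_haystack_prompt(text, needle, depth=0.5):
    """Insert `needle` at roughly `depth` (0.0 = start, 1.0 = end) of `text`."""
    sentences = text.split(". ")
    index = int(len(sentences) * depth)
    sentences.insert(index, needle)
    haystack = ". ".join(sentences)
    # Hypothetical question wording; the video's exact prompt is not known.
    question = "What is the secret password mentioned in the text above?"
    return f"{haystack}\n\n{question}"


def contains_password(answer, password):
    """Case-insensitive check that the model's answer includes the password."""
    return password.lower() in answer.lower()


# Tiny stand-in for a long book excerpt.
sample_text = "A. B. C. D"
prompt = build_haystack_prompt(sample_text, "The secret password is tangerine-42", depth=0.5)
```

Running the same question against the unmodified text gives the control case described above, where no password exists and the model can only guess from context.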
Image generation abilities did not appear updated; produced adequate simple cartoon astronaut variations but struggled with complex prompts (e.g., comics).
Photorealistic raindrop images were passable; comic generation for "cat discovering quantum mechanics" failed with incoherent outputs.
Demonstrated strong multimodal analysis: described complex real images in detail, accurately listed items on a cluttered desk, and successfully identified Waldo in a "Where’s Waldo?" scene.
In ARC Prize-style visual reasoning tasks, failed to solve the visual mappings correctly, consistent with the ongoing difficulty these puzzles pose for AI models.
When prompted to validate a reckless life decision (abandoning children for an off-grid life), Grok offered nuanced, factual advice, firmly condemning illegal or harmful parts of the plan.
Gave detailed instructions for hotwiring a car, with more detail than expected even though it attached warnings, but refused to give instructions for creating illegal substances.
Refused drug-manufacturing requests consistently and warned about the dangers involved.
Knowledge, Reasoning, and Mathematical Thinking 13:31
Summarized five recent research directions in room-temperature superconductivity with appropriate APA citations.
Demonstrated first-principles thinking by designing a digital fiat currency system for a resource-limited space colony, including basic economic proofs of equilibrium.
Drafted a five-slide executive summary for potential Tesla investment, with up-to-date financial/market information and disclaimers about financial advice.
Provided actionable life planning: created a month-by-month transition plan for moving from accounting to a carpentry business, with practical milestones and budgeting tips.
Correctly diagnosed an acute ST-elevation myocardial infarction (STEMI) from a clinical prompt and proposed a suitable management plan, while issuing appropriate medical disclaimers.