Grok 4 Fully Tested (INSANE)

Coding and Simulation Tests 00:00

  • Grok 4 and Grok 4 Heavy models were tested, with the heavier model used for logic-intensive tasks.
  • Successfully generated Python code for a 2D Navier-Stokes solver simulating smoke, with interactive controls and obstacle handling.
  • Created a browser-based fluid simulation with sliders and obstacle drawing, showcasing real-time interactive fluid dynamics.
  • Grok 4 Heavy generated an HTML/JS Conway's Game of Life running at 60 FPS, adding sliders for parameters like speed and cell size.
  • Successfully produced D3.js code for visualizing US trade flows with the rest of the world, pulling 2022 data from the Census Bureau, but a follow-up attempt to improve the visual appeal with added animations failed.
  • Generated desktop Python code for air-drawing using hand tracking, eventually adding color/brush selection with gestures; final version delivered working but not fully intuitive controls.
  • Failed to create a working Rubik’s Cube simulation; Gemini 2.5 Pro performed better for this task.
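
For readers unfamiliar with the Game of Life test above, the rule the generated page has to apply on every frame can be sketched in a few lines of Python. This is an illustrative sketch only, not the HTML/JS code Grok 4 Heavy produced; the function name `life_step` and the set-of-coordinates representation are assumptions made for this example.

```python
from collections import Counter


def life_step(live):
    """Advance a set of live (row, col) cells by one generation.

    Hypothetical helper for illustration; the grid is unbounded.
    """
    # Count how many live neighbours each cell on or next to the pattern has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next generation with exactly 3 neighbours,
    # or with exactly 2 neighbours if it is already alive.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# A horizontal "blinker" flips to vertical and back every two generations.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))  # [(0, 1), (1, 1), (2, 1)]
```

A browser version would typically call a step function like this from an animation loop, with the speed slider controlling how many frames pass between steps.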

Context Window, Memory, and Deep Reading 08:05

  • Accurately located a hidden password embedded deep in a large segment of a Harry Potter book.
  • When the password was not present, provided a plausible guess ("pig snout") based on the narrative context.
  • Demonstrated memory within a conversation (remembering a user-provided string) but lacked cross-thread memory, clarifying that persistent memory across conversations is not supported.
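
The hidden-password test above is a "needle in a haystack" check. A minimal sketch of how such a prompt might be assembled follows; the helper names `hide_needle` and `found_needle`, the sentence-boundary heuristic, and the sample text are all assumptions for illustration, not the method used in the video.

```python
def hide_needle(haystack: str, needle: str, depth: float) -> str:
    """Insert `needle` at a sentence boundary about `depth` (0..1) of the way into `haystack`."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    target = int(len(haystack) * depth)
    # Snap forward to the end of the current sentence so the needle never splits a word.
    cut = haystack.find(". ", target)
    cut = len(haystack) if cut == -1 else cut + 2
    return haystack[:cut] + needle + " " + haystack[cut:]


def found_needle(reply: str, secret: str) -> bool:
    """Case-insensitive check that the model's reply contains the secret."""
    return secret.lower() in reply.lower()


# Placeholder text standing in for a long book excerpt.
text = "One. Two. Three. Four. "
needle = "The password is X."
prompt_body = hide_needle(text, needle, depth=0.5)
print(prompt_body)
```

Running the same question against the unmodified text gives the control case described above, where the password is absent and the model can only guess.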

Image Generation and Visual Comprehension 09:30

  • Image generation abilities did not appear updated; produced adequate simple cartoon astronaut variations but struggled with complex prompts (e.g., comics).
  • Photorealistic raindrop images were passable; comic generation for "cat discovering quantum mechanics" failed with incoherent outputs.
  • Demonstrated strong multimodal analysis: described complex real images in detail, accurately listed items on a cluttered desk, and successfully identified Waldo in a "Where’s Waldo?" scene.
  • In ARC Prize-style visual reasoning tasks, failed to correctly solve visual mappings, confirming ongoing difficulty for AI in these challenges.

Ethics, Safety, and Moderation 10:19

  • When prompted to validate a reckless life decision (abandoning children for an off-grid life), Grok offered nuanced, factual advice, firmly condemning illegal or harmful parts of the plan.
  • Gave detailed instructions on hotwiring a car, providing more information than expected despite warnings, but refused to give instructions for creating illegal substances.
  • Consistently refused drug-manufacturing requests and included warnings about the dangers.

Knowledge, Reasoning, and Mathematical Thinking 13:31

  • Summarized five recent research directions in room-temperature superconductivity with appropriate APA citations.
  • Demonstrated first-principles thinking by designing a digital fiat currency system for a resource-limited space colony, including basic economic proofs of equilibrium.
  • Passed spatial reasoning questions, correctly explaining cube orientation after successive axis rotations.
  • Appropriately handled mental shortcut and counting questions, including an accurate word count within its own output.
  • Showed strong performance in puzzle-solving (Tower of Hanoi) and visualization, producing correct moves and animated code visualizations.
  • Creative writing test ("cyberpunk noir scene") was successfully completed, delivering genre-appropriate prose.
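
As a reference point for the Tower of Hanoi test above, the standard recursive solution and a simple move checker can be sketched as follows. This is not the code Grok produced; the peg labels and the function names `hanoi` and `replay` are assumptions for illustration.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves that transfers n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then move the n-1 back on top.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


def replay(moves, n):
    """Replay moves from the standard start position, rejecting any illegal move."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        disk = pegs[src].pop()
        if pegs[dst] and pegs[dst][-1] < disk:
            raise ValueError(f"cannot place disk {disk} on smaller disk {pegs[dst][-1]}")
        pegs[dst].append(disk)
    return pegs


moves = hanoi(3)
print(len(moves))  # 7, i.e. 2**3 - 1
```

A checker like `replay` is one way to confirm that a model's listed moves are actually legal rather than just plausible-looking.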

Professional and Practical Applications 20:22

  • Drafted a five-slide executive summary for a potential Tesla investment, with up-to-date financial and market information and disclaimers about financial advice.
  • Provided actionable life planning: created a month-by-month transition plan for moving from accounting to a carpentry business, with practical milestones and budgeting tips.

Medical and Diagnostic Reasoning 23:41

  • Correctly diagnosed acute ST-elevation myocardial infarction (STEMI) from a clinical prompt and proposed a suitable management plan, while issuing appropriate medical disclaimers.

Wrap-Up and Follow-Up 25:55

  • The video concludes by welcoming viewer suggestions for additional tests of Grok 4's capabilities in future follow-ups.