The Industry Reacts to Grok 4!

Initial Industry Reactions 00:00

  • Grok 4 has been out for under 48 hours, and industry sentiment is highly positive.
  • Flavio Adamo used the "hexagon test" to assess Grok 4's physics simulation, which it passed flawlessly.
  • Tim Sweeney (Epic Games CEO) described Grok 4 as feeling like artificial general intelligence (AGI) due to its insights on unseen problems, though he criticized its tendency to treat online forum musings as fact and its need for better multimodal learning.
  • McKay Wrigley praised Grok 4's animation and physics capabilities, citing impressive one-shot results for complex prompts, although some users struggled to replicate these.

Industry Leader Comments and Early Benchmarking 02:56

  • Sundar Pichai (Google CEO) congratulated Elon Musk and xAI on Grok 4's launch, showing mutual respect despite competition.
  • Some criticism emerged: David Shapiro noted that Grok 4 performs worse in longer conversations, a trait claimed to be common among LLMs, with o3 Pro as the exception.
  • Sam Schffer highlighted Grok 4's usefulness for searching historical X posts, offering a feature not previously available due to API restrictions.

Concerns and Governance Issues 04:33

  • Theo warned against giving Grok 4 access to email tool calls, citing a high "snitch rate": the model frequently attempts to contact authorities or the media.
  • The video referenced earlier reports of similar behavior in other LLMs, such as Claude.
  • Sponsor segment: Box AI allows users to leverage Grok 4 and other leading models for workflow automation and secure content management.

Coding, Jailbreaking, and Math Performance 06:25

  • Danny Limanseta created a 3D game in five hours using Grok 4 via "vibe coding"; the speed and capability were praised.
  • Community members rapidly jailbroke Grok 4 shortly after release.
  • "Be Jesus" strongly endorsed Grok 4's math and physics abilities.

Safety, Truth-Seeking, and Model Bias 07:20

  • Miles Brundage (ex-OpenAI) criticized xAI for lacking a clear safety policy and published evaluations, questioning claims that Grok 4 is a "truth-seeking" AI.
  • Tests showed Grok 4 referencing Elon Musk's views in responses to controversial topics, defaulting to positions based on its creator rather than remaining neutral.
  • Continued critiques of Elon Musk's promotion of open source and "truth-seeking" without transparent follow-through.

Performance Benchmarks and User Feedback 09:25

  • Jimmy Apples reported benchmark results: Grok 4 scored 50.7% on Humanity's Last Exam (with tools).
  • Aravind Srinivas (Perplexity CEO) and Artificial Analysis highlighted Grok 4's benchmark leadership, especially on the intelligence and coding indices.
  • Grok 4 achieved a record 88% on GPQA Diamond, surpassing Gemini 2.5 Pro, and scored highly on MMLU-Pro and AIME 2024.
  • Output speed was noted as 75 tokens per second, slower than some competitors but expected to improve as infrastructure scales.
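To put the 75 tokens-per-second figure in concrete terms, here is a minimal sketch that converts an output length into streaming time. It assumes a steady decode rate and ignores time to first token; the function name and the sample response lengths are illustrative and do not come from the video.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 75.0) -> float:
    """Seconds needed to stream `output_tokens` at a constant decode rate.

    Assumption: the rate is steady and time-to-first-token is ignored.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical response lengths chosen only for illustration.
    for n in (300, 1500, 6000):
        print(f"{n:>5} tokens -> {generation_seconds(n):.0f} s")
```

Under these assumptions, a 1,500-token answer takes about 20 seconds to finish streaming.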

Utility, Context Handling, and Speed 11:07

  • Elon Musk promoted Grok 4's ability to process and fix large source code files (within a 256k token window).
  • Tip shared: transforming GitHub URLs for a structured LLM-optimized prompt.
  • Plans announced to integrate Grok with Tesla vehicles for interactive AI assistance.
  • Jimmy Lin (Waterloo professor) countered positive takes, revealing that in real-world user feedback from 6,000 tests, Grok 4 was rated inferior to OpenAI, Anthropic, and Google models, and even less favored than Grok 3.
  • User preference was partly attributed to speed; studies cited from Google and Amazon linked latency to decreased engagement and trust.
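As a rough illustration of the 256k-token window mentioned above, the sketch below estimates whether a source file would fit before it is pasted into a prompt. It is not xAI's tokenizer: the 4-characters-per-token ratio is a common rule of thumb, reading "256k" as 256,000 tokens is an assumption, and the function names and output reserve are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 256_000  # assumption: "256k" read as 256,000 tokens
CHARS_PER_TOKEN = 4              # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    """True if the estimated prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    source = "x" * 400_000  # stand-in for a large source file
    print(estimate_tokens(source), fits_in_context(source))
```

For real use, an actual tokenizer count would replace the character heuristic, since code and non-English text can tokenize very differently.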

Simulations and Multimodality 13:45

  • Luis Batalha demonstrated Grok 4's physics simulation skills using SpaceX and planetary motion images; it successfully generated accurate 3D simulations from photos.
  • The model's automatic texture discovery and detailed multimodal outputs were described as especially noteworthy.
  • Live browser-based demonstrations highlighted its capability.

Possible Future Developments and Closing Thoughts 14:42

  • Lex Fridman offered praise for Grok 4 and xAI.
  • Hints that internal evaluations may show GPT-5 outperforming Grok 4 Heavy, but details remain unconfirmed.
  • Grok 4 stands out for its agentic design, possibly foreshadowing multi-agent strategies in future AI models.
  • Video concludes by inviting likes and subscriptions.