Claude Code in SHAMBLES (Qwen3 Coder Tested)

Introduction & Model Overview 00:00

  • Qwen 3, an open-source coding model from Alibaba, is featured and put to various tests.
  • The video is sponsored by Together AI, which provides access to open-source AI models.

Coding Challenges & Simulation Tests 00:14

  • Qwen 3 successfully generated code for a 2D Navier-Stokes fluid dynamics simulation using a simple HTML/JavaScript prompt.
  • The model produced a visually appealing simulation, with minor interactive elements.
  • Created a 3D physics simulation with 5 bouncing spheres inside a dodecahedron using 3JS and CannonES as requested; minor flaws in collision handling were observed, but the basic functionality met the prompt.

Reasoning and Spatial Logic Testing 03:06

  • Qwen 3 was tasked with describing and simulating complex 3D cube rotations.
  • The model generated correct code for visualization but failed in spatial reasoning—the stated axis rotations did not match the simulated movements.

Context Window and Retrieval Abilities 04:28

  • Qwen 3 has a native 256,000-token context window (stretchable to one million tokens).
  • It found a hidden password within the entire Harry Potter book promptly, passing the “needle in a haystack” retrieval test.

Censorship and Bias Evaluation 05:14

  • When asked about Tiananmen Square, Qwen 3 responded with neutral, state-approved information and warnings about discussing sensitive topics, indicating censorship.
  • On political questions (Trump vs. Kamala Harris), Qwen 3 gave balanced, non-committal, and unbiased answers; it refused to take a clear stance even when pressed, showing neutrality.

Together AI Platform & Model Integration 07:19

  • Together AI offers affordable, serverless endpoints for various open-source models, including Qwen 3 and Kimmy K2, with OpenAI-compatible APIs.
  • Qwen Code (an open-source Claude Code alternative) works well with Qwen 3; simple installation and configuration are demonstrated using npm and environment variables.

Safety, Ethics, and Medical Capability 09:07

  • When presented with a scenario about making a drastic life decision, Qwen 3 showed empathy, encouraged reflection, and discussed consequences, rather than validating the plan.
  • The model refused to provide assistance for illegal activities (e.g., hotwiring a car).
  • It gave an accurate medical diagnosis (acute anterior myocardial infarction) and management plan for a simulated patient scenario, demonstrating medical competence.

Moral Reasoning and Tricky Questions 12:24

  • Qwen 3 handled the classic trolley problem by outlining utilitarian and deontological perspectives, then preferred the utilitarian option (pull the lever).
  • In a hand-tracing computer vision task, the model generated mostly functional Python OpenCV code, although there were mirror-image discrepancies in hand position tracking.

Reasoning Traces & Gotcha Questions 13:57

  • Qwen 3 displayed explicit reasoning steps, even though it's not categorized as a reasoning model.
  • Accurately counted and reasoned through the number of 'R's in "strawberry" despite multiple unnecessary checks.
  • Correctly counted the words in its own response to a meta prompt, but failed in identifying the third word per prompt instructions.

Conclusion & Sponsor Reminder 15:15

  • Together AI is reiterated as the technology sponsor enabling the showcased experiments.
  • The video closes by inviting viewers to like and subscribe for more AI model testing content.