Agents struggle to navigate across repositories, and as they approach their file limits they often summarize or drop context, which reduces their ability to detect bugs.
Users should proactively manage agent context by supplying code diffs and ensuring key files remain within the context window.
Requesting component inventories from agents (e.g., an index of classes and variables) can help the agent better understand the code and find bugs.
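To make the idea of a component inventory concrete, here is a minimal sketch in Python. It is an illustration only, not the method described in the talk: it assumes the code under review is Python and builds the index locally with the standard-library `ast` module, so the result can be pasted into the agent's context alongside a diff. The names `build_inventory` and `format_inventory` are hypothetical.

```python
import ast


def build_inventory(source: str) -> dict[str, list[str]]:
    """Index the top-level classes, functions, and variables in Python source."""
    tree = ast.parse(source)
    inventory: dict[str, list[str]] = {"classes": [], "functions": [], "variables": []}
    for node in tree.body:  # top-level statements only; nested names are skipped
        if isinstance(node, ast.ClassDef):
            inventory["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            inventory["functions"].append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    inventory["variables"].append(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            inventory["variables"].append(node.target.id)
    return inventory


def format_inventory(inventory: dict[str, list[str]]) -> str:
    """Render the inventory as short plain-text lines suitable for a prompt."""
    lines = []
    for kind, names in inventory.items():
        lines.append(f"{kind}: {', '.join(names) if names else '(none)'}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "MAX = 3\nclass Foo:\n    pass\ndef bar():\n    return 1\n"
    print(format_inventory(build_inventory(sample)))
```

An inventory in this form is far smaller than the files it describes, so it may be one way to keep an overview of key components in the context window even when the files themselves no longer fit.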
The Superiority and Limits of Thinking Models 05:30
“Thinking models” show greater success in identifying and following complex bug patterns within codebases.
These models work through explicit thought traces and can reason more deeply about the code, uncovering more complex issues.
There is considerable run-to-run variability: even with thinking models, agents do not take a fully holistic view, producing inconsistent results across multiple runs.
Users often need to run agents many times to assemble a comprehensive bug report; this remains a known limitation of current technology.
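One way to cope with run-to-run variability is to merge the findings from several runs and note how often each one appears. The sketch below is an assumption-laden illustration rather than a tool from the talk: `run_agent` is a hypothetical callable standing in for whatever invokes the agent, and findings are assumed to be plain strings that match exactly across runs (real reports would likely need some normalization first).

```python
from collections import Counter
from typing import Callable, Iterable


def aggregate_findings(run_agent: Callable[[int], Iterable[str]], runs: int) -> Counter:
    """Call the agent `runs` times and count how many runs reported each finding."""
    counts: Counter = Counter()
    for run_index in range(runs):
        # Deduplicate within a run so each finding counts at most once per run.
        counts.update(set(run_agent(run_index)))
    return counts


if __name__ == "__main__":
    # Stand-in data for three agent runs; a real run_agent would call the agent.
    fake_runs = [
        ["null deref in parse()", "off-by-one in slice"],
        ["null deref in parse()"],
        ["race in cache", "null deref in parse()"],
    ]
    counts = aggregate_findings(lambda i: fake_runs[i], runs=len(fake_runs))
    for finding, seen in counts.most_common():
        print(f"{seen}/{len(fake_runs)} runs: {finding}")
```

Findings reported in only one run are not necessarily wrong; given the inconsistency described above, they may simply be issues the other runs missed, so the counts are better read as a prioritization aid than as a filter.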