“Loop” is a new agent built into Braintrust that automates significant parts of the evaluation (eval) process.
Loop builds on recent improvements in frontier AI models; Claude 4 is singled out as a major breakthrough, reportedly performing almost six times better than the previous top models.
The agent can automatically optimize prompts, works with complex agent setups, and helps create better datasets and scorers.
Users interact with Loop in the UI, where they can preview its edits to data, scorers, and prompts side by side.
An optional setting allows Loop to fully automate optimizations for users who prefer less manual involvement.
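The workflow described above (score a prompt against a dataset, propose an edit, then either review it or apply it automatically) can be illustrated with a small generic sketch. This is not Braintrust's API or Loop's actual implementation: `evaluate`, `optimize`, `toy_task`, `exact_match`, and the `auto_apply` and `approve` parameters are all hypothetical names chosen for illustration, and the "model" is a stand-in function rather than a real LLM call.

```python
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[str, str]            # (input, expected output)
Task = Callable[[str, str], str]     # (prompt, input) -> output; stands in for a model call
Scorer = Callable[[str, str], float] # (output, expected) -> score in [0, 1]


def evaluate(prompt: str, dataset: Sequence[Example], task: Task, scorer: Scorer) -> float:
    """Return the mean score of `prompt` over the dataset."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    scores = [scorer(task(prompt, x), expected) for x, expected in dataset]
    return sum(scores) / len(scores)


def optimize(
    current_prompt: str,
    candidates: Sequence[str],
    dataset: Sequence[Example],
    task: Task,
    scorer: Scorer,
    auto_apply: bool = False,
    approve: Optional[Callable[[str, str], bool]] = None,
) -> Tuple[str, float]:
    """Keep the best-scoring prompt.

    A candidate that scores higher is accepted automatically when
    `auto_apply` is True; otherwise it is accepted only if the
    hypothetical `approve(old, new)` review callback returns True.
    """
    best_prompt = current_prompt
    best_score = evaluate(current_prompt, dataset, task, scorer)
    for candidate in candidates:
        score = evaluate(candidate, dataset, task, scorer)
        if score <= best_score:
            continue  # not an improvement, so nothing to review
        if auto_apply or (approve is not None and approve(best_prompt, candidate)):
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


# Toy stand-ins (assumptions for the demo, not real model behaviour).
def toy_task(prompt: str, text: str) -> str:
    return text.upper() if "uppercase" in prompt else text


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


DATASET = [("a", "A"), ("b", "B")]
CANDIDATES = ["Repeat the input.", "Return the input in uppercase."]

if __name__ == "__main__":
    print(optimize("Echo the input.", CANDIDATES, DATASET, toy_task, exact_match, auto_apply=True))
```

With `auto_apply=False` and no `approve` callback, the sketch never changes the prompt, which mirrors the review-first default; passing an `approve` function plays the role of a human accepting or rejecting a side-by-side edit.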