AI coding capabilities are improving at an exceptionally fast pace, doubling roughly every 70 days, far faster than the typical seven-month doubling observed for general AI tasks
This rapid pace means AI agents can now handle tasks that would have been considered major accomplishments just months ago, compounding into exponential growth in their usefulness for software engineering
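A quick back-of-the-envelope check on what a constant 70-day doubling implies over a year (arithmetic added for illustration, not a figure from the source):

$$2^{365/70} \approx 2^{5.2} \approx 37\times \text{ per year}$$

A factor of roughly 37x per year sits comfortably inside the 16–64x ($2^4$ to $2^6$) range projected at the end of this section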
About 18 months ago, AI tools were mostly limited to code completion, but now they are handling full tasks that previously required hours of human effort
Initially, the most reliable AI agent tasks were repetitive migrations (e.g., JavaScript to TypeScript conversions, framework upgrades), which lent themselves to deterministic, step-by-step instructions
Playbooks were built so agents could follow these instructions stepwise, automating migration tasks that were tedious but important
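A minimal sketch of what such a playbook might look like; the step/verify structure and the `agent.apply`/`agent.check` interface are hypothetical, not the actual format described in the source:

```python
# Hypothetical playbook: an ordered list of deterministic steps for a
# JS -> TS migration. Field names and the agent interface are illustrative.
MIGRATION_PLAYBOOK = [
    {"step": "rename the file from .js to .ts", "verify": "tsc compiles the file"},
    {"step": "add explicit types to exported functions", "verify": "no implicit-any errors"},
    {"step": "run the project's test suite", "verify": "all tests pass"},
]

def run_playbook(agent, playbook, target_file):
    """Apply each step in order, stopping as soon as verification fails."""
    for item in playbook:
        agent.apply(item["step"], target_file)            # assumed agent API
        if not agent.check(item["verify"], target_file):  # assumed agent API
            return False  # hand back to a human instead of guessing
    return True
```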
Early improvements focused on enabling agents to remember and apply feedback across repeated tasks, implementing basic knowledge/memory systems
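One way such a memory system could work, sketched under the assumption of a simple JSON file keyed by task type (the storage format and function names are invented for illustration):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical persistent store

def remember(task_type: str, feedback: str) -> None:
    """Record human feedback so it can be re-applied to later tasks of the same type."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory.setdefault(task_type, []).append(feedback)
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def recall(task_type: str) -> list[str]:
    """Fetch prior feedback to prepend to the agent's context for this task type."""
    if not MEMORY_FILE.exists():
        return []
    return json.loads(MEMORY_FILE.read_text()).get(task_type, [])
```

For example, remember("js-to-ts-migration", "preserve JSDoc comments when adding types") would surface that instruction on every later migration of the same type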
As agents improved, they began handling isolated bug fixes or feature additions akin to tasks given to interns: still contained to one or two files, but less strictly procedural
Enhancements let agents set up repositories, run linters and CI, and manage project snapshots for safe, repeatable edits
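A sketch of the snapshot-and-verify idea using git as the checkpoint mechanism; the specific commands (eslint, npm test) and rollback strategy are assumptions, not the tooling described in the source:

```python
import subprocess

def safe_edit(repo_dir: str, make_change) -> bool:
    """Snapshot the repo, apply a change, and roll back if lint or tests fail."""
    # Checkpoint the current state so the edit can be undone cleanly
    subprocess.run(["git", "-C", repo_dir, "add", "-A"], check=True)
    subprocess.run(["git", "-C", repo_dir, "commit", "--allow-empty", "-m", "snapshot"], check=True)

    make_change(repo_dir)  # the agent's edit, passed in as a callable

    lint = subprocess.run(["npx", "eslint", "."], cwd=repo_dir)
    tests = subprocess.run(["npm", "test"], cwd=repo_dir)
    if lint.returncode != 0 or tests.returncode != 0:
        # Discard the edit and restore the snapshot
        subprocess.run(["git", "-C", repo_dir, "reset", "--hard", "HEAD"], check=True)
        return False
    return True
```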
The agent’s utility broadened from pure migrations to general development assistance
For more complex tasks, effective collaboration between humans and AI agents became crucial: humans needed to specify requirements iteratively rather than through a single prompt
Features like “deep wiki” and advanced codebase search were developed, benefiting both agents and human users in understanding larger projects
Workflows evolved to be more interactive, with agents assisting during exploration and planning before autonomously executing work
The latest developments allow agents to handle multiple tasks simultaneously, effectively “killing the backlog” across entire projects or organizations
Integration with project management tools (e.g., Jira, Linear) enables the agent to scope work, determine when to seek human feedback, and choose the correct codebase targets for changes
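A hedged sketch of what ticket scoping might involve; the Ticket fields, label-to-repo mapping, and the heuristic for requesting feedback are all invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    title: str
    description: str
    labels: list[str] = field(default_factory=list)

def scope_ticket(ticket: Ticket, repos: dict[str, str]) -> dict:
    """Pick a target repo by label and flag underspecified tickets for human review."""
    target = next((repo for label, repo in repos.items() if label in ticket.labels), None)
    needs_human = target is None or len(ticket.description.split()) < 20
    return {"repo": target, "needs_human_feedback": needs_human}

print(scope_ticket(
    Ticket("Fix login redirect", "Users land on /404 after OAuth callback", ["frontend"]),
    {"frontend": "org/web-app", "backend": "org/api"},
))
# -> {'repo': 'org/web-app', 'needs_human_feedback': True}
```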
Confidence estimation and decision-making abilities help agents know when to proceed autonomously versus when to ask for human input
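One toy way to operationalize that decision, assuming a scalar confidence score derived from signals like test results and plan ambiguity (the formula and threshold are illustrative):

```python
def estimate_confidence(tests_passed: int, tests_total: int, unclear_steps: int) -> float:
    """Toy confidence score: test pass rate, penalized for ambiguous plan steps."""
    pass_rate = tests_passed / tests_total if tests_total else 0.0
    return max(0.0, pass_rate - 0.1 * unclear_steps)

def next_action(confidence: float, threshold: float = 0.8) -> str:
    """Proceed autonomously above the threshold; otherwise ask for human input."""
    return "proceed_autonomously" if confidence >= threshold else "request_human_input"

print(next_action(estimate_confidence(9, 10, 2)))  # -> request_human_input
```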
The Critical Role of Testing and Next Frontiers (14:02)
Self-testing and asynchronous testing became essential for agents executing large or complex tasks, both to ensure high-quality pull requests and to enable autonomous improvement
Agents must run code, determine suitable test cases, and interpret results as part of an iterative improvement loop
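A minimal sketch of such a loop, assuming a pytest-based project and an `agent.fix` interface that is invented here, not a real API:

```python
import subprocess

def improve_until_green(agent, repo_dir: str, max_iters: int = 5) -> bool:
    """Run tests, feed failures back to the agent, and retry until they pass."""
    for _ in range(max_iters):
        result = subprocess.run(
            ["pytest", "-x"], cwd=repo_dir, capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # all tests pass; safe to open a pull request
        agent.fix(failure_log=result.stdout + result.stderr)  # assumed agent API
    return False  # escalate to a human after repeated failures
```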
The major long-term challenge is shifting from individual task completion to operating at the project or even organizational level, which encompasses broader planning and execution
The evolution of AI coding agents consists of a series of distinct, transformative 2x improvements every few months, each presenting new challenges and requiring new solutions
The field has quickly moved from simple completions to agents capable of substantial, autonomous engineering contributions, with ongoing rapid growth expected: potentially another 16–64x increase in capability in the coming year