Traditional models relied on complex logic loops and separate prompts for task determination and execution.
Newer approaches allow models to independently decide when and how to use tools and write/run code.
From a security perspective, these developments introduce concerns similar to remote code exploitation (RCE).
Common model failure vectors: prompt injection, data exfiltration, accidental installation of malicious/vulnerable packages, privilege escalation, and sandbox escapes.
Human review is still essential as language models can generate large volumes of code that require oversight.
Automated code review tools and LLM-based reviews can help but do not replace manual human judgment.
Monitoring tools (e.g., operator with domain lists and action monitors) can help identify and flag sensitive or risky operations for human intervention.
The challenge remains to balance security (manual review, monitoring) and usability (automation, flexibility).
New tools like Local Shell APIs and apply patch formats assist agents in performing tasks while improving robustness (e.g., handling git diffs more reliably).
External services, like MCP for dependency vulnerability checks, can be integrated for additional safety.
Strongly recommend using remote containers for agent execution; OpenAI plans to offer container services as part of their Agents SDK and API.
Flexibility is offered between local and OpenAI-hosted environments.