The 3-Tool Rotation for AI Engineering: The 2025 Surton Coding Agent Operating Model
A practical AI engineering stack: one fast execution agent, one diagnostic reasoning model, and one multimodal comprehension tool. Includes switching rules, failure modes, and Surton's team workflow templates.
Most teams ask the wrong question about AI engineering tools: which one should we use?
The better question is: which tool should play which role?
At Surton, we treat AI engineering as a rotation. Models improve, regress, hit rate limits, or get stuck in different ways. A setup that felt unbeatable last week can become the bottleneck this week. The goal is not tool loyalty. The goal is knowing when to execute, when to diagnose, and when to step back and understand the system.
This guide documents the three-role operating model we use across AI-assisted engineering work.
Quick Take
Use a three-tool rotation for AI engineering: (1) a fast execution agent for implementation, refactors, tests, and routine changes; (2) a stronger diagnostic model for debugging, architecture review, security, and cases where the first tool loops; (3) a multimodal comprehension tool for screenshots, diagrams, UI states, dashboards, and unfamiliar systems. Switch after two failed loops or when the agent produces output without progress. Standardize roles and handoffs, not one universal tool.
Role 1: The execution agent
The execution tool is your default worker.
It should:
- read the codebase
- plan multi-step tasks
- edit files
- run tests
- inspect errors
- iterate without constant supervision
Use it for:
- small feature implementation
- repetitive refactors
- test generation
- dependency updates
- documentation updates
- low-to-medium-risk bug fixes
The execution tool does not have to be the deepest thinker. It has to keep momentum.
Good execution prompt
Situation:
This is a TypeScript application with existing tests. We need to add a new export flow for customer reports.
Intent:
Implement CSV export for the existing reports page using current project patterns. Keep UI changes minimal.
Test:
- Existing tests pass
- New export behavior has tests
- CSV includes columns A, B, C
- Empty report state handled
- No regressions to existing report filters
Before coding, summarize your plan and files you expect to change.
That prompt gives the agent enough structure (context, goal, and acceptance checks) without dictating how to implement the change.
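For teams that want to reuse the Situation / Intent / Test shape rather than retype it, here is a minimal sketch in Python. The class name `ExecutionPrompt` and its fields are hypothetical, introduced only for illustration; they are not part of any tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionPrompt:
    """Hypothetical container for the Situation / Intent / Test structure."""

    situation: str
    intent: str
    tests: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Each acceptance check becomes one bullet under "Test:".
        checks = "\n".join(f"- {t}" for t in self.tests)
        return (
            f"Situation:\n{self.situation}\n"
            f"Intent:\n{self.intent}\n"
            f"Test:\n{checks}\n"
            "Before coding, summarize your plan and files you expect to change."
        )


if __name__ == "__main__":
    prompt = ExecutionPrompt(
        situation="This is a TypeScript application with existing tests.",
        intent="Implement CSV export for the existing reports page.",
        tests=["Existing tests pass", "New export behavior has tests"],
    )
    print(prompt.render())
```

Keeping the acceptance checks as a list makes it harder to ship a prompt with no definition of done.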
Role 2: The diagnostic model
The diagnostic model is not your default because it may be slower or more expensive. Use it when judgment matters more than speed.
Switch to diagnostic mode when:
- the execution agent tries the same fix twice
- tests keep failing for unclear reasons
- the agent starts changing unrelated files
- architectural assumptions seem wrong
- the problem is security- or performance-sensitive
- the model produces volume instead of progress
Diagnostic handoff prompt
We are stuck on this task:
[describe goal]
What has been tried:
1.
2.
3.
Current failure:
[paste error/test output]
Relevant context:
[paste files/architecture notes]
Do not write code yet. Diagnose the likely root cause, identify wrong assumptions, and recommend the simplest next move.
This reset matters. You are changing modes from doing to thinking.
Role 3: The multimodal comprehension tool
Some problems are not best understood from code alone.
Use a multimodal tool for:
- UI screenshots
- broken layouts
- user flows
- architecture diagrams
- dashboards and charts
- tracing visual state
- explaining unfamiliar systems to non-specialists
Example prompt:
Here is a screenshot of the current checkout flow and a screenshot of the expected design.
Identify:
1. visual differences
2. likely implementation sources
3. which files/components probably need inspection
4. a minimal fix plan
Do not write code yet. Help me understand the mismatch.
This often saves hours of guessing.
The switching rules
A rotation only works if switching is explicit.
| Signal | Action |
|---|---|
| Execution agent succeeds first pass | Continue |
| One failed attempt | Let it retry with error context |
| Two failed attempts | Switch to diagnostic model |
| UI/visual mismatch | Switch to multimodal comprehension |
| Architecture uncertainty | Diagnostic review before coding |
| Security/performance-sensitive change | Diagnostic review before merge |
| Large unfamiliar codebase | Multimodal/diagram mapping before execution |
The rule that matters most: after two failed loops, stop letting the same agent dig deeper.
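The switching table can be sketched as a small decision function. This is an illustration under stated assumptions, not Surton's implementation: the names `Signals`, `Action`, and `next_action` are hypothetical, the order in which signals are checked is an assumption (the table does not rank them, apart from the two-failed-loops rule being called out as most important), and the sketch collapses "review before coding" and "review before merge" into a single diagnostic action.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    RETRY_WITH_ERROR_CONTEXT = "retry with error context"
    DIAGNOSTIC = "switch to diagnostic model"
    MULTIMODAL = "switch to multimodal comprehension"


@dataclass
class Signals:
    """Hypothetical snapshot of the signals listed in the table."""

    failed_attempts: int = 0
    visual_mismatch: bool = False
    unfamiliar_codebase: bool = False
    architecture_uncertain: bool = False
    security_or_perf_sensitive: bool = False


def next_action(s: Signals) -> Action:
    # Assumed priority: the two-failed-loops rule wins over everything else.
    if s.failed_attempts >= 2:
        return Action.DIAGNOSTIC
    # Visual mismatches and large unfamiliar codebases go to the multimodal tool.
    if s.visual_mismatch or s.unfamiliar_codebase:
        return Action.MULTIMODAL
    # Architecture uncertainty and sensitive changes get a diagnostic review.
    if s.architecture_uncertain or s.security_or_perf_sensitive:
        return Action.DIAGNOSTIC
    # A single failure earns one retry with the error output attached.
    if s.failed_attempts == 1:
        return Action.RETRY_WITH_ERROR_CONTEXT
    return Action.CONTINUE
```

Encoding the rules this way keeps the switch explicit: the decision depends on recorded signals, not on how invested someone is in the current tool.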
The handoff packet
When switching tools, provide a consistent packet:
Goal:
Current state:
What has been tried:
Evidence:
Relevant files:
Constraints:
Definition of done:
Question for this tool:
Without a handoff packet, switching tools becomes context loss instead of leverage.
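One way to keep the packet consistent is to treat it as a structured record and refuse to hand off while fields are blank. The sketch below assumes Python and a hypothetical `HandoffPacket` class; the field names simply mirror the packet labels above.

```python
from dataclasses import dataclass, fields


@dataclass
class HandoffPacket:
    """Hypothetical record mirroring the handoff packet fields."""

    goal: str = ""
    current_state: str = ""
    what_has_been_tried: str = ""
    evidence: str = ""
    relevant_files: str = ""
    constraints: str = ""
    definition_of_done: str = ""
    question_for_this_tool: str = ""

    def missing(self) -> list[str]:
        # Names of fields that are still empty or whitespace-only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        # Turn "what_has_been_tried" into the label "What has been tried".
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)
```

A caller could check `packet.missing()` before switching tools and only send `packet.render()` once the list is empty.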
Team standardization without tool dogma
Do not force every engineer to use the same AI tool for every task. Force clarity about roles.
Team standard:
- Execution agent can modify code
- Diagnostic model reviews stuck work and high-risk changes
- Multimodal tool explains visual/architecture context
- Human engineer owns verification and final judgment
That last line is important. Tools can assist. Engineers remain accountable.
Metrics to track
Track whether the rotation is improving work:
| Metric | Target |
|---|---|
| AI-assisted task cycle time | Down 25-50% |
| Rework after AI-generated changes | Down over time |
| Failed agent loops before switch | ≤2 |
| Human review findings | Stable or improving |
| Engineer satisfaction | Up |
If speed improves but quality drops, your rotation is under-reviewed: more changes need diagnostic review and human verification before merge. If quality improves but speed does not, the switching rules may be too conservative.
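The two interpretation rules above can be written down as a tiny check to run against each reporting period. This is a rough sketch with assumptions labeled: the function name `interpret_rotation` is hypothetical, cycle time stands in for speed, and rework after AI-generated changes stands in for quality. Both inputs are percentage changes from the previous period, where a negative value means the number went down.

```python
def interpret_rotation(cycle_time_change_pct: float, rework_change_pct: float) -> str:
    """Apply the two interpretation rules to period-over-period changes.

    Assumption: shorter cycle time means speed improved; less rework means
    quality improved. Neither proxy is prescribed by the metrics table.
    """
    speed_improved = cycle_time_change_pct < 0
    quality_dropped = rework_change_pct > 0
    quality_improved = rework_change_pct < 0

    if speed_improved and quality_dropped:
        return "under-reviewed"
    if quality_improved and not speed_improved:
        return "switching rules may be too conservative"
    # Any other combination is not covered by the two rules.
    return "no rule applies"
```

For example, a period with cycle time down 30% and rework up 10% would be flagged as under-reviewed.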
When Surton can help
Surton helps engineering teams build practical AI workflows around real delivery, not tool hype.
We can help with:
- AI engineering workflow design
- coding agent standards
- review and verification processes
- team training
- tool evaluation and switching rules
See Surton’s AI implementation services if your team is experimenting with coding agents but lacks a repeatable operating model.
Related resources
- Stop Over-Instructing AI — better prompts for agents
- The Non-Technical Leader’s Guide to Claude Code — workflows outside engineering
- My 3-Tool Rotation for AI Engineering (Original) — The Blueprint edition
This is Surton’s definitive 2025 AI engineering tool rotation. For the original newsletter version, see The Blueprint.
Frequently asked questions
What is the best AI tool stack for engineering?
Do not think of it as one best tool. Use a rotation: one fast execution agent for implementation, one stronger reasoning model for diagnosis and architecture review, and one multimodal tool for screenshots, diagrams, and unfamiliar systems. The best stack is role-based, not loyalty-based.
When should I switch AI tools during coding work?
Switch when the current tool repeats failed approaches, produces lots of output without progress, misunderstands architecture, ignores test results, or starts patching symptoms instead of diagnosing root cause. A good rule: after two failed loops, stop execution and switch to diagnostic mode.
What should the execution tool do?
The execution tool should plan tasks, edit files, run tests, read errors, and maintain momentum across implementation work. It is best for feature work, refactors, repetitive code changes, test generation, and small-to-medium autonomous loops. It should be fast and reliable, not necessarily the smartest model available.
What should the diagnostic tool do?
The diagnostic tool should analyze failures, review architecture, challenge assumptions, and explain why the execution agent got stuck. Use it for debugging, design review, security questions, performance issues, and any situation where output volume is increasing but progress is not.
Why do I need a multimodal tool?
Engineering problems are not always text-only. Screenshots, diagrams, UI states, dashboards, traces, and architecture maps often contain the missing context. Multimodal tools help convert visual complexity into a working mental model, especially in unfamiliar codebases or product flows.
How do teams standardize AI engineering workflows?
Define tool roles, switching rules, review requirements, and handoff templates. Example: execution tool owns implementation, diagnostic tool reviews after two failed attempts, multimodal tool handles UI/diagram comprehension. Standardization should guide judgment, not force every engineer into the same tool for every task.