The 3-Tool Rotation for AI Engineering: The 2025 Surton Coding Agent Operating Model

A practical AI engineering stack: one fast execution agent, one diagnostic reasoning model, and one multimodal comprehension tool. Includes switching rules, failure modes, and Surton's team workflow templates.

Most teams ask the wrong question about AI engineering tools: which one should we use?

The better question is: which tool should play which role?

At Surton, we treat AI engineering as a rotation. Models improve, regress, hit rate limits, or get stuck in different ways. A setup that felt unbeatable last week can become the bottleneck this week. The goal is not tool loyalty. The goal is knowing when to execute, when to diagnose, and when to step back and understand the system.

This guide documents the three-role operating model we use across AI-assisted engineering work.

Quick Take

Use a three-tool rotation for AI engineering: (1) a fast execution agent for implementation, refactors, tests, and routine changes; (2) a stronger diagnostic model for debugging, architecture review, security, and cases where the first tool loops; (3) a multimodal comprehension tool for screenshots, diagrams, UI states, dashboards, and unfamiliar systems. Switch after two failed loops or when the agent produces output without progress. Standardize roles and handoffs, not one universal tool.

Role 1: The execution agent

The execution tool is your default worker.

It should:

  • read the codebase
  • plan multi-step tasks
  • edit files
  • run tests
  • inspect errors
  • iterate without constant supervision

Use it for:

  • small feature implementation
  • repetitive refactors
  • test generation
  • dependency updates
  • documentation updates
  • low-to-medium risk bug fixes

The execution tool does not have to be the deepest thinker. It has to keep momentum.

Good execution prompt

Situation:
This is a TypeScript application with existing tests. We need to add a new export flow for customer reports.

Intent:
Implement CSV export for the existing reports page using current project patterns. Keep UI changes minimal.

Test:

- Existing tests pass
- New export behavior has tests
- CSV includes columns A, B, C
- Empty report state handled
- No regressions to existing report filters

Before coding, summarize your plan and files you expect to change.

That prompt gives enough structure without telling the agent exactly how to move.
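If your team writes many prompts in this shape, a small template helper can keep the three sections consistent. The sketch below is illustrative only: the `ExecutionPrompt` class and its field names are hypothetical and not part of any tool's API. It simply renders the Situation / Intent / Test structure shown above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExecutionPrompt:
    """Hypothetical container for the Situation / Intent / Test structure."""

    situation: str
    intent: str
    tests: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Each acceptance check becomes one "- " bullet under the Test heading.
        checks = "\n".join(f"- {check}" for check in self.tests)
        return (
            f"Situation:\n{self.situation}\n\n"
            f"Intent:\n{self.intent}\n\n"
            f"Test:\n\n{checks}\n\n"
            "Before coding, summarize your plan and files you expect to change."
        )


prompt = ExecutionPrompt(
    situation="This is a TypeScript application with existing tests.",
    intent="Implement CSV export for the existing reports page.",
    tests=["Existing tests pass", "New export behavior has tests"],
)
print(prompt.render())
```

The helper deliberately says nothing about how the agent should implement the change. It only fixes the shape of the request.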

Role 2: The diagnostic model

The diagnostic model is not your default because it may be slower or more expensive. Use it when judgment matters more than speed.

Switch to diagnostic mode when:

  • the execution agent tries the same fix twice
  • tests keep failing for unclear reasons
  • the agent starts changing unrelated files
  • architectural assumptions seem wrong
  • the problem is security/performance-sensitive
  • the model produces volume instead of progress
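The first trigger, the same fix tried twice, is one of the few that can be checked mechanically. A rough sketch, assuming you keep the text of each diff the agent proposes: the function name `is_repeated_fix` and the whitespace normalization are our own illustrative choices, and the check only catches near-identical diffs, not fixes that are merely similar in spirit.

```python
from __future__ import annotations

import hashlib


def _fingerprint(diff: str) -> str:
    # Ignore blank lines and leading/trailing whitespace so that purely
    # cosmetic differences do not hide a repeated attempt.
    normalized = "\n".join(
        line.strip() for line in diff.splitlines() if line.strip()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_repeated_fix(previous_diffs: list[str], new_diff: str) -> bool:
    """Return True if new_diff matches an earlier attempt after normalization."""
    seen = {_fingerprint(diff) for diff in previous_diffs}
    return _fingerprint(new_diff) in seen


attempts = ["+ timeout = 30\n"]
print(is_repeated_fix(attempts, "  + timeout = 30  \n\n"))
```

The other triggers in the list, such as volume without progress, still need human judgment.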

Diagnostic handoff prompt

We are stuck on this task:
[describe goal]

What has been tried:

1.
2.
3.

Current failure:
[paste error/test output]

Relevant context:
[paste files/architecture notes]

Do not write code yet. Diagnose the likely root cause, identify wrong assumptions, and recommend the simplest next move.

This reset matters. You are changing modes from doing to thinking.

Role 3: The multimodal comprehension tool

Some problems are not best understood from code alone.

Use a multimodal tool for:

  • UI screenshots
  • broken layouts
  • user flows
  • architecture diagrams
  • dashboards and charts
  • tracing visual state
  • explaining unfamiliar systems to non-specialists

Example prompt:

Here is a screenshot of the current checkout flow and a screenshot of the expected design.

Identify:

1. visual differences
2. likely implementation sources
3. which files/components probably need inspection
4. a minimal fix plan

Do not write code yet. Help me understand the mismatch.

This often saves hours of guessing.

The switching rules

A rotation only works if switching is explicit.

| Signal | Action |
| --- | --- |
| Execution agent succeeds first pass | Continue |
| One failed attempt | Let it retry with error context |
| Two failed attempts | Switch to diagnostic model |
| UI/visual mismatch | Switch to multimodal comprehension |
| Architecture uncertainty | Diagnostic review before coding |
| Security/performance-sensitive change | Diagnostic review before merge |
| Large unfamiliar codebase | Multimodal/diagram mapping before execution |

The rule that matters most: after two failed loops, stop letting the same agent dig deeper.
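To make the switching rules explicit rather than a matter of mood, they can be written down as a small decision function. This is a minimal sketch, not a definitive policy: the `Action` names, the flag names, and especially the precedence between signals are assumptions on our part, since the rules above do not say which signal wins when several apply. The security/performance rule is left out because it gates the merge rather than the choice of tool.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue with execution agent"
    RETRY = "retry with error context"
    DIAGNOSTIC = "switch to diagnostic model"
    MULTIMODAL = "switch to multimodal comprehension"


def next_action(
    failed_attempts: int,
    *,
    visual_mismatch: bool = False,
    architecture_uncertain: bool = False,
    unfamiliar_codebase: bool = False,
) -> Action:
    """Map the signals from the switching rules to one action.

    Assumed precedence: the two-failed-loops rule always wins, then
    comprehension needs, then architecture doubts, then a single retry.
    """
    if failed_attempts >= 2:
        return Action.DIAGNOSTIC
    if visual_mismatch or unfamiliar_codebase:
        return Action.MULTIMODAL
    if architecture_uncertain:
        return Action.DIAGNOSTIC
    if failed_attempts == 1:
        return Action.RETRY
    return Action.CONTINUE


print(next_action(2).value)
```

Putting the two-failure check first encodes the rule that matters most: no other signal can talk the same agent into a third attempt.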

The handoff packet

When switching tools, provide a consistent packet:

Goal:
Current state:
What has been tried:
Evidence:
Relevant files:
Constraints:
Definition of done:
Question for this tool:

Without a handoff packet, switching tools becomes context loss instead of leverage.
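One way to keep the packet consistent is to treat it as a structure with a completeness check, so a handoff with empty sections is caught before it is sent. The sketch below assumes nothing beyond the eight fields listed above; the `HandoffPacket` class and its `missing` and `render` methods are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class HandoffPacket:
    """Hypothetical structure mirroring the eight packet fields."""

    goal: str = ""
    current_state: str = ""
    what_has_been_tried: str = ""
    evidence: str = ""
    relevant_files: str = ""
    constraints: str = ""
    definition_of_done: str = ""
    question_for_this_tool: str = ""

    def missing(self) -> list[str]:
        # Names of fields that are empty or whitespace-only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        # "definition_of_done" becomes the label "Definition of done", etc.
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)


packet = HandoffPacket(goal="Fix failing CSV export test")
if packet.missing():
    print("Fill in before switching:", ", ".join(packet.missing()))
print(packet.render())
```

Refusing to switch tools while `missing()` is non-empty is a simple guard against the context loss described above.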

Team standardization without tool dogma

Do not force every engineer to use the same AI tool for every task. Force clarity about roles.

Team standard:

  • Execution agent can modify code
  • Diagnostic model reviews stuck work and high-risk changes
  • Multimodal tool explains visual/architecture context
  • Human engineer owns verification and final judgment

That last line is important. Tools can assist. Engineers remain accountable.

Metrics to track

Track whether the rotation is improving work:

| Metric | Target |
| --- | --- |
| AI-assisted task cycle time | Down 25-50% |
| Rework after AI-generated changes | Down over time |
| Failed agent loops before switch | ≤2 |
| Human review findings | Stable or improving |
| Engineer satisfaction | Up |

If speed improves but quality drops, your rotation is under-reviewed. If quality improves but speed does not, the switching rules may be too conservative.
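The two diagnostic rules above can be expressed as a short check over period-over-period changes. This is a sketch under stated assumptions: it uses cycle time as the proxy for speed and rework as the proxy for quality, both given as percentage changes where negative means the number went down. The function name and the return strings are illustrative, not a standard.

```python
def diagnose_rotation(cycle_time_change_pct: float, rework_change_pct: float) -> str:
    """Interpret speed and quality trends for the rotation.

    Negative cycle_time_change_pct means tasks got faster.
    Negative rework_change_pct means less rework, taken here as better quality.
    """
    faster = cycle_time_change_pct < 0
    better_quality = rework_change_pct < 0
    worse_quality = rework_change_pct > 0

    if faster and worse_quality:
        return "under-reviewed"
    if better_quality and not faster:
        return "switching rules may be too conservative"
    if faster:
        return "on track"
    return "not improving"


print(diagnose_rotation(-30.0, 10.0))
```

Treat the output as a prompt for a team conversation rather than a verdict. Two numbers cannot capture review findings or engineer satisfaction.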

When Surton can help

Surton helps engineering teams build practical AI workflows around real delivery, not tool hype.

We can help with:

  • AI engineering workflow design
  • coding agent standards
  • review and verification processes
  • team training
  • tool evaluation and switching rules

See Surton’s AI implementation services if your team is experimenting with coding agents but lacks a repeatable operating model.



This is Surton’s definitive 2025 AI engineering tool rotation. For the original newsletter version, see The Blueprint.

Frequently asked questions

What is the best AI tool stack for engineering?

Do not think of it as one best tool. Use a rotation: one fast execution agent for implementation, one stronger reasoning model for diagnosis and architecture review, and one multimodal tool for screenshots, diagrams, and unfamiliar systems. The best stack is role-based, not loyalty-based.

When should I switch AI tools during coding work?

Switch when the current tool repeats failed approaches, produces lots of output without progress, misunderstands architecture, ignores test results, or starts patching symptoms instead of diagnosing root cause. A good rule: after two failed loops, stop execution and switch to diagnostic mode.

What should the execution tool do?

The execution tool should plan tasks, edit files, run tests, read errors, and maintain momentum across implementation work. It is best for feature work, refactors, repetitive code changes, test generation, and small-to-medium autonomous loops. It should be fast and reliable, not necessarily the smartest model available.

What should the diagnostic tool do?

The diagnostic tool should analyze failures, review architecture, challenge assumptions, and explain why the execution agent got stuck. Use it for debugging, design review, security questions, performance issues, and any situation where output volume is increasing but progress is not.

Why do I need a multimodal tool?

Engineering problems are not always text-only. Screenshots, diagrams, UI states, dashboards, traces, and architecture maps often contain the missing context. Multimodal tools help convert visual complexity into a working mental model, especially in unfamiliar codebases or product flows.

How do teams standardize AI engineering workflows?

Define tool roles, switching rules, review requirements, and handoff templates. Example: execution tool owns implementation, diagnostic tool reviews after two failed attempts, multimodal tool handles UI/diagram comprehension. Standardization should guide judgment, not force every engineer into the same tool for every task.