Stop Over-Instructing AI: The 2025 Surton Outcome-Based Prompting Framework

Why detailed step-by-step prompts reduce AI performance and how to use outcome-based prompting for better results. Includes the SIT framework and Surton's prompt optimization methodology.

After prompting AI systems thousands of times at Surton, we’ve learned that detailed step-by-step instructions consistently produce worse results than outcome-based delegation. When you script every step, you prevent AI from using its breadth of knowledge and pattern recognition. The most effective prompts define the destination and success criteria—then let AI determine the optimal path.

This guide is our outcome-based prompting framework. It includes the SIT (Situation-Intent-Test) methodology, comparison of procedural vs. outcome-based approaches, and Surton’s prompt optimization process.

Quick Take

Step-by-step procedural prompts underperform because they lock AI into your current understanding and prevent it from drawing on its broader pattern knowledge. Shift to outcome-based prompting: define SITUATION (context), INTENT (desired outcome), and TEST (success criteria). Procedural: “Step 1: X, Step 2: Y” = micromanagement. Outcome-based: “Make checkout accept PayPal. Success: both payment methods work, existing flow unchanged, tests pass” = delegation with clarity. In our experience, AI performs 30-50% better with outcome-based prompts. Use procedural prompts only when a specific sequence is safety-critical. Default to outcome-based for problem-solving, coding, analysis, and creative work. Trust but verify: define tests upfront, review AI’s proposed approach, and automate verification.

The Procedural Prompt Trap

The Common Approach:

Procedural Prompt:
"Step 1: Open the user authentication file
Step 2: Find the login function
Step 3: Add password validation check
Step 4: Update error handling
Step 5: Write unit test for new validation
Step 6: Run tests to verify"

Why This Underperforms:

  • Assumes current file structure (may be wrong)
  • Prescribes specific implementation (may not be optimal)
  • Prevents AI from considering better approaches
  • Creates brittleness (if step 2 fails, entire prompt fails)
  • Wastes AI capability (treating it like dumb automation)

The Data: In Surton A/B tests:

  • Procedural prompts: 65% success rate, average solution quality 6/10
  • Outcome-based prompts: 87% success rate, average solution quality 8.5/10

The Pattern: Over-instruction correlates with worse outcomes.

The Outcome-Based Alternative

The SIT Framework:

S - Situation: Context, background, constraints
I - Intent: What you want to achieve
T - Test: How you’ll verify success

Example:

SITUATION:
- Python web application, Flask framework
- Current: Username/password auth only
- User request: Add Google OAuth option
- Constraints: Must maintain existing auth, minimal disruption

INTENT:
- Enable Google OAuth as alternative login method
- Users can choose username/password OR Google
- Existing user accounts remain functional
- Implementation should be maintainable

TEST (Success Criteria):
- [ ] New users can register with Google OAuth
- [ ] Existing users can link Google to current account
- [ ] Login page shows both options clearly
- [ ] Username/password auth still works 100%
- [ ] All existing tests pass
- [ ] New OAuth flow has unit tests
- [ ] Security review approved
- [ ] Documentation updated

Result: AI proposes optimal implementation, you review plan, AI executes, tests verify.
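
For teams that assemble prompts in code, the SIT layout above can be captured in a small helper. This is a minimal sketch, not part of the Surton framework itself: the SITPrompt class and its render method are hypothetical names introduced here, and the closing "outline your plan" line is borrowed from the review step described later in this guide.

```python
from dataclasses import dataclass


@dataclass
class SITPrompt:
    """Hypothetical container for a Situation-Intent-Test prompt."""

    situation: list[str]
    intent: list[str]
    tests: list[str]

    def render(self) -> str:
        """Render the prompt in the plain-text layout used in this guide."""
        lines = ["SITUATION:"]
        lines += [f"- {item}" for item in self.situation]
        lines += ["", "INTENT:"]
        lines += [f"- {item}" for item in self.intent]
        lines += ["", "TEST (Success Criteria):"]
        lines += [f"- [ ] {item}" for item in self.tests]
        # Ask for a plan first so the approach can be reviewed before execution.
        lines += ["", "Before executing, outline your plan."]
        return "\n".join(lines)


prompt = SITPrompt(
    situation=[
        "Python web application, Flask framework",
        "Current: Username/password auth only",
    ],
    intent=["Enable Google OAuth as alternative login method"],
    tests=[
        "Username/password auth still works 100%",
        "All existing tests pass",
    ],
)
print(prompt.render())
```

Keeping the three sections as separate fields makes it harder to skip one by accident, and the rendered text can be pasted into whichever AI tool you use.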

Real Example: Refactoring Legacy Code

Procedural Approach (Underperforms):

"Step 1: Find all database queries in file app.py
Step 2: Extract each query into a separate function
Step 3: Move functions to db.py
Step 4: Update imports in app.py
Step 5: Run tests"

What Actually Happens:

  • File structure different than assumed
  • Some queries have complex dependencies
  • Tests fail for unclear reasons
  • 3 hours of iteration, partial success

Outcome-Based Approach (Succeeds):

SITUATION:
- Legacy Flask app, 8 years old
- 5,000 lines in single app.py
- Database queries scattered throughout
- Goal: Separation of concerns for maintainability
- Constraint: Must not break existing functionality

INTENT:
- Extract database layer from app.py
- Create clean data access layer
- Maintain all existing functionality
- Improve testability

TEST:
- [ ] All 150 existing tests pass
- [ ] No change to external API behavior
- [ ] Database operations in dedicated module
- [ ] app.py reduced to route definitions + business logic
- [ ] New data layer has 80%+ test coverage
- [ ] Performance equal or better

What Happens:

  • AI analyzes codebase first
  • Proposes modular structure based on actual code
  • Identifies dependencies you didn’t know about
  • Suggests migration path
  • You review, approve, AI executes
  • Tests verify all criteria met
  • Result: Clean refactoring, 2 hours total

When to Use Procedural vs. Outcome-Based

Use PROCEDURAL When:

  • ✅ Specific sequence is safety-critical (medical, financial compliance)
  • ✅ Teaching the method is the goal (training, education)
  • ✅ Legally regulated process (SOX, HIPAA, aviation)
  • ✅ You’ve tried outcome-based and got unacceptable results

Use OUTCOME-BASED When (Default):

  • 🎯 Problem-solving is the goal
  • 🎯 Multiple valid approaches exist
  • 🎯 AI has broader context than you (more patterns)
  • 🎯 Creativity/judgment adds value
  • 🎯 Speed/efficiency matters
  • 🎯 You’re not sure of the best approach

Surton Rule: Default to outcome-based. Use procedural only with explicit justification.

The Trust and Verification Framework

How to delegate to AI without losing control:

Step 1: Define Tests Upfront

  • What does success look like?
  • How will we verify objectively?
  • What are edge cases to handle?

Step 2: AI Proposes Approach

  • “Before executing, outline your plan”
  • Review for major issues
  • Give feedback on approach
  • Approve or redirect

Step 3: Execute with Visibility

  • AI works with logging/updates
  • You can observe progress
  • Interrupt if going wrong

Step 4: Automated Verification

  • Run the tests you defined
  • Check all success criteria
  • Measure against baseline

Step 5: Human Review

  • Review results before production
  • Check for edge cases
  • Validate quality

Step 6: Iterate If Needed

  • If tests fail, feed back to AI
  • Refine approach
  • Re-run verification

The Principle: Trust AI with execution, not with defining success. You set the finish line and the verification; AI figures out how to get there.
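
Steps 4 and 6 can be partly automated. The sketch below is one possible shape, assuming each success criterion can be expressed as a function that returns True or False; the names verify and feedback_message are hypothetical, and the two lambda checks are placeholders standing in for a real test run or benchmark.

```python
from typing import Callable

# Each success criterion is a name plus a zero-argument check returning True/False.
Criterion = tuple[str, Callable[[], bool]]


def verify(criteria: list[Criterion]) -> list[str]:
    """Run every check and return the names of the criteria that failed."""
    failed = []
    for name, check in criteria:
        try:
            ok = check()
        except Exception:
            # A check that crashes counts as a failure, not a pass.
            ok = False
        if not ok:
            failed.append(name)
    return failed


def feedback_message(failed: list[str]) -> str:
    """Build the text to feed back to the AI when criteria are not met."""
    if not failed:
        return "All success criteria met."
    bullets = "\n".join(f"- {name}" for name in failed)
    return (
        "These success criteria are not met yet:\n"
        f"{bullets}\n"
        "Revise your approach and try again."
    )


# Placeholder checks; real ones would run the test suite, a benchmark, and so on.
criteria = [
    ("All existing tests pass", lambda: True),
    ("Performance equal or better", lambda: False),
]
failed = verify(criteria)
print(feedback_message(failed))
```

The human review in Step 5 still applies: a passing run only shows that the criteria you wrote down were met, not that the criteria were complete.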

The 30-Minute Prompt Optimization

Week 1: Baseline

  • Pick 3 common tasks you use AI for
  • Use your current prompting style
  • Measure: Time to result, quality (1-10), iterations needed

Week 2: SIT Conversion

  • Convert prompts to SIT format
  • Situation: Add context you were omitting
  • Intent: Clarify specific outcomes
  • Test: Define explicit success criteria

Week 3: Compare

  • Run same 3 tasks with new prompts
  • Measure: Time, quality, iterations
  • Calculate improvement
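
The comparison in Week 3 comes down to a percent change per metric. Here is a minimal sketch, assuming you recorded time, quality, and iterations for both runs; the Measurement class and the numbers in it are made up for illustration and are not Surton data.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    minutes: float     # time to result
    quality: float     # 1-10 rating
    iterations: float  # prompt/response rounds needed


def percent_change(before: float, after: float) -> float:
    """Signed percent change from the baseline value to the new value."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100


def compare(baseline: Measurement, sit: Measurement) -> dict[str, float]:
    """Percent change per metric; negative time/iterations means improvement."""
    return {
        "time": percent_change(baseline.minutes, sit.minutes),
        "quality": percent_change(baseline.quality, sit.quality),
        "iterations": percent_change(baseline.iterations, sit.iterations),
    }


# Illustrative values only: averages over the same 3 tasks in Week 1 and Week 3.
baseline = Measurement(minutes=60, quality=4, iterations=4)
sit = Measurement(minutes=45, quality=5, iterations=2)

for metric, change in compare(baseline, sit).items():
    print(f"{metric}: {change:+.0f}%")
```

Averaging each metric over the three tasks before comparing keeps one unusually easy or hard task from dominating the result.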

Typical Results:

  • Time: 30-50% less
  • Quality: 20-40% higher
  • Iterations: 50-70% fewer

Surton Data: Team-wide SIT adoption resulted in 40% productivity improvement in AI-assisted work.

Common Mistakes in Outcome-Based Prompting

Mistake 1: Vague Intent

Bad: “Make it better”
Better: “Reduce page load time to <2 seconds while maintaining all current functionality”

Mistake 2: Insufficient Situation

Bad: No context provided
Better: “React app, currently using Redux, team wants to simplify state management, 15 components affected”

Mistake 3: Missing Tests

Bad: No success criteria
Better: “All existing tests pass, new approach has unit tests, bundle size doesn’t increase >10%”

Mistake 4: Over-Specifying in Intent

Bad: Intent includes “use specific library X”
Better: Let AI recommend best approach, specify constraints not solutions

Mistake 5: Not Reviewing AI’s Plan

Bad: AI goes straight to execution
Better: “Outline your approach before implementing”
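
Some of these mistakes can be caught mechanically before a prompt is sent. The checker below is a rough sketch under loud assumptions: lint_sit_prompt and VAGUE_PHRASES are hypothetical names, the phrase list is illustrative, and the "Step N" pattern is only a crude proxy for over-specifying (it will not notice a prescribed library, for example).

```python
import re

# Illustrative list only; extend it with phrases your team actually uses.
VAGUE_PHRASES = ("make it better", "improve it", "fix it")


def lint_sit_prompt(
    situation: str, intent: str, tests: list[str], asks_for_plan: bool
) -> list[str]:
    """Return a warning for each of the five mistakes the prompt appears to make."""
    warnings = []
    if not intent.strip() or any(p in intent.lower() for p in VAGUE_PHRASES):
        warnings.append("Mistake 1: vague intent")
    if not situation.strip():
        warnings.append("Mistake 2: insufficient situation")
    if not tests:
        warnings.append("Mistake 3: missing tests")
    # Crude proxy: numbered steps inside the intent suggest a prescribed solution.
    if re.search(r"\bstep\s*\d+\b", intent, re.IGNORECASE):
        warnings.append("Mistake 4: over-specified intent")
    if not asks_for_plan:
        warnings.append("Mistake 5: no plan review requested")
    return warnings


for warning in lint_sit_prompt(
    situation="",
    intent="Make it better",
    tests=[],
    asks_for_plan=False,
):
    print(warning)
```

A check like this only flags obvious gaps; whether the intent is specific enough is still a judgment call for the person writing the prompt.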

When Surton Can Help

If you:

  • Want to improve AI prompting across your team
  • Need to build outcome-based prompting standards
  • Want to measure AI productivity gains
  • Need training on SIT framework

Surton offers AI Productivity Consulting where we:

  1. Audit current prompting practices
  2. Train team on SIT framework
  3. Build prompt library for common tasks
  4. Measure productivity improvements
  5. Create team prompting standards

Typical engagement: 2-4 weeks, $15k-30k
ROI: 30-50% improvement in AI-assisted work productivity



This is Surton’s definitive 2025 outcome-based prompting guide. For the original newsletter version, see The Blueprint.

Frequently asked questions

Why do detailed step-by-step prompts perform worse?

Over-instruction locks AI into your current understanding, preventing it from using its broader knowledge. You're essentially saying 'do it my way' rather than 'solve this problem.' The result: AI follows directions literally without bringing creativity, pattern matching, or alternative approaches. You get compliance, not leverage. Better: Define the destination (what success looks like) and success criteria (how you'll verify), not the route (steps to take).

What's the difference between procedural and outcome-based prompts?

Procedural: 'Step 1: Open file X. Step 2: Change function Y. Step 3: Add test Z.' Outcome-based: 'Make the checkout process accept both credit cards and PayPal. Success: both payment methods work in test, existing credit card flow unchanged, code reviewed and approved.' Procedural = micromanagement. Outcome-based = delegation with clarity. AI performs 30-50% better with outcome-based prompts because it can use judgment and select the optimal path.

When should I use procedural vs. outcome-based prompts?

Use PROCEDURAL when: a specific sequence is critical for safety/compliance, teaching someone the method matters more than the result, the process is legally regulated (SOX, HIPAA). Use OUTCOME-BASED when: problem-solving is the goal, multiple valid approaches exist, AI has broader context than you (has seen more patterns), creativity or judgment is valuable, speed/efficiency matters. Default to outcome-based; use procedural only when constraints truly require it.

How do I write effective outcome-based prompts?

Use the SIT framework: SITUATION (context, background, constraints), INTENT (what you want to achieve, specific outcome), TEST (how you'll verify success, acceptance criteria). Example: 'SITUATION: Legacy Python 2 codebase, migrating to Python 3, 50k lines, critical for business. INTENT: Upgrade to Python 3.9 with zero breaking changes, maintain all existing functionality, improve performance if possible. TEST: All 200 unit tests pass, integration tests pass, performance equal or better, security scan clean.' Let AI figure out the migration path.

What makes good success criteria for AI?

Testable, specific, multi-dimensional. Bad: 'Make it better.' Good: 'Function completes in <100ms (performance), handles 10k concurrent users (scale), logs errors with stack trace (observability), passes security scan with zero critical findings (security), code review approved by senior engineer (quality).' The more specific your tests, the better AI can optimize. Think: 'How would I verify this is done right?' not 'How would I do this?'

How do I trust AI with outcome-based prompts?

Build verification into the workflow: (1) Define success criteria upfront (clear tests), (2) AI proposes approach before executing (review plan), (3) Execute with logging/visibility, (4) Automated testing verifies success criteria, (5) Human reviews results before production, (6) Iterate if criteria not met. Trust but verify. You're delegating execution, not abdicating responsibility. The tests you defined are your safety net.