Stop Over-Instructing AI: The 2025 Surton Outcome-Based Prompting Framework
Why detailed step-by-step prompts reduce AI performance and how to use outcome-based prompting for better results. Includes the SIT framework and Surton's prompt optimization methodology.
After prompting AI systems thousands of times at Surton, we’ve learned that detailed step-by-step instructions consistently produce worse results than outcome-based delegation. When you script every step, you prevent AI from using its breadth of knowledge and pattern recognition. The most effective prompts define the destination and success criteria—then let AI determine the optimal path.
This guide is our outcome-based prompting framework. It includes the SIT (Situation-Intent-Test) methodology, a comparison of procedural and outcome-based approaches, and Surton’s prompt optimization process.
Quick Take
Step-by-step procedural prompts underperform because they lock AI into your current understanding and keep it from drawing on its broader pattern knowledge. Shift to outcome-based prompting by defining three things: SITUATION (context), INTENT (desired outcome), and TEST (success criteria).
- Procedural: “Step 1: X, Step 2: Y.” This is micromanagement.
- Outcome-based: “Make checkout accept PayPal. Success: both payment methods work, existing flow unchanged, tests pass.” This is delegation with clarity.
In Surton's testing, AI performs 30-50% better with outcome-based prompts. Use procedural prompts only when a specific sequence is safety-critical, and default to outcome-based prompts for problem-solving, coding, analysis, and creative work. Trust but verify: define tests upfront, review AI’s proposed approach, and automate verification.
The Procedural Prompt Trap
The Common Approach:
Procedural Prompt:
"Step 1: Open the user authentication file
Step 2: Find the login function
Step 3: Add password validation check
Step 4: Update error handling
Step 5: Write unit test for new validation
Step 6: Run tests to verify"
Why This Underperforms:
- Assumes current file structure (may be wrong)
- Prescribes specific implementation (may not be optimal)
- Prevents AI from considering better approaches
- Creates brittleness (if step 2 fails, entire prompt fails)
- Wastes AI capability (treating it like dumb automation)
The Data: In Surton A/B tests:
- Procedural prompts: 65% success rate, average solution quality 6/10
- Outcome-based prompts: 87% success rate, average solution quality 8.5/10
The Pattern: Over-instruction correlates with worse outcomes.
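One way to read these numbers against the 30-50% figure quoted in this guide is as relative change. The guide does not say how that range was computed, so this is an assumption, and `relative_change` is a hypothetical helper rather than part of Surton's methodology.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100


# Figures from the A/B results above (procedural -> outcome-based).
success = relative_change(65, 87)   # success rate, in percent
quality = relative_change(6, 8.5)   # average solution quality, out of 10

print(f"Success rate: {success:+.1f}% relative")   # Success rate: +33.8% relative
print(f"Quality score: {quality:+.1f}% relative")  # Quality score: +41.7% relative
```

Under this reading, both metrics land inside the 30-50% range. Measured in absolute terms, the success rate gain is 22 percentage points.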
The Outcome-Based Alternative
The SIT Framework:
S - Situation: Context, background, constraints
I - Intent: What you want to achieve
T - Test: How you’ll verify success
Example:
SITUATION:
- Python web application, Flask framework
- Current: Username/password auth only
- User request: Add Google OAuth option
- Constraints: Must maintain existing auth, minimal disruption
INTENT:
- Enable Google OAuth as alternative login method
- Users can choose username/password OR Google
- Existing user accounts remain functional
- Implementation should be maintainable
TEST (Success Criteria):
- [ ] New users can register with Google OAuth
- [ ] Existing users can link Google to current account
- [ ] Login page shows both options clearly
- [ ] Username/password auth still works 100%
- [ ] All existing tests pass
- [ ] New OAuth flow has unit tests
- [ ] Security review approved
- [ ] Documentation updated
Result: AI proposes an implementation, you review the plan, AI executes, and the tests verify the outcome.
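If your team writes SIT prompts often, it can help to build them from structured data so that no section is forgotten. The following is a minimal sketch, not a Surton tool: the `SITPrompt` class and its `render` method are hypothetical names, and the closing instruction is the plan-first request this guide recommends.

```python
from dataclasses import dataclass


@dataclass
class SITPrompt:
    """Hypothetical container for a Situation-Intent-Test prompt."""
    situation: list[str]
    intent: list[str]
    tests: list[str]

    def render(self) -> str:
        """Render the three sections as plain text, with tests as a checklist."""
        lines = ["SITUATION:"]
        lines += [f"- {item}" for item in self.situation]
        lines.append("INTENT:")
        lines += [f"- {item}" for item in self.intent]
        lines.append("TEST (Success Criteria):")
        lines += [f"- [ ] {item}" for item in self.tests]
        # Ask for a plan first so a human can review it before execution.
        lines.append("Before executing, outline your plan.")
        return "\n".join(lines)


prompt = SITPrompt(
    situation=["Python web application, Flask framework",
               "Current: Username/password auth only"],
    intent=["Enable Google OAuth as alternative login method"],
    tests=["Username/password auth still works 100%",
           "All existing tests pass"],
)
print(prompt.render())
```

Keeping the tests as a separate list also makes it easy to reuse them later as a verification checklist.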
Real Example: Refactoring Legacy Code
Procedural Approach (Underperforms):
"Step 1: Find all database queries in file app.py
Step 2: Extract each query into a separate function
Step 3: Move functions to db.py
Step 4: Update imports in app.py
Step 5: Run tests"
What Actually Happens:
- File structure differs from what the prompt assumed
- Some queries have complex dependencies
- Tests fail for unclear reasons
- 3 hours of iteration, partial success
Outcome-Based Approach (Succeeds):
SITUATION:
- Legacy Flask app, 8 years old
- 5,000 lines in single app.py
- Database queries scattered throughout
- Goal: Separation of concerns for maintainability
- Constraint: Must not break existing functionality
INTENT:
- Extract database layer from app.py
- Create clean data access layer
- Maintain all existing functionality
- Improve testability
TEST:
- [ ] All 150 existing tests pass
- [ ] No change to external API behavior
- [ ] Database operations in dedicated module
- [ ] app.py reduced to route definitions + business logic
- [ ] New data layer has 80%+ test coverage
- [ ] Performance equal or better
What Happens:
- AI analyzes codebase first
- Proposes modular structure based on actual code
- Identifies dependencies you didn’t know about
- Suggests migration path
- You review, approve, AI executes
- Tests verify all criteria met
- Result: Clean refactoring, 2 hours total
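To make the "database operations in dedicated module" criterion concrete, here is a minimal sketch of what a separated data access layer could look like. It uses the standard-library `sqlite3` module and a made-up `users` table purely for illustration. The real app's database, schema, and function names are not given in this guide, so `connect`, `add_user`, `get_user`, and `user_greeting` are all hypothetical.

```python
import sqlite3
from typing import Optional


# --- db.py (hypothetical): every SQL statement lives in this layer ---
def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and make sure the illustrative table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def add_user(conn: sqlite3.Connection, name: str) -> int:
    """Insert a user and return the new row id."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def get_user(conn: sqlite3.Connection, user_id: int) -> Optional[dict]:
    """Return the user as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row) if row else None


# --- app.py (hypothetical): business logic calls the layer, no SQL here ---
def user_greeting(conn: sqlite3.Connection, user_id: int) -> str:
    user = get_user(conn, user_id)
    return f"Hello, {user['name']}!" if user else "User not found"


conn = connect()
user_id = add_user(conn, "Ada")
print(user_greeting(conn, user_id))  # prints "Hello, Ada!"
```

Because the layer can run against an in-memory database, it can be tested without starting the web app, which is one way to approach the testability and coverage criteria.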
When to Use Procedural vs. Outcome-Based
Use PROCEDURAL When:
- ✅ Specific sequence is safety-critical (medical, financial compliance)
- ✅ Teaching the method is the goal (training, education)
- ✅ Legally regulated process (SOX, HIPAA, aviation)
- ✅ You’ve tried outcome-based and got unacceptable results
Use OUTCOME-BASED When (Default):
- 🎯 Problem-solving is the goal
- 🎯 Multiple valid approaches exist
- 🎯 AI has broader context than you (it has seen more patterns)
- 🎯 Creativity/judgment adds value
- 🎯 Speed/efficiency matters
- 🎯 You’re not sure of the best approach
Surton Rule: Default to outcome-based. Use procedural only with explicit justification.
The Trust and Verification Framework
How to delegate to AI without losing control:
Step 1: Define Tests Upfront
- What does success look like?
- How will we verify objectively?
- What are edge cases to handle?
Step 2: AI Proposes Approach
- “Before executing, outline your plan”
- Review for major issues
- Give feedback on approach
- Approve or redirect
Step 3: Execute with Visibility
- AI works with logging/updates
- You can observe progress
- Interrupt if going wrong
Step 4: Automated Verification
- Run the tests you defined
- Check all success criteria
- Measure against baseline
Step 5: Human Review
- Review results before production
- Check for edge cases
- Validate quality
Step 6: Iterate If Needed
- If tests fail, feed back to AI
- Refine approach
- Re-run verification
The Principle: Trust AI with execution, not with defining success. You set the finish line and the verification; AI figures out how to get there.
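Steps 4 and 6 are easiest to follow when each success criterion maps to a check that a script can run. The sketch below shows one possible shape for that. `verify` and `failed` are hypothetical helpers, and the lambdas are stand-ins for real checks such as running a test suite or comparing timings against a baseline.

```python
from typing import Callable

# Each success criterion maps to a zero-argument check returning True or False.
Criteria = dict[str, Callable[[], bool]]


def verify(criteria: Criteria) -> dict[str, bool]:
    """Run every check; a check that raises counts as a failure."""
    results = {}
    for name, check in criteria.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def failed(results: dict[str, bool]) -> list[str]:
    """List the criteria that did not pass, in their original order."""
    return [name for name, ok in results.items() if not ok]


criteria = {
    "All existing tests pass": lambda: True,           # stand-in for a real test run
    "Performance equal or better": lambda: 95 <= 100,  # made-up timings in ms
    "Edge case: empty input": lambda: len([]) == 1,    # stand-in that fails
}
results = verify(criteria)

# Step 6: feed the failures back to the AI as the next prompt.
feedback = "These criteria failed: " + ", ".join(failed(results))
print(feedback)
```

Treating an exception as a failure keeps one broken check from hiding the results of the others.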
The 30-Minute Prompt Optimization
Week 1: Baseline
- Pick 3 common tasks you use AI for
- Use your current prompting style
- Measure: Time to result, quality (1-10), iterations needed
Week 2: SIT Conversion
- Convert prompts to SIT format
- Situation: Add context you were omitting
- Intent: Clarify specific outcomes
- Test: Define explicit success criteria
Week 3: Compare
- Run same 3 tasks with new prompts
- Measure: Time, quality, iterations
- Calculate improvement
Typical Results:
- Time: 30-50% less
- Quality: 20-40% higher
- Iterations: 50-70% fewer
Surton Data: Team-wide SIT adoption resulted in 40% productivity improvement in AI-assisted work.
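The Week 3 comparison can be done in a spreadsheet, but a few lines of code work too. This is a minimal sketch under stated assumptions: `Run` and `improvement` are hypothetical names, and the numbers in the example are placeholders that show the input format, not measured results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    task: str
    minutes: float    # time to result
    quality: int      # self-rated, 1-10
    iterations: int   # prompts needed to reach an acceptable result


def improvement(baseline: list[Run], sit: list[Run]) -> dict[str, float]:
    """Percent change in the mean of each metric, from baseline to SIT prompts."""
    changes = {}
    for metric in ("minutes", "quality", "iterations"):
        before = mean(getattr(run, metric) for run in baseline)
        after = mean(getattr(run, metric) for run in sit)
        changes[metric] = round((after - before) / before * 100, 1)
    return changes


# Placeholder measurements for three tasks; replace with your own logs.
week1 = [Run("code review", 30, 6, 4), Run("bug fix", 60, 5, 5), Run("docs", 30, 7, 3)]
week3 = [Run("code review", 20, 8, 2), Run("bug fix", 40, 7, 2), Run("docs", 20, 8, 2)]

for metric, change in improvement(week1, week3).items():
    print(f"{metric}: {change:+.1f}%")
```

Negative values for minutes and iterations mean less time and fewer round trips. A positive value for quality means a higher rating.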
Common Mistakes in Outcome-Based Prompting
Mistake 1: Vague Intent
Bad: “Make it better”
Better: “Reduce page load time to <2 seconds while maintaining all current functionality”
Mistake 2: Insufficient Situation
Bad: No context provided
Better: “React app, currently using Redux, team wants to simplify state management, 15 components affected”
Mistake 3: Missing Tests
Bad: No success criteria
Better: “All existing tests pass, new approach has unit tests, bundle size doesn’t increase >10%”
Mistake 4: Over-Specifying in Intent
Bad: Intent includes “use specific library X”
Better: Let AI recommend the best approach; specify constraints, not solutions
Mistake 5: Not Reviewing AI’s Plan
Bad: AI goes straight to execution
Better: “Outline your approach before implementing”
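Some of these mistakes can be caught mechanically before a prompt is sent. The sketch below is a rough heuristic, not a complete checker: `lint_sit` and the `VAGUE_INTENTS` list are hypothetical, and Mistake 4 (over-specifying) is left out because it needs human judgment.

```python
# Illustrative phrases only; extend with the vague wording your team tends to use.
VAGUE_INTENTS = {"make it better", "improve it", "fix it"}


def lint_sit(situation: str, intent: str, tests: list[str],
             asks_for_plan: bool) -> list[str]:
    """Return one warning per common mistake detected in a SIT prompt."""
    warnings = []
    normalized = intent.strip().lower().rstrip(".!")
    if not normalized or normalized in VAGUE_INTENTS:
        warnings.append("Mistake 1: vague intent")
    if not situation.strip():
        warnings.append("Mistake 2: insufficient situation")
    if not tests:
        warnings.append("Mistake 3: missing tests")
    if not asks_for_plan:
        warnings.append("Mistake 5: no plan review requested")
    return warnings


print(lint_sit(situation="", intent="Make it better", tests=[], asks_for_plan=False))
print(lint_sit(
    situation="React app, currently using Redux",
    intent="Reduce page load time to <2 seconds",
    tests=["All existing tests pass"],
    asks_for_plan=True,
))
```

The first call reports all four detectable mistakes and the second returns an empty list.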
When Surton Can Help
If you:
- Want to improve AI prompting across your team
- Need to build outcome-based prompting standards
- Want to measure AI productivity gains
- Need training on SIT framework
Surton offers AI Productivity Consulting where we:
- Audit current prompting practices
- Train team on SIT framework
- Build prompt library for common tasks
- Measure productivity improvements
- Create team prompting standards
Typical engagement: 2-4 weeks, $15k-30k
ROI: 30-50% improvement in AI-assisted work productivity
Related Resources
- How I Actually Use AI — Daily AI workflow system
- AI Creates Value Where Predictability Breaks Down — When to use AI vs. rules
- Stop Over-Instructing AI (Original) — The Blueprint edition
This is Surton’s definitive 2025 outcome-based prompting guide. For the original newsletter version, see The Blueprint.
Frequently asked questions
Why do detailed step-by-step prompts perform worse?
Over-instruction locks AI into your current understanding, preventing it from using its broader knowledge. You're essentially saying 'do it my way' rather than 'solve this problem.' The result: AI follows directions literally without bringing creativity, pattern matching, or alternative approaches. You get compliance, not leverage. Better: Define the destination (what success looks like) and success criteria (how you'll verify), not the route (steps to take).
What's the difference between procedural and outcome-based prompts?
Procedural: 'Step 1: Open file X. Step 2: Change function Y. Step 3: Add test Z.' Outcome-based: 'Make the checkout process accept both credit cards and PayPal. Success: both payment methods work in test, existing credit card flow unchanged, code reviewed and approved.' Procedural = micromanagement. Outcome-based = delegation with clarity. AI performs 30-50% better with outcome-based prompts because it can use judgment and select the optimal path.
When should I use procedural vs. outcome-based prompts?
Use PROCEDURAL when: a specific sequence is critical for safety/compliance, teaching someone the method matters more than the result, or the process is legally regulated (SOX, HIPAA). Use OUTCOME-BASED when: problem-solving is the goal, multiple valid approaches exist, AI has broader context than you (has seen more patterns), creativity or judgment is valuable, or speed/efficiency matters. Default to outcome-based; use procedural only when constraints truly require it.
How do I write effective outcome-based prompts?
Use the SIT framework: SITUATION (context, background, constraints), INTENT (what you want to achieve, specific outcome), TEST (how you'll verify success, acceptance criteria). Example: 'SITUATION: Legacy Python 2 codebase, migrating to Python 3, 50k lines, critical for business. INTENT: Upgrade to Python 3.9 with zero breaking changes, maintain all existing functionality, improve performance if possible. TEST: All 200 unit tests pass, integration tests pass, performance equal or better, security scan clean.' Let AI figure out the migration path.
What makes good success criteria for AI?
Testable, specific, multi-dimensional. Bad: 'Make it better.' Good: 'Function completes in <100ms (performance), handles 10k concurrent users (scale), logs errors with stack trace (observability), passes security scan with zero critical findings (security), code review approved by senior engineer (quality).' The more specific your tests, the better AI can optimize. Think: 'How would I verify this is done right?' not 'How would I do this?'
How do I trust AI with outcome-based prompts?
Build verification into the workflow: (1) Define success criteria upfront (clear tests), (2) AI proposes approach before executing (review plan), (3) Execute with logging/visibility, (4) Automated testing verifies success criteria, (5) Human reviews results before production, (6) Iterate if criteria not met. Trust but verify. You're delegating execution, not abdicating responsibility. The tests you defined are your safety net.