Agentic Engineering
Building AI Teams That Write Senior-Level Code
The Quality Gap
Most AI-generated code works. That’s not the problem.
The problem is that it works like junior-developer code: functional, but in need of heavy refactoring. Missing edge cases. Unclear naming. Tight coupling. No tests worth trusting.
You trade coding time for review time. The leverage disappears.
Picture the workflow: Your agent writes a feature. You review it. The logic works, but the implementation is... rough. You spend an hour refactoring what took the agent three minutes to generate.
Net result? You didn’t eliminate work. You shifted it from writing to rewriting.
After building numerous agentic systems, I found the pattern: AI writes at junior level because we architect at junior level.
What If Your Agents Wrote Senior-Level Code?
Not perfect code (what code is perfect?), but senior-level code:
The kind that passes code review with minor changes
That follows established patterns without being told
That handles edge cases proactively
That ships to production with confidence
Here’s what I discovered: most agents fail at code quality not because of the model, but because of the architecture.
We ask one agent to do what actually requires a specialized team:
Architectural planning
Implementation
Validation against specs
Code quality review
Requirements verification
When you compress all that into a single prompt, you get junior-level output. When you separate it into specialized agents with clear knowledge boundaries, you get senior-level output.
I call this Agentic Engineering - a structured approach to building AI agent teams that produce code senior engineers would write.
The Leverage Problem
Here’s the workflow most teams experience:
With typical agents:
Agent generates code (5 minutes)
Senior engineer reviews (15 minutes)
Senior engineer refactors (45 minutes)
Agent regenerates (5 minutes)
Senior engineer reviews again (15 minutes)
Total: 85 minutes, 60 minutes of senior time
The promise was: AI does the work, humans oversee
The reality is: AI does the rough draft, humans do the real work
You’re not getting leverage. You’re getting a very fast junior developer who requires constant oversight.
With Agentic Engineering:
Agent team produces code (15 minutes, automated)
Senior engineer reviews (15 minutes)
Minor changes if needed (5 minutes)
Total: 35 minutes, 20 minutes of senior time
The difference? The AI team already did the architectural thinking, validation, and quality review before you saw it.
The Four-Layer Architecture
Over the last year of building agentic systems, a pattern emerged. The systems that produced senior-level code shared the same architecture:
┌─────────────────────────────────────────────────────┐
│ Layer 1: ORCHESTRATION (Commands) │
│ Coordinates multi-phase workflows │
│ Example: /orchestrate, /auto, /status │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Layer 2: SPECIALIZATION (Agents) │
│ Domain experts with specific responsibilities │
│ Example: architect, engineer, reviewer, validator │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Layer 3: KNOWLEDGE (Skills) │
│ Sacred Rules (MUST follow) │
│ Sacred Taste (SHOULD follow) │
│ Example: backend-skill, frontend-skill, git-skill │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Layer 4: LEARNING (Metadata) │
│ Execution metrics, quality scores, insights │
│ Enables continuous improvement │
└─────────────────────────────────────────────────────┘
These four layers solve the quality problem by separating concerns the way senior engineering teams naturally do.
Layer 1: Orchestration - The Engineering Manager
A senior team doesn’t have one person doing everything. Neither should your AI team.
The orchestrator coordinates specialists:
Launches the architect to design the solution
Passes the plan to the engineer to implement
Sends implementation to validator to check spec compliance
Routes code to reviewer to assess quality
Coordinates revision loops when needed
What it does NOT do:
Write code (that’s the engineer’s job)
Define quality standards (that’s skills + agents)
Make architectural decisions (that’s the architect’s authority)
The orchestrator is your engineering manager: coordinating work, not doing it.
Layer 2: Specialization - Senior Engineers, Not Generalists
Here’s why most agents produce junior code: they’re trying to wear too many hats.
Planning, implementation, validation, and review, all crammed into one context window. The result? Surface-level thinking on everything, depth on nothing.
The fix: Specialized agents with isolated contexts.
Each agent:
Runs in a fresh context window
Receives only what it needs via file paths
Has deep expertise in ONE domain
Produces artifacts, not context
Example team for Rails development:
Architect (Opus)
↓ writes architecture plan
Engineer (Sonnet)
↓ writes implementation from the architect’s plan, following TDD
Feature Validator (Sonnet)
↓ writes compliance report - ensures the spec was implemented
Code Reviewer (Sonnet)
↓ writes quality assessment of the code written
Why this produces better code:
The architect thinks ONLY about architecture:
Data models
API design
Frontend patterns
Integration points
It’s not distracted by implementation. It goes deep.
The engineer thinks ONLY about implementation:
Follows the architecture plan
Writes tests first (TDD) (happy AND unhappy paths)
Applies established patterns
Documents deviations
It’s not distracted by design decisions. It focuses on clean execution.
The reviewer thinks ONLY about code quality:
Checks adherence to patterns
Identifies potential issues
Suggests improvements
Verifies best practices
Each agent has room to think deeply about its domain. The result? Senior-level output in each area.
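The isolation mechanic can be sketched in a few lines: each agent run starts from a fresh context, receives only file paths, and produces an artifact rather than shared context. The `Agent` class and its stubbed `run` method here are illustrative stand-ins for launching a real model call:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Agent:
    """A specialist with ONE responsibility and a fresh context per run."""
    name: str
    role: str  # e.g. "architect", "engineer", "reviewer"

    def run(self, input_paths: list[Path], output_path: Path) -> Path:
        # In a real system this would launch a fresh model context that
        # reads ONLY the listed files. The model call is stubbed here so
        # the handoff mechanics stay visible: files in, artifact out.
        inputs = "\n".join(p.read_text() for p in input_paths)
        output_path.write_text(f"[{self.role}] artifact based on:\n{inputs}")
        return output_path
```

Because every agent communicates through artifacts on disk, no agent inherits another agent's context noise.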
Layer 3: Skills - Institutional Knowledge
Here’s what makes senior developers senior: they know the patterns, standards, and conventions that junior developers don’t.
Most agents are junior because they don’t have access to this knowledge - a senior engineer’s hard-won experience. We cram rules into prompts, but they get lost among thousands of tokens.
The fix: Progressive disclosure through skills.
A skill is institutional knowledge, organized for just-in-time loading:
skills/rails-backend-skill/
  SKILL.md                        # Navigation (~80 lines)
  references/
    BR-01-use-activerecord.md     # Sacred Rule
    BR-02-avoid-n-plus-1.md       # Sacred Rule
    BR-03-test-first.md           # Sacred Rule
    BT-01-method-length.md        # Sacred Taste
    BT-02-naming-conventions.md   # Sacred Taste
    restful-patterns.md           # Pattern library
Sacred Rules = MUST follow (blocking violations)
“Use ActiveRecord for database access, not raw SQL”
“Prevent N+1 queries with includes/preload”
“Write tests before implementation (TDD)”
Sacred Taste = SHOULD follow (suggestions)
“Keep methods under 15 lines”
“Use descriptive variable names”
“Extract complex logic to POROs”
The difference:
Junior code violates Sacred Rules (N+1 queries, missing tests, SQL injection risks).
Senior code follows Sacred Rules and generally follows Sacred Taste.
By loading skills progressively:
Agent sees navigation (knows what’s available)
Loads Sacred Rules before implementation
Loads Sacred Taste during refinement
Loads specific patterns as needed
The agent has access to senior-level knowledge without drowning in it.
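The loading order above can be sketched as a small helper over the skill directory layout shown earlier. The `load_skill` function and the phase names are hypothetical, but the mechanic is the point: navigation always, Sacred Rules (BR-*) before implementation, Sacred Taste (BT-*) during refinement:

```python
from pathlib import Path


def load_skill(skill_dir: Path, phase: str) -> str:
    """Progressively disclose a skill instead of dumping it all at once."""
    # Navigation is always loaded first, so the agent knows what exists.
    parts = [(skill_dir / "SKILL.md").read_text()]
    # Only the references relevant to the current phase are pulled in.
    prefix = {"implementation": "BR-", "refinement": "BT-"}[phase]
    refs = sorted((skill_dir / "references").glob(f"{prefix}*.md"))
    parts += [ref.read_text() for ref in refs]
    return "\n\n".join(parts)
```

The agent's context holds only what the current phase needs, not the whole knowledge base.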
Layer 4: Metadata - Continuous Improvement
Senior teams learn from every project. So should your AI teams.
After every orchestration, capture:
Quality Signals:
Code review findings
Sacred Rule violations (should be zero)
Sacred Taste suggestions (how many?)
Complexity assessment
Confidence scores from each agent
Execution Metrics:
Time per phase
Cost per phase
Tokens consumed
Revision cycles needed
Learning Signals:
Similar specs (pattern recognition)
Common issues (what keeps appearing?)
Skills referenced (which knowledge was critical?)
Plan-to-execution fidelity (how well did implementation match plan?)
Example from my systems: After 50 features, I noticed the architect was under-specifying error handling. Features worked but lacked graceful failure modes. I updated the architecture skill with error handling patterns. Code quality improved immediately.
The metadata told me what to fix.
Why This Produces Senior-Level Code
The four layers solve the quality problem:
Problem 1: No Architectural Thinking
Solution: Dedicated architect agent (Layer 2)
Thinks only about design
Not distracted by implementation
Deep expertise in patterns
Single Responsibility
Problem 2: Missing Institutional Knowledge
Solution: Skills with Sacred Rules (Layer 3)
Codified standards and patterns
Progressive disclosure
Enforced through validation
Problem 3: No Quality Review
Solution: Dedicated reviewer agent (Layer 2)
Fresh perspective on code
Checks against Sacred Rules and Taste
Suggests improvements before you see it
Problem 4: No Learning Loop
Solution: Rich metadata (Layer 4)
Identify recurring issues
Improve skills and agents
Continuous quality improvement
The result: Code that looks like a senior team wrote it.
Real-World Results: Rails Development
My reference implementation: visionaire-rails-team
Domain: Rails web application development
Goal: Transform feature specs into production code
Agents:
Architect (Opus) - Designs data models, APIs, frontend patterns
Engineer (Sonnet) - Implements features following TDD
Feature Validator (Sonnet) - Verifies architecture compliance
Code Reviewer (Sonnet) - Assesses quality against standards
Spec Validator (Sonnet) - Validates feature spec requirements met
Skills:
rails-backend-skill - ActiveRecord patterns, controller conventions, job handling, test standards
rails-frontend-skill - Turbo patterns, Stimulus controllers, view helpers
git-skill - Commit conventions, branch naming, workflow patterns
code-review-skill - Review process, quality standards
Results after 20 features:
Code quality: Passes senior review with minor changes (typically 2-3 suggestions)
Sacred Rule violations: Avg 0.3 per feature (down from 4-5 with single-agent approach)
Refactoring required: Minimal (under 15 minutes per feature)
Time to production: 15 minutes from spec to merge-ready
Cost: $0.55 per feature
Bugs found in review: 3 across 20 features (normal iteration, not architectural flaws)
What changed:
Before this framework, agents wrote code that “worked” but required 45+ minutes of refactoring. Missing tests. N+1 queries. Tight coupling.
With this framework, agents write code that follows established patterns, includes comprehensive tests, and handles edge cases proactively.
The difference: senior-level architectural thinking from the start.
Beyond Software: Quality in Any Domain
The same architecture that produces senior-level code works for any domain requiring quality output.
Marketing Campaign Development:
Most AI marketing is generic. Sounds like AI wrote it. This framework produces campaigns that match your brand voice, follow proven patterns, and include copy senior marketers would approve.
Legal Document Review:
Most AI legal analysis is surface-level. Misses nuances. This framework produces analysis that identifies risks senior counsel would catch, with proper precedent citations and thorough clause analysis.
Content Production:
Most AI content is SEO-optimized fluff. No depth. This framework produces well-researched content with proper sourcing, fact-checking, and editorial quality that senior editors would approve.
Product Design:
Most AI design follows templates. Lacks sophistication. This framework produces designs that follow accessibility standards, design system conventions, and interaction patterns that senior UX designers would specify.
Same pattern: specialized agents + institutional knowledge = senior-level output
The Seven Principles
These principles distinguish senior-level output from junior-level:
1. Subagent Isolation (Single Responsibility)
Each agent has ONE job. Depth over breadth. No distractions. Just like senior engineers specialize, agents specialize.
2. File-Based Communication (Clear Contracts)
Agents communicate through artifacts, not context. The architect writes a plan. The engineer reads it. Clear interfaces, just like senior teams use documentation.
3. Revision Loops with Limits (Escalation to Humans)
Quality gates can trigger re-execution (max 2 iterations). Then escalate to humans. Bounded automation, not infinite retries. Senior teams know when to ask for help.
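A bounded revision loop can be sketched in a few lines. The `implement` and `review` callables and the verdict strings are illustrative assumptions, not a fixed API:

```python
MAX_ITERATIONS = 2


def review_loop(implement, review, max_iterations=MAX_ITERATIONS):
    """Run implement → review, re-running on CHANGES_REQUIRED,
    then escalate to a human once the iteration budget is spent."""
    for attempt in range(1, max_iterations + 1):
        code = implement()
        verdict = review(code)
        if verdict == "APPROVED":
            return {"status": "approved", "code": code, "attempts": attempt}
    # Quality gate never passed: stop retrying and hand off to a human.
    return {"status": "escalate_to_human", "attempts": max_iterations}
```

The hard cap is the feature: the loop cannot burn unbounded tokens chasing an approval it may never get.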
4. Deterministic Context (Convention Over Configuration)
All context is derived from inputs. From the filename S-001-feature-name.md, derive the spec ID, branch name, and artifact directory. No magic. Just like senior teams use conventions.
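As a sketch, that derivation might look like the following; the exact branch and directory conventions here are assumptions for illustration, not a fixed standard:

```python
import re


def derive_context(spec_filename: str) -> dict:
    """Derive all workflow context deterministically from the spec
    filename alone (convention over configuration)."""
    m = re.fullmatch(r"(S-\d+)-([a-z0-9-]+)\.md", spec_filename)
    if m is None:
        raise ValueError(f"spec filename breaks convention: {spec_filename}")
    spec_id, slug = m.groups()
    return {
        "spec_id": spec_id,
        "branch": f"feature/{spec_id}-{slug}",       # assumed convention
        "artifact_dir": f"artifacts/{spec_id}",      # assumed convention
    }
```

Because every value is computed, the orchestrator never needs the agents to agree on names: they all derive the same ones.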
5. Orchestrator Enforces Structure, Not Content
Orchestrator coordinates. Agents decide. The orchestrator ensures validation occurs. The validator decides what constitutes quality. Separation of concerns.
6. Progressive Disclosure (Load What You Need)
Load skills just-in-time. Navigation first, rules when needed, patterns as required. Minimizes context noise. Maximizes focus.
7. Metadata as Learning Signal
Track quality metrics. Learn from patterns. Improve continuously. Senior teams do retrospectives. So should your AI teams.
The Choice You’re Facing
Here’s what happens if you keep using single-agent approaches:
You’ll generate code that works in the moment. You’ll spend hours refactoring it to production quality. Your senior engineers become AI babysitters instead of architects.
The promise was leverage. The reality is shifted work.
Management asks: “Why are we investing in AI if it still requires the same senior time?”
The alternative is structure.
Not perfect agents. Not flawless code. Just better architecture that produces better output.
Senior-level code with normal bugs and normal iteration. But fundamentally different quality.
Getting Started
You don’t need to build the entire framework at once. Start with one feature. Elevate its quality. Then scale.
Day 1: Split Your Agent (2 hours)
Take your current code-generating agent. It probably does this:
Reads requirements
Designs solution
Writes code
(Maybe) validates
Split it:
agents/architect.md # Reads requirements → writes plan
agents/engineer.md # Reads plan → writes code
agents/reviewer.md # Reads code → writes review
Day 2: Test the Pipeline (1 hour)
Run: architect → plan.md → engineer reads plan.md → code/ → reviewer reads code/ and plan.md
Compare the output to your single-agent version.
The code quality will be noticeably better. Why? The architect thought only about design. The engineer focused only on clean implementation. The reviewer provided a fresh-eyes quality check.
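That pipeline can be sketched as a file-based handoff. The three agent callables here are placeholders for real model invocations; each step reads only the files the previous step wrote:

```python
from pathlib import Path


def pipeline(workdir: Path, architect, engineer, reviewer) -> Path:
    """architect → plan.md → engineer → code/ → reviewer → review.md."""
    plan = workdir / "plan.md"
    code_dir = workdir / "code"
    review = workdir / "review.md"

    # Each phase reads artifacts from disk, never another agent's context.
    plan.write_text(architect((workdir / "spec.md").read_text()))
    code_dir.mkdir(exist_ok=True)
    (code_dir / "feature.py").write_text(engineer(plan.read_text()))
    review.write_text(reviewer(plan.read_text(),
                               (code_dir / "feature.py").read_text()))
    return review
```

The artifacts on disk double as an audit trail: you can inspect plan.md and review.md to see exactly what each specialist decided.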
Week 2: Add Sacred Rules (3-4 hours)
Create skills/[domain]-skill/SKILL.md:
## Sacred Rules (MUST follow)
- [RULE-01](refs/RULE-01.md): Use framework patterns, not raw SQL
- [RULE-02](refs/RULE-02.md): Write tests before implementation
- [RULE-03](refs/RULE-03.md): Handle errors explicitly
Load before implementation.
Each reference file has:
The rule
Why it matters
Good vs bad examples
How to verify
Agents load these before writing code. Sacred Rule violations drop dramatically.
Week 3: Add Orchestration (4-5 hours)
Create commands/orchestrate.md:
1. Launch architect → wait for plan.md
2. Launch engineer with plan.md path → wait for code/
3. Launch reviewer with plan.md and code/ paths → wait for review.md
4. Check review verdict:
- APPROVED → proceed
- CHANGES_REQUIRED → re-run engineer (max 2 times) → reviewer again
- Still failing after 2 iterations → escalate to human
5. Track metadata for learning
Week 4: Track Quality Metrics (2-3 hours)
After each feature, capture:
{
  "architect_confidence": 0.90,
  "sacred_rule_violations": 0,
  "sacred_taste_suggestions": 3,
  "review_verdict": "APPROVED",
  "refactoring_required": "minimal",
  "time_saved_vs_baseline": "35 minutes"
}
After 10 features, patterns emerge:
Which Sacred Rules are violated most?
Which skills need better examples?
Which agents have low confidence?
What’s the quality trend?
Use this data to improve your skills and agent prompts.
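A small aggregator over the per-feature metadata files makes those questions answerable. The field names follow the JSON example above; the function itself is an illustrative sketch:

```python
import json
from collections import Counter
from pathlib import Path


def summarize_metadata(metadata_dir: Path) -> dict:
    """Aggregate per-feature metadata files into learning signals:
    average rule violations and the distribution of review verdicts."""
    records = [json.loads(p.read_text())
               for p in sorted(metadata_dir.glob("*.json"))]
    return {
        "features": len(records),
        "avg_rule_violations": (
            sum(r["sacred_rule_violations"] for r in records) / len(records)
        ),
        "verdicts": dict(Counter(r["review_verdict"] for r in records)),
    }
```

Run it after every batch of features; a rising violation average or a growing CHANGES_REQUIRED share tells you which skill or agent prompt to fix next.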
What’s Coming Next
This article introduced Agentic Engineering - the four-layer architecture for building AI teams that produce senior-level output. The purpose of this series is not to claim “this is the one ring to rule them all”, but to share what is working for me and might also work for you.
The next articles in this series go deeper:
Article 2: “Agent Design - Specialization Over Intelligence”
Designing agents with clear boundaries
Authority patterns (input vs output)
Why restrictions produce better code
Article 3: “Skills - Institutional Knowledge for AI Teams”
Sacred Rules vs Sacred Taste in depth
Progressive disclosure patterns
Building reference documentation
Skill evolution strategies
Article 4: “Orchestration - Coordinating Specialists”
Revision loop patterns
Escalation to humans
Batch vs interactive modes
Cost tracking and analysis
Article 5: “Metadata - The Learning Layer”
Quality metrics that matter
Learning from patterns
Continuous improvement cycles
Cost and quality tracking
The Real Transformation
Here’s what I wish I’d known when I started:
Building high-quality agentic systems isn’t about better prompts or bigger models. It’s about better architecture and clear knowledge boundaries.
The agents I build now produce code that senior engineers approve with minimal changes. Not because the models improved. Because the structure improved.
The context window will always be a problem, regardless of its size. LLMs are good at remembering the first and last parts of the context while being fuzzy in the middle. Reduce the middle as much as possible.
Agentic Engineering is that structure.
The four layers - Orchestration, Specialization, Knowledge, Learning - give you a framework to:
Build teams instead of generalists
Encode institutional knowledge
Learn from every execution
Produce senior-level output consistently
Most importantly: your agents will write code you trust.
Start Building
I spent over a year learning these lessons. I built numerous agentic systems. I refined the architecture. I identified the patterns that produce quality.
You don’t have to.
The framework is here. The patterns are proven. The architecture works.
When you build your first four-layer agentic team, you’ll understand why this approach works. Not from theory. From reading AI-generated code that looks like a senior engineer wrote it.
That’s the transformation. From AI as junior developer to AI as senior team.
Your agents can write senior-level code Monday. Not “eventually.” Monday.
Start with one feature. Split the agent. Add skills. Deploy it.
When the code quality jumps—and it will—build the next one.
That’s how Agentic Engineering spreads. One quality feature at a time.
Quick Reference
The Four Layers:
Orchestration - Coordinates specialists, enforces workflow
Specialization - Domain experts with isolated contexts
Knowledge - Sacred Rules + Sacred Taste
Learning - Quality metrics and continuous improvement
Key Principles:
Subagent isolation (single responsibility)
File-based communication (clear contracts)
Revision loops with limits (escalation to humans)
Deterministic context (convention over configuration)
Structure over content (orchestrator boundaries)
Progressive disclosure (load what you need)
Metadata as learning (track quality)
Quality Indicators:
Sacred Rule violations near zero
Code passes review with minor changes
Minimal refactoring required
Handles edge cases proactively
Follows established patterns
Includes comprehensive tests
Start Here:
Split your monolithic agent into specialists
Isolate contexts (file-based communication)
Extract Sacred Rules into skills
Track quality metrics
Ready to dive deeper? Next article covers agent design patterns and specialization strategies.
Found this useful? The best way to understand Agentic Engineering is to build with it. Start with one feature. Add structure. Watch the quality transform.

