Agentic Engineering: Skill Design
Building Institutional Knowledge Your Agents Actually Use
The Degradation Problem
Your agents start strong. First feature: excellent code. Clean patterns. Proper error handling. You’re impressed.
Third feature: different story. Missing edge cases. Violating conventions you established two days ago. Asking questions you already answered.
Fifth feature: back to junior level. The agent forgot everything.
This isn’t model failure. This is knowledge failure.
Why agents degrade:
Forgetting rules - Agent applied BR-08 (eager loading) perfectly on Tuesday. Friday’s code has N+1 queries everywhere. The knowledge didn’t stick. It was never encoded properly.
Context dilution - Your system prompt started at 800 tokens. Added error handling guidelines (200 tokens). Added security patterns (300 tokens). Added performance rules (400 tokens). Now it’s 1,700 tokens. Agent skims it. Focuses on the task description. Ignores most guidelines.
Inconsistent application - Agent prevents N+1 queries in one controller. Creates three in the next. No systematic check. No validation. Knowledge exists but isn’t reliably applied.
Relearning the same lessons - “Use Current.user for authorization.” Agent applies it. Next feature: forgets. You explain again. Next feature: forgets again. Every feature is a fresh start. No learning loop.
This pattern is structural. Not exceptional.
Most teams experience it within 5-10 features. The excitement of “AI writes code!” becomes the frustration of “Why am I teaching the same patterns every time?”
After building numerous agent systems, a pattern emerged: Agents degrade not because they can’t learn, but because we don’t provide knowledge in a form they can retain and retrieve.
What If Knowledge Didn’t Degrade?
Not perfect retention; no retention is. But institutional retention:
Patterns learned once, applied consistently
Rules validated automatically, violations caught immediately
Standards encoded durably, not repeated manually
Quality improving over time, not degrading
Most agent systems fail at this not because of model limitations, but because of knowledge architecture.
We treat knowledge as prompts. Unstructured. Unreferenceable. Non-evolvable.
The fix isn’t better prompts. It’s better knowledge structure.
What a Skill Actually Is
Before we go further, let’s be precise.
A skill is NOT:
A prompt (ephemeral, unstructured)
A blob of text (non-navigable)
Documentation (passive, not actionable)
A collection of tips (no prioritization, no validation)
A skill IS:
Structured knowledge - Clear hierarchy (rules > taste > patterns)
Navigable - Agent can find what it needs
Referenceable - Specific rules have specific identifiers (BR-01, not “that security thing”)
Loadable - Agent loads on-demand, not all-at-once
Evolvable - Skills improve as you discover new patterns
Versioned - Changes tracked, rollback possible
This distinction is critical.
When knowledge is structured this way, it becomes retrievable. Agents don’t forget BR-08 because BR-08 is a durable reference, not a paragraph buried in 2,000 tokens.
When knowledge is navigable, agents apply it consistently. They know where to look. They load the navigation. They find the relevant rule. They apply it.
When knowledge is evolvable, your system learns. A new mistake becomes a new Sacred Rule. Skills improve. Quality compounds.
This is the difference between knowledge that degrades and knowledge that sticks.
The Sacred Rules vs Sacred Taste Distinction
Most agent systems conflate two fundamentally different types of knowledge:
Things that MUST be followed (or the system breaks)
Things that SHOULD be followed (or quality suffers)
Conflating these creates noisy agents.
Why the Separation Matters
Without distinction:
Agent sees 47 guidelines in the prompt. All written with equal emphasis. “Use params.expect()”, “Methods should be short”, “Prevent N+1 queries”, “Use descriptive names”.
Agent can’t prioritize. Treats everything equally. Or ignores everything equally.
Validator checks violations. Finds 12. Reports all. 3 are critical (N+1 queries). 9 are suggestions (method length). All weighted the same in the report.
You review the report. Noise. You can’t tell what’s blocking vs what’s nice-to-have. You fix half, ship the rest. Critical bugs ship. Quality degrades.
With distinction:
Agent sees Sacred Rules (MUST follow). Agent sees Sacred Taste (SHOULD follow). Clear priority.
Agent implements feature. Applies Sacred Rules during implementation. Validates with automated checks. Self-verifies before submitting.
Validator checks Sacred Rules. Finds 0.3 violations on average (down from 4-5). Reports them as BLOCKING. Clear signal.
Validator checks Sacred Taste. Finds 3 suggestions. Reports them as NON-BLOCKING. You review suggestions. Accept 2, skip 1. Ship confidently.
The difference: signal vs noise.
Why Validators Depend on This
Your validation agent checks code quality. Without Sacred Rules vs Taste:
Validation report:
Issues Found: 12
1. N+1 query in UsersController#index
2. Method PostsController#create is 18 lines (prefer ≤15)
3. Missing authorization check in Article#destroy
4. Variable name ‘x’ is unclear
5. No test for error case
...

What’s blocking? What ships? You decide manually. Every time.
With Sacred Rules vs Taste:
Validation report:
BLOCKING ISSUES (Sacred Rules): 2
BR-08: N+1 query in UsersController#index
BR-13: Missing authorization check in Article#destroy
Ship when these are fixed.
---
NON-BLOCKING SUGGESTIONS (Sacred Taste): 3
BT-01: Method PostsController#create is 18 lines (prefer ≤15)
BT-04: Variable ‘x’ could be more descriptive
FT-03: Consider extracting inline styles to CSS
Address during refactoring phase.

Now validation is automated. Blocking issues stop the pipeline. Suggestions inform improvements. Human judgment applied to taste, not rules.
This is why the distinction is the backbone of your quality system.
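The split-report mechanics can be sketched in a few lines of Ruby. The `Violation` struct and the ID convention (rule IDs like BR-08 end in `R-NN`, taste IDs like BT-01 end in `T-NN`) are assumptions for illustration, not a prescribed implementation:

```ruby
# Hypothetical violation record; a real validator would carry file/line too.
Violation = Struct.new(:id, :message)

# Partition by ID convention: BR-/FR-/MR- block the pipeline,
# BT-/FT- merely suggest improvements.
def build_report(violations)
  blocking, suggestions = violations.partition { |v| v.id.match?(/\A[A-Z]+R-\d+\z/) }
  lines = ["BLOCKING ISSUES (Sacred Rules): #{blocking.size}"]
  blocking.each { |v| lines << "  #{v.id}: #{v.message}" }
  lines << "NON-BLOCKING SUGGESTIONS (Sacred Taste): #{suggestions.size}"
  suggestions.each { |v| lines << "  #{v.id}: #{v.message}" }
  lines.join("\n")
end

puts build_report([
  Violation.new("BR-08", "N+1 query in UsersController#index"),
  Violation.new("BT-01", "Method PostsController#create is 18 lines")
])
```

A pipeline built on this would exit non-zero whenever the blocking partition is non-empty, so Sacred Rule violations stop the build while taste suggestions only annotate it.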
Why Conflating Creates Noisy Agents
Agent without distinction:
System: Follow all these guidelines...
[47 items, all stated equally]
Agent: *implements feature*
Agent: Did I miss anything?
Agent: *checks 47 items mentally*
Agent: Probably?
Agent: *asks user for confirmation*

Agent with distinction:
System: Sacred Rules (MUST follow): 8 items. Sacred Taste (SHOULD follow): 12 items.
Agent: *implements feature*
Agent: *loads Sacred Rules*
Agent: *validates against each rule*
Agent: *self-verifies with provided commands*
Agent: Sacred Rules verified. Submitting.

First agent: uncertain, asks questions, noisy.
Second agent: confident, validates systematically, quiet.
The mechanism: clear priorities enable autonomous verification.
Progressive Disclosure: Why Most Systems Fail
Here’s where most agent systems break down.
The typical approach: dump everything in context.
Typical agent context composition:
- System prompt (all rules embedded): ~15-20%
- Codebase files (5-10 files): ~60-70%
- Task description: ~2-5%
- Tool results and history: ~15-20%

Your system prompt with embedded guidelines sits in the first 15-20% of context. The codebase occupies the middle 60-70%. The task and recent history occupy the end.
This destroys clarity through three mechanisms:
1. Cognitive Load Exceeds Working Memory
Humans have ~7±2 items in working memory. LLMs have analogous limits in effective attention.
When you embed 47 guidelines in your system prompt (15-20% of context), then add codebase files (60-70% of context), then add task description, the agent can’t hold it all in effective attention.
It focuses on what seems immediately relevant. The task (end of context). The current file (recent in context). The system prompt? Skimmed.
Your guidelines? Buried in the first 20% of total context. The load exceeds capacity.
2. LLM Middle-Context Degradation
LLMs are good at remembering:
The beginning of context (system prompt opening - strong attention)
The end of context (task description, user message - strong recency)
NOT the middle (where attention degrades significantly)
In a typical context window:
First 5-10%: Strong attention (system prompt opening)
Middle 60-80%: Degraded attention (this is where your guidelines and codebase live)
Final 10-15%: Strong attention (task, recent history)
Your embedded guidelines? They’re at 5-15% from the beginning. Right where middle-context degradation begins.
Agent remembers: “You are a senior engineer” (beginning). Remembers: “Implement user authentication” (end).
Forgets: “Always use params.expect(), prevent N+1 queries with eager loading, use Current.user for authorization” (middle).
This is architectural, not model-specific. Transformer-based models broadly exhibit this “lost in the middle” pattern. It’s why RAG exists.
3. Retrieval vs Injection Dynamics
Two ways to provide knowledge. The difference is when you load it.
Injection (typical approach) - Flow over time:
T0: System prompt loads (~15% of eventual context - includes all 47 rules)
Context: 15% full
T1: Agent reads UsersController
Context: 25% full
Rules: At 10% distance from current position
But agent hasn’t needed them yet
T2: Agent reads User model
Context: 35% full
Rules: At 20% distance from current position
T3: Agent reads 3 more files
Context: 65% full
Rules: At 50% distance - middle-context degradation zone
T4: Agent needs to implement params handling
Context: 65% full
Rules: BR-01 is buried 50% back in context
Result: Agent forgets or misapplies the rule

Rules position when needed: 50%+ back in context (middle-attention degradation zone)
Relevance: 47 rules loaded, 3 needed (6%)
Retrieval (progressive disclosure) - Flow over time:
T0: System prompt loads (~3% of eventual context - minimal, points to skills)
Context: 3% full
T1: Agent reads task, loads skills navigation (~1% additional)
Context: 4% full
Knows: BR-01 exists, BR-08 exists, BT-01 exists
Doesn’t load details yet
T2: Agent reads UsersController
Context: 15% full
T3: Agent reads User model
Context: 25% full
T4: Agent identifies params handling needed
Loads BR-01-params-expect.md (~1% additional)
Context: 26% full
Rules: BR-01 is 0% back - just loaded
Result: Agent applies rule correctly
T5: Agent reads 3 more files
Context: 55% full
BR-01 still fresh (within 30% distance)
T6: Agent identifies query optimization needed
Loads BR-08-prevent-n-plus-1.md (~1% additional)
Context: 56% full
Rules: BR-08 is 0% back - just loaded
Result: Agent applies eager loading

Rules position when needed: 0-5% back in context (strong-attention zone)
Relevance: 3 rules loaded, 3 needed (100%)
The critical difference:
With injection, knowledge is fixed at the beginning (15% of context). By the time it’s needed, it’s buried 50%+ back. Middle-context degradation.
With retrieval, knowledge is loaded at decision points. When BR-08 is needed, it’s loaded fresh (0% distance). Applied immediately. No degradation.
Comparison:
Context dedicated to skills: 15% → 3-4% (75% reduction)
Temporal relevance: 50% distance → 0% distance (immediate)
The transformation isn’t token efficiency. It’s temporal relevance. Knowledge loaded when needed. Fresh in context at the moment of application.
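As a sketch, the retrieval flow above needs only two operations: always load the cheap navigation, and load a specific rule file at the decision point. The class and method names are hypothetical; the directory layout assumed here matches the `skills/[domain]/` structure described in this article:

```ruby
require "pathname"

# Minimal just-in-time loader: navigation stays in context at all times,
# individual rules are pulled in only when a decision point needs them.
class SkillLoader
  def initialize(skill_dir)
    @dir = Pathname(skill_dir)
  end

  # The ~100-line index of rule IDs and one-line descriptions.
  def navigation
    @dir.join("SKILL.md").read
  end

  # Loaded fresh at 0% context distance, e.g. load_rule("BR-08").
  def load_rule(rule_id)
    match = Dir.glob(@dir.join("references", "#{rule_id}*.md").to_s).first
    raise ArgumentError, "unknown rule: #{rule_id}" unless match
    File.read(match)
  end
end
```

The agent loads `navigation` once at T1, then calls `load_rule` only when it identifies that a rule applies, which is what keeps each rule in the strong-attention zone at the moment of use.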
Skill Evolution: The Learning Loop
Static doctrine fails over time. Requirements change. New patterns emerge. Old rules become obsolete.
Skills must evolve. Here’s how:
1. Metadata Reveals Recurring Mistakes
After every feature, you capture metadata:
{
  "feature_id": "S-023",
  "sacred_rule_violations": [
    {"rule": "BR-08", "file": "posts_controller.rb", "line": 42}
  ],
  "sacred_taste_violations": [
    {"taste": "BT-01", "file": "user.rb", "method": "calculate_score"}
  ],
  "agent_questions": 2,
  "revision_cycles": 1
}

After 20 features, you analyze:
BR-08 violations: 12 occurrences across 20 features
BT-01 violations: 8 occurrences
Agent questions about error handling: 15 occurrences

The pattern emerges: error handling isn’t documented well enough.
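That analysis can be sketched as a small aggregation over the metadata files, assuming one JSON file per feature in a `metadata/` directory with the shape shown above (the path and threshold are illustrative):

```ruby
require "json"

# Count Sacred Rule and Sacred Taste violations across all feature metadata
# files and surface the IDs that recur often enough to warrant a new rule.
def recurring_violations(dir, threshold: 5)
  counts = Hash.new(0)
  Dir.glob(File.join(dir, "*.json")).each do |path|
    data = JSON.parse(File.read(path))
    data.fetch("sacred_rule_violations", []).each { |v| counts[v["rule"]] += 1 }
    data.fetch("sacred_taste_violations", []).each { |v| counts[v["taste"]] += 1 }
  end
  counts.select { |_id, n| n >= threshold }.sort_by { |_id, n| -n }.to_h
end
```

Running this over 20 features would surface exactly the BR-08 and BT-01 clusters described above.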
2. Mistakes Become New Sacred Rules
You notice agents consistently miss error handling in background jobs. It’s not in the rules. It’s implicit knowledge.
You formalize it:
# BR-14: Background Job Error Handling
**Category:** Sacred Rule
**Severity:** High
**Applies To:** Jobs
## Rule
ALL background jobs MUST handle exceptions explicitly.
## Rationale
Silent failures in background jobs are invisible to users.
They cause data inconsistency without user notification.
## Incorrect
```ruby
class ProcessPaymentJob < ApplicationJob
  def perform(order_id)
    order = Order.find(order_id)
    PaymentProcessor.charge(order)
  end
end
```
Silent failure if PaymentProcessor raises exception.
## Correct
```ruby
class ProcessPaymentJob < ApplicationJob
  retry_on PaymentError, wait: 5.minutes, attempts: 3

  def perform(order_id)
    order = Order.find(order_id)
    PaymentProcessor.charge(order)
  rescue PaymentError => e
    order.mark_payment_failed!(e.message)
    raise # Retry via retry_on
  rescue => e
    order.mark_payment_failed!("Unknown error")
    ErrorLogger.report(e)
    # Don't retry unknown errors
  end
end
```
Explicit handling. User notified. Errors logged.
## Validation
```bash
# List job files, then flag any that lack rescue or retry_on
grep -rl "class .*Job < ApplicationJob" app/jobs/ | \
  xargs grep -L "rescue\|retry_on" && \
  echo "Jobs without error handling found" || echo "OK"
```

Add to the navigation (SKILL.md):
## Sacred Rules
- [BR-14: Job error handling](references/BR-14-job-error-handling.md) - Explicit rescue

Next feature with background jobs: Agent loads BR-14. Applies it. No mistakes. Pattern learned.
3. Rules Become Versioned
Six months later, Rails 9 changes job handling patterns. BR-14 needs updating.
You don’t modify BR-14. You create BR-14v2:
# BR-14v2: Background Job Error Handling (Rails 9+)
**Category:** Sacred Rule
**Severity:** High
**Supersedes:** BR-14 (Rails 8)
[Updated patterns for Rails 9]

Old projects still reference BR-14. New projects use BR-14v2. No breaking changes. Explicit evolution.
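Under that convention, resolving the newest version of a rule ID is mechanical. This sketch treats an unversioned ID as the oldest and a `vN` suffix as version N; it is an illustration, not a prescribed mechanism:

```ruby
# Resolve "BR-14" against available IDs -> "BR-14v2".
# Unversioned IDs sort as the oldest; "vN" suffixes sort numerically.
def latest_version(rule_id, available)
  candidates = available.select { |id| id == rule_id || id.start_with?("#{rule_id}v") }
  candidates.max_by { |id| id[/v(\d+)\z/, 1].to_i }
end
```

New projects call this to pick up BR-14v2 automatically, while old projects keep referencing BR-14 directly.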
4. Skills Improve Over Time
The learning loop:
Feature N → Metadata captured → Patterns analyzed →
New rule created → Navigation updated → Feature N+1 applies rule →
No violation → Pattern learned

After 50 features:
Sacred Rules: 8 → 15 (7 added from discovered patterns)
Violations per feature: 4.5 → 0.5 (~90% reduction)
Questions per feature: 8 → 1 (87% reduction)
Agent confidence: 0.72 → 0.91 (subjective but tracked)
Skills aren’t static documentation. They’re adaptive knowledge systems.
Quality doesn’t degrade. It compounds.
Skills as Organizational Memory
Zoom out. This isn’t just about AI.
The Institutional Knowledge Problem
Companies lose knowledge when people leave.
Senior engineer departs. Takes with them:
Why we chose this architecture
Which patterns caused bugs before
What optimizations actually worked
Where the edge cases hide
New engineer arrives. Relearns everything. Makes the same mistakes. Team knowledge resets.
This problem is older than software. Organizations have struggled with knowledge transfer for centuries.
Skills as Durable Memory
Skills preserve institutional knowledge in retrievable form.
When senior engineer leaves:
Their patterns are encoded in Sacred Rules
Their quality standards are encoded in Sacred Taste
Their debugging wisdom is encoded in validation commands
Their architectural decisions are documented with rationale
New engineer arrives:
Loads skills
Sees what the team values (Sacred Rules vs Taste)
Understands why (rationale sections)
Applies patterns immediately
Validates automatically
No six-month ramp-up relearning tribal knowledge. Knowledge is durably encoded.
The Bridge to Organizational Theory
This connects three domains:
AI System Design:
How agents retain knowledge
How context is managed
How quality is verified
Organizational Theory:
How institutions preserve knowledge
How culture is transmitted
How standards are maintained
Knowledge Management:
How tacit knowledge becomes explicit
How expertise is codified
How learning compounds
Skills aren’t just an AI pattern. They’re an organizational pattern that happens to work exceptionally well for AI.
When you build skills, you’re solving the same problem companies have struggled with forever: how do we preserve what we learn?
The difference: with skills, the knowledge is machine-readable. Agents can load it. Validate against it. Apply it systematically.
Humans benefit too. New team members read the skills. Understand team standards. See examples. Learn faster.
Skills become your organization’s durable memory. Surviving beyond any individual. Improving over time. Compounding with each learned lesson.
The Structure (Implementation)
Now that you understand the doctrine, here’s the structure:
Directory Layout
skills/[domain]/
  SKILL.md                 # Navigation (~80-100 lines)
  references/
    [RULE-01]-name.md      # Sacred Rule detail
    [TASTE-01]-name.md     # Sacred Taste detail
    [pattern].md           # Pattern reference

Navigation File Template
---
name: [domain]-skill
description: [Domain] knowledge - [scope]
---
# [Domain] Skill
**Purpose:** Navigation to [domain] reference documentation
**Load:** This file by default (~80-100 lines)
**On-demand:** Load specific references as needed
---
## When to Use This Skill
**Load when working with:** [file patterns]
**Don’t load when working with:** [out of scope]
---
## Sacred Rules (MUST follow)
- [RULE-01: Name](references/RULE-01.md) - One-line description
- [RULE-02: Name](references/RULE-02.md) - One-line description
---
## Sacred Taste (SHOULD follow)
- [TASTE-01: Name](references/TASTE-01.md) - One-line description
- [TASTE-02: Name](references/TASTE-02.md) - One-line description
---
**Navigation complete. Load specific references as needed.**

Rule File Template
# [RULE-ID]: [Rule Name]
**Category:** Sacred Rule | Sacred Taste
**Severity:** Critical | High | Medium | Low
**Applies To:** [Components]
## Rule
[Clear statement]
## Rationale
[Why this matters - technical/business/quality reason]
## Incorrect
```[language]
[Wrong example]
```
[Why wrong - specific consequences]

## Correct
```[language]
[Right example]
```
[Why right - specific benefits]

## Validation
[Automated check command]

## Exceptions
[When doesn’t apply. If none: “No exceptions.”]
Naming Convention
Sacred Rules: [PREFIX]R-[NN]-descriptive-name.md

Backend: BR-01, BR-02, etc.
Frontend: FR-01, FR-02, etc.
Marketing: MR-01, MR-02, etc.

Sacred Taste: [PREFIX]T-[NN]-descriptive-name.md

Backend: BT-01, BT-02, etc.
Frontend: FT-01, FT-02, etc.
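The convention is easy to enforce mechanically. This sketch flags reference files that break it; the B/F/M prefix list comes from the examples above, and `[pattern].md` files would need a separate allow-list:

```ruby
# Reference file names must look like BR-01-params-expect.md:
# prefix (B/F/M), R or T, two-digit number, kebab-case name.
RULE_NAME = /\A[BFM][RT]-\d{2}-[a-z0-9-]+\.md\z/

def misnamed_references(dir)
  Dir.glob(File.join(dir, "*.md"))
     .map { |path| File.basename(path) }
     .reject { |name| name.match?(RULE_NAME) }
end
```

Run it in CI over `skills/*/references/` so a misnamed rule never becomes an unreferenceable one.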
The structure serves the doctrine. Not the reverse.
Real-World Results
My reference implementation: visionaire-rails-team
Domain: Rails web application development
Before skills (single-agent with large prompt):
System prompt: 4,800 tokens (loaded every invocation)
Sacred Rule violations: 4-5 per feature
Questions asked: 8-12 per feature
Knowledge retention: Zero (same violations repeated)
Token cost per feature: ~$0.85
With progressive disclosure skills:
Navigation load: 113 tokens
Average rule loads: 2-3 rules at ~200 tokens each = 400-600 tokens
Sacred Rule violations: 0.3 per feature (93% reduction)
Questions asked: 1-2 per feature (90% reduction)
Knowledge retention: High (patterns learned, applied consistently)
Token cost per feature: ~$0.55 (35% reduction)
What fundamentally changed:
Knowledge became retrievable. Agent didn’t forget BR-08 because BR-08 existed as a durable reference. Agent loaded it when working with queries. Applied it. Validated with provided command.
Quality didn’t degrade over time. It improved. Each new discovered pattern became a new rule. Skills evolved. System learned.
After 50 features:
New Sacred Rules added: 7 (discovered from metadata analysis)
Rules deprecated: 2 (superseded by framework changes)
Average violations trending: 0.3 → 0.1
Agent confidence trending: 0.72 → 0.91
Not static doctrine. Adaptive knowledge.
Beyond Software
The same skill structure works for any domain requiring institutional knowledge.
Marketing:
Sacred Rules: Brand compliance, tracking parameters, measurable KPIs
Sacred Taste: Headline length, active voice, tone consistency
Result: Campaigns that match brand, track correctly, engage effectively
Legal:
Sacred Rules: Flag liability clauses, verify jurisdiction, check IP rights
Sacred Taste: Plain language comments, prioritize high-risk items
Result: Analysis that catches what senior counsel catches
Medical:
Sacred Rules: Dosage verification, allergy checks, interaction warnings
Sacred Taste: Clear communication, empathy markers, documentation quality
Result: Clinical decisions that follow standards, communicate effectively
Same pattern: Structured knowledge → Durable memory → Consistent application → Improving quality
The Choice You’re Facing
Keep using unstructured prompts:
5,000-token system prompts that agents skim
Same violations every feature
Same questions every time
Knowledge that degrades
Quality that decays
Or build structured skills:
100-token navigation, 400-token just-in-time loading
Violations dropping from 4.5 to 0.3
Questions dropping from 8 to 1
Knowledge that sticks
Quality that compounds
The difference isn’t model capability. It’s knowledge architecture.
Management asks: “Why are we explaining the same patterns every time? Why isn’t the AI learning?”
The answer: Because knowledge isn’t structured for retention.
Skills solve this. Not through better prompts. Through better structure.
Getting Started
Start small. One critical area. Three Sacred Rules.
Week 1: Identify Pain Points
Which violations happen most?
Security issues? (params, authorization)
Performance problems? (N+1 queries)
Quality issues? (method length, complexity)
Pick your top 3. These become your first Sacred Rules.
Week 2: Create First Skill
Create navigation file (SKILL.md, ~80 lines)
Write three Sacred Rule files (wrong vs right examples)
Add validation commands where possible
Update agent prompt to load navigation before work
Week 3: Measure Impact
Run agent on task where it previously violated rules.
Compare:
Violations: Before vs After
Questions: Before vs After
Token usage: Before vs After
The improvement will be measurable within one week.
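The comparison reduces to one percentage formula. Using this article’s own figures (violations 4.5 → 0.3, questions 8 → 1), rounded down:

```ruby
# Percentage improvement from a before/after pair of per-feature metrics.
def improvement(before, after)
  (((before - after) / before.to_f) * 100).floor
end

improvement(4.5, 0.3) # violations per feature -> 93 (%)
improvement(8, 1)     # questions per feature  -> 87 (%)
```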
Month 2: Expand and Evolve
Add 5 more Sacred Rules (from discovered violations)
Add 3 Sacred Taste items (quality preferences)
Track metadata (violations, questions, patterns)
Analyze after 10 features
Create new rules from recurring issues
Month 3: Build Learning Loop
Automated violation tracking
Pattern analysis pipeline
Rule versioning system
Skill evolution workflow
By month 3, you have an adaptive knowledge system. Not static documentation. Not degrading prompts. Institutional memory that improves.
What’s Coming Next
This article covered the Skills layer - how to structure institutional knowledge that agents retain and apply.
Next in the Agentic Engineering series:
Article 4: “Orchestration - Coordinating Specialists”
How commands coordinate multi-agent workflows with revision loops and bounded retries.
Article 5: “Metadata - The Learning Layer”
How quality metrics reveal patterns and drive continuous improvement.
The Transformation
Building high-quality agent systems isn’t about dumping more knowledge into prompts. It’s about structured disclosure with clear priorities.
The agents I build now violate Sacred Rules 93% less than before. Not because the models improved. Because the knowledge structure improved.
Progressive disclosure solves the information overload problem.
Sacred Rules vs Taste solves the prioritization problem.
Skill evolution solves the learning problem.
Organizational memory solves the knowledge retention problem.
This is the Skills layer. The third layer in Agentic Engineering.
When you build your first progressive disclosure skill, you’ll understand why this works. Not from theory. From watching agents apply patterns consistently without being reminded.
That’s the transformation. From knowledge that degrades to knowledge that compounds.
Summary
Agents degrade not because they can’t learn, but because knowledge isn’t structured for retention.
Skills - structured, navigable, referenceable, evolvable knowledge - solve this through:
Progressive disclosure (load what’s needed when needed)
Sacred Rules vs Taste (separate MUST from SHOULD)
Validation automation (agents self-verify)
Evolution loops (mistakes become rules, quality compounds)
Organizational memory (knowledge survives individuals)
Results from visionaire-rails-team: 93% fewer violations, 87% fewer tokens, knowledge that sticks.
Doctrine: Clear priorities enable autonomous verification.
Structure: Navigation → Rules → Taste → Patterns.
Outcome: Quality that improves instead of degrading.