# TEL301: Six Essentials of Agentic AI
    https://www.telosready.com/skills/TEL301?v=9
    Framework for designing, building, evaluating, and improving agentic AI systems based on six essential components: Agentic Harness, Unit of Work, Workflows, Memory, Skills, and Oversight. Use this skill whenever someone is architecting an agentic system, evaluating whether an agent implementation is production-ready, designing an agentic development workflow, reviewing or auditing an existing agent's architecture, building a product that uses AI agents to do real work, or discussing what makes an agent different from a chatbot. Also trigger when users mention terms like 'agentic loop', 'agent architecture', 'AI agent design', 'agentic development', 'agent framework', or ask questions like 'what do I need to build an agent' or 'how do I make my agent production-ready'. This skill applies to both building agents AND doing development with agents — the same six essentials appear in both contexts.
    
    ## Instructions
    This framework defines the six components required to build agentic AI systems that ship real work. Each essential builds on the previous one — together they form a complete system. The core insight is: **the model is not the agent — the system is.**
    
    These essentials apply in two contexts:
    1. **Building agents** — designing systems where AI agents do work autonomously
    2. **Doing development with agents** — using agentic AI as a development tool (e.g. Claude Code)
    
    Both contexts require the same six components. When advising on agentic systems, evaluate all six and identify which are missing or underdeveloped.
    
    ---
    
    ## The Six Essentials
    
    ### 1. Agentic Harness
    
    The runtime that manages the agentic loop, tool access, caching, and context compaction. This is the most commonly underestimated component — people focus on the model when they should focus on the harness.
    
    **What it does:**
    - Runs the agentic loop: Plan → Act → Observe → Repeat
    - Controls which tools the agent can access and how they're invoked
    - Manages context window limits through caching and compaction
    - Handles error recovery and retry logic
    
    **Key design questions:**
    - How does the harness decide when to stop the loop?
    - What happens when a tool call fails?
    - How is the context window managed as work accumulates?
    - How are tool results cached to avoid redundant work?
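The design questions above can be made concrete with a minimal sketch of the loop itself. This is illustrative only, not any real harness's API: the callable names (`plan`, `act`, `observe`, `is_done`), the step cap, and the retry policy are all assumptions.

```python
# Minimal sketch of an agentic harness loop (illustrative, not a real API).
# The callable names, `max_steps` cap, and retry policy are assumptions.

def run_loop(plan, act, observe, is_done, max_steps=10, max_retries=2):
    """Plan -> Act -> Observe -> Repeat, with a hard step cap and retries."""
    history = []
    for _ in range(max_steps):
        action = plan(history)            # decide the next action from history
        result, error = None, None
        for _attempt in range(max_retries + 1):
            try:
                result = act(action)      # invoke a tool
                break
            except Exception as exc:      # tool call failed: retry, then record
                error = exc
        observation = observe(result) if error is None else f"error: {error}"
        history.append((action, observation))
        if is_done(history):              # explicit termination check
            return history
    return history                        # step cap prevents infinite loops
```

Even this toy version answers three of the questions directly: termination is an explicit predicate plus a step cap, tool failures are retried then recorded as observations, and accumulated work lives in a history the harness could compact or cache.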
    
    **Reference example:** Claude Code is an agentic harness for software development. It manages tool access (file edit, bash, web search), handles context compaction automatically, runs the plan-act-observe loop, and integrates MCP servers for extensibility.
    
    **Red flags when this is missing or weak:** The agent loses context mid-task, hits token limits unexpectedly, makes redundant tool calls, or gets stuck in loops without termination.
    
    ---
    
    ### 2. Unit of Work
    
    The container that gives the agent scope, persistence, and the ability to finish real work. This is what separates a chatbot from a system that actually completes tasks.
    
    **Spectrum of complexity:**
    
    | Simple | Production |
    |--------|------------|
    | Chat session | Ticket / Job / Task |
    | Starts and ends with conversation | Can span hours or days |
    | Context is temporary | State is persistent |
    | Work is ephemeral | Work is resumable, trackable, completable |
    
    **Key design questions:**
    - What defines the boundaries of one unit of work?
    - How does the agent know when the work is "done"?
    - Can work be paused and resumed?
    - How is progress tracked and reported?
    - What happens if the agent fails mid-unit?
    
    **The progression:** Most teams start with chat sessions, but production systems need a more durable container. A ticket in a project management system, a job in a queue, or a task in a workflow engine — these give the agent something concrete to work against and a clear definition of done.
    
    **Red flags when this is missing or weak:** The agent can't do work that takes more than one session, there's no way to track what the agent has done, work gets lost when sessions end, or there's no concept of "completion."
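A durable unit of work can be sketched as a small state machine over a persistable record. This is a hypothetical schema, not any product's data model: the field names, status values, and transitions are assumptions chosen to illustrate pause/resume, retry after failure, and an explicit notion of "done."

```python
# Hypothetical sketch of a unit-of-work container: a persistent, resumable
# task record rather than an ephemeral chat session. Field names and the
# state machine are assumptions, not a specific product's schema.
import json
from dataclasses import dataclass, field, asdict

VALID_TRANSITIONS = {
    "pending": {"running"},
    "running": {"paused", "done", "failed"},
    "paused":  {"running"},
    "failed":  {"running"},   # failed work can be retried
    "done":    set(),         # terminal: completion is explicit
}

@dataclass
class UnitOfWork:
    id: str
    goal: str
    status: str = "pending"
    progress: list = field(default_factory=list)

    def transition(self, new_status: str):
        if new_status not in VALID_TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def record(self, note: str):
        self.progress.append(note)   # trackable progress log

    def to_json(self) -> str:        # persist so work survives sessions
        return json.dumps(asdict(self))
```

The design choice worth noticing: "done" is a state the system must be moved into through legal transitions, not an inference from the agent going quiet.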
    
    ---
    
    ### 3. Workflows & Commands
    
    Predefined patterns that kick off the agentic loop with the right context. These are the playbooks that make agents repeatable and reliable, rather than leaving the LLM to infer what to do from a vague prompt.
    
    **The three-step pattern:**
    
    1. **Trigger** — A command, event, or scheduled action initiates the workflow
    2. **Context** — The workflow loads relevant data, history, and constraints
    3. **Execute** — The agentic loop runs with clear goals and boundaries
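The three-step pattern can be sketched as a small workflow registry. The decorator-based registry, the loader's return shape, and the `run_agent` stand-in are all invented for illustration, under the assumption that a workflow is "a named context loader in front of the agentic loop."

```python
# Illustrative sketch of the trigger -> context -> execute pattern.
# The registry, loader shape, and `run_agent` stand-in are assumptions.

WORKFLOWS = {}

def workflow(name):
    """Register a context-loading function under a trigger name."""
    def deco(fn):
        WORKFLOWS[name] = fn
        return fn
    return deco

def run(name, run_agent, **params):
    """Trigger: look up the workflow. Context: it loads what it needs.
    Execute: hand a fully specified goal to the agentic loop."""
    load_context = WORKFLOWS[name]
    goal, context = load_context(**params)
    return run_agent(goal, context)

@workflow("triage-bug")
def triage_bug(issue_id):
    # In a real system this would fetch the issue, comments, and constraints.
    context = {"issue": issue_id, "history": f"comments for {issue_id}"}
    return f"Triage issue {issue_id}", context
```

Parameterisation falls out naturally: the same `triage-bug` workflow runs against any `issue_id`, and composition is a workflow's loader calling `run` on another workflow.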
    
    **Key design questions:**
    - What are the most common actions the agent needs to perform?
    - What context does each workflow need to load before starting?
    - How are workflows parameterised for different inputs?
    - Can users create their own workflows or only use predefined ones?
    - How do workflows compose (one workflow calling another)?
    
    **The insight:** Workflows encode your team's best practices into repeatable automation. They're the difference between "ask the AI to help" and "run this process."
    
    **Red flags when this is missing or weak:** Every interaction starts from scratch, users get inconsistent results for the same type of task, there's no way to standardise common operations, or the agent requires extensive prompting to do routine work.
    
    ---
    
    ### 4. Memory
    
    Not just "remember stuff" — memory must be self-learning, self-managing, and properly scoped. This is what compounds the agent's value over time. Agents without memory start from zero every time.
    
    **Three required properties:**
    
    **Self-Learning** — The memory system automatically updates from work the agent does. Every completed task, every correction, every observed pattern feeds back into what the agent knows. This shouldn't require manual curation — the system should learn from its own work.
    
    **Self-Managing** — Memory needs to prune, prioritise, and organise itself. As the volume of remembered information grows, the system must decide what's still relevant, what can be compressed, and what should be forgotten. Unbounded memory becomes noise.
    
    **Properly Scoped** — Different contexts need different memories. A well-designed memory system distinguishes between:
    - **Personal** — individual user preferences and history
    - **Project** — context specific to a piece of work
    - **Organisation** — shared knowledge across the team or company
    - **Global** — general knowledge applicable everywhere
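Scoped lookup can be sketched with a precedence rule: narrower scopes override broader ones, and project entries are keyed per project so nothing leaks between them. The scope names match the list above; the storage layout and lookup strategy are assumptions.

```python
# Minimal sketch of scoped memory with precedence: narrower scopes
# (personal, project) shadow broader ones (organisation, global).
# Storage layout and lookup strategy are assumptions.

SCOPE_ORDER = ["personal", "project", "organisation", "global"]

class ScopedMemory:
    def __init__(self):
        self.stores = {scope: {} for scope in SCOPE_ORDER}

    def remember(self, scope, key, value):
        self.stores[scope][key] = value

    def remember_for_project(self, project, key, value):
        # Project memory is keyed per project, so project A's context
        # cannot leak into project B's lookups.
        self.stores["project"].setdefault(project, {})[key] = value

    def recall(self, key, project=None):
        for scope in SCOPE_ORDER:          # narrowest matching scope wins
            store = self.stores[scope]
            if scope == "project":
                store = store.get(project, {})
            if key in store:
                return store[key]
        return None
```

The scope boundary question from below is answered structurally here: a lookup for project B simply never reads project A's partition.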
    
    **Key design questions:**
    - How does the agent learn from completed work without explicit instruction?
    - What triggers memory consolidation and cleanup?
    - How are scope boundaries enforced (preventing project A's context from leaking into project B)?
    - What's the retrieval strategy (vector search, structured lookup, hybrid)?
    - How do you handle contradictions between old and new information?
    
    **Red flags when this is missing or weak:** The agent asks the same questions repeatedly, doesn't improve at recurring tasks, bleeds context between unrelated projects, or requires users to manually maintain its knowledge base.
    
    ---
    
    ### 5. Skills
    
    Reusable, testable capabilities the agent draws on, with a built-in feedback loop for continuous improvement. This is where organisational knowledge gets encoded and where the system gets better with use rather than staying static.
    
    **Skill properties:**
    - **System-wide** — shared across the organisation, not locked to one user or project
    - **Versioned** — track changes and roll back when a skill regresses
    - **Testable** — validate against known scenarios before deploying
    - **Composable** — combine skills for complex tasks (a "write report" skill might use a "research" skill and a "format document" skill)
    - **Self-improving** — feedback from usage drives refinement
    
    **The continuous improvement loop:**
    
    1. **Deploy** — Ship the skill into the system
    2. **Observe** — Monitor how the agent uses it and what outcomes it produces
    3. **Evaluate** — Measure quality against success criteria
    4. **Refine** — Update the skill based on what was learned
    5. Return to step 1
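The versioned/testable/rollback properties can be sketched as a small skill registry with a pre-deploy test gate. The class and method names are invented; real skill packaging would add metadata, discovery, and usage telemetry.

```python
# Hedged sketch of a skill registry: versioned releases, a test gate
# before deploy, and rollback on regression. All names are invented.

class SkillRegistry:
    def __init__(self):
        self.versions = {}   # skill name -> list of released versions

    def publish(self, name, skill_fn, tests):
        """Testable: validate against known scenarios before deploying."""
        for args, expected in tests:
            if skill_fn(*args) != expected:
                raise ValueError(f"{name}: failed test for input {args}")
        self.versions.setdefault(name, []).append(skill_fn)

    def get(self, name):
        return self.versions[name][-1]   # agents load the latest version

    def rollback(self, name):
        """Versioned: drop the latest release when it regresses."""
        if len(self.versions[name]) < 2:
            raise ValueError(f"{name}: no earlier version to roll back to")
        self.versions[name].pop()
```

The improvement loop above plugs in at `publish` (deploy) and `rollback` (refine): evaluation results from Oversight decide which call happens next.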
    
    **Key design questions:**
    - How are skills discovered and loaded by the agent?
    - What's the mechanism for skill authors to publish and share?
    - How do you measure whether a skill is working well?
    - What prevents skill bloat (too many skills degrading selection quality)?
    - How do skills handle edge cases they weren't designed for?
    
    **The analogy:** Think of skills like packages in a package manager (npm, pip), but at the knowledge layer rather than the code layer. They're the reusable units that encode "how to do X well" and improve over time.
    
    **Red flags when this is missing or weak:** The agent is equally mediocre at everything, there's no way to encode domain expertise, improvements aren't captured for reuse, or the system doesn't get better with use.
    
    ---
    
    ### 6. Oversight
    
    The system's ability to evaluate its own outputs, surface uncertainty, escalate to humans, and feed corrections back into the other five essentials. This is what makes the difference between an agent you deploy and an agent you trust.
    
    **Three required capabilities:**
    
    **Evaluation** — Systematic assessment of whether the agent's work product meets quality criteria. Not just "did it finish" but "is this good." Evaluation can be automated (a second LLM pass grading the output, deterministic checks, test suites) or human (review queues, approval gates). The key is that evaluation is designed into the system, not left to chance.
    
    **Transparency** — The full trace of what the agent did and why: reasoning, tool calls, context used, decisions made. This isn't logging for debugging — it's the audit trail that makes everything else improvable. Without transparency, Memory can't self-learn accurately, Skills can't self-improve meaningfully, and Workflows can't be refined with confidence.
    
    **Escalation** — The agent's ability to recognise its own limits and involve a human at the right moment. This requires confidence signalling (the agent knows when it's uncertain), defined escalation paths (who gets asked, through what channel), and graceful handoff (the human receives full context, not just a vague flag). The critical design principle: the agent should default to escalating when uncertain, not default to guessing.
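The escalate-by-default principle can be sketched as a simple gate. The threshold value and the handoff payload shape are assumptions; the point is that below the threshold the human receives the full trace, not a vague flag.

```python
# Illustrative escalation gate: below a confidence threshold the agent
# hands off with full context instead of guessing. The threshold value
# and payload shape are assumptions.

ESCALATION_THRESHOLD = 0.7

def decide(answer, confidence, trace):
    """Default to escalating when uncertain, not to guessing."""
    if confidence >= ESCALATION_THRESHOLD:
        return {"action": "proceed", "answer": answer}
    # Graceful handoff: the human gets the full trace, not a vague flag.
    return {
        "action": "escalate",
        "proposed_answer": answer,
        "confidence": confidence,
        "trace": trace,
    }
```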
    
    **The feedback circuit:**
    
    Oversight is the only essential with a bidirectional relationship to the rest of the stack. Its outputs feed downward into every layer:
    
    - Evaluation results refine **Skills** — a skill that consistently produces poor-quality output gets flagged for revision
    - Trace data improves **Workflows** — you can see exactly where a workflow's context-loading step is insufficient
    - Escalation patterns inform **Memory** — repeated escalations on the same topic signal a knowledge gap
    - Quality metrics tune the **Harness** — too many tool calls, excessive token spend, or loops that run too long
    - Completion quality redefines what "done" means for the **Unit of Work**
    
    Without this circuit, the self-improving and self-learning properties described in Skills and Memory are promises without plumbing.
    
    **Key design questions:**
    - What does "good output" look like for each type of work, and how is that measured?
    - Who evaluates — a second model, deterministic checks, a human, or some combination?
    - What's the confidence threshold below which the agent must escalate?
    - How fast does correction feedback reach the component that needs it?
    - What's the cost of evaluation relative to the cost of the work itself?
    - How do you prevent evaluation from becoming a bottleneck that kills throughput?
    
    **Red flags when this is missing or weak:** The team can't tell you how often the agent gets it right. Mistakes are discovered by end users, not by the system. Self-improvement is described in the architecture but nobody can point to a concrete example of it happening. Humans are either reviewing everything (defeating the purpose) or reviewing nothing (accepting unknown risk). The agent never says "I'm not sure."
    
    ---
    
    ## How They Fit Together
    
    The six essentials form a stack, each building on the one below:
    
    ```
    ┌──────────────────────────────────────┐
    │  Oversight — Know if it's working    │
    ├──────────────────────────────────────┤
    │  Skills — Give it capability         │
    ├──────────────────────────────────────┤
    │  Memory — Give it context            │
    ├──────────────────────────────────────┤
    │  Workflows — Tell it what to do      │
    ├──────────────────────────────────────┤
    │  Unit of Work — Give it scope        │
    ├──────────────────────────────────────┤
    │  Harness — Run the loop              │
    └──────────────────────────────────────┘
    ```
    
    The harness runs the loop. The unit of work gives it boundaries. Workflows tell it what to do. Memory gives it context. Skills give it capability. Oversight sits on top — observing the outputs of everything below it and feeding corrections back down through every layer. The first five essentials build an agent that can work. The sixth builds an agent you can stand behind.
    
    ---
    
    ## Using This Framework
    
    ### For Architecture Reviews
    
    When reviewing an agentic system, evaluate each of the six essentials on a maturity scale:
    
    - **Missing** — Not present at all
    - **Ad hoc** — Present but informal, inconsistent, or manual
    - **Defined** — Deliberately designed with clear interfaces
    - **Managed** — Monitored, measured, and actively maintained
    - **Optimising** — Self-improving with feedback loops
    
    A system doesn't need all six at "Optimising" to be useful, but any essential at "Missing" is a significant gap. Start by getting everything to "Defined" — that's where most of the value unlocks.
    
    ### For New Projects
    
    Start with the harness and unit of work — these are the foundation. You can build a useful system with just these two (many chat-based agents operate here). Add workflows when you find yourself repeatedly setting up the same context. Add memory when you notice the agent re-learning things it should already know. Add skills when you have domain expertise worth encoding for reuse. Add oversight when the cost of getting it wrong — in quality, trust, or risk — justifies systematic evaluation. In practice, that means most production systems.
    
    ### For Evaluating Tools and Platforms
    
    When evaluating agentic AI tools or platforms, use the six essentials as a checklist. Most tools are strong on the harness and weak on everything else. The differentiation happens in how well they handle units of work, workflows, memory, skills, and oversight.
    
    See [references/evaluation-checklist.md](references/evaluation-checklist.md) for a detailed evaluation template.
    