Navigation
This guide is split into 4 parts for better performance:
- Part 1: Chapters 1-5 - Prompt Chaining, Routing, Parallelization, Reflection, Tool Use
- Part 2: Chapters 6-10 - Planning, Multi-Agent Collaboration, Memory Management, Learning and Adaptation, Goal Setting and Monitoring
- Part 3: Chapters 11-15 - Exception Handling and Recovery, Human in the Loop, Knowledge Retrieval (RAG), Inter-Agent Communication, Resource-Aware Optimization
- Part 4: Chapters 16-20 - Reasoning Techniques, Evaluation and Monitoring, Guardrails and Safety Patterns, Prioritization, Exploration and Discovery
Introduction
This guide was originally a video talking about agentic systems.
You can support the author, who broke down the 400-page manual published by the Google engineer, here.
Chapter 1: Prompt Chaining
TL;DR: Breaking big tasks into smaller sequential steps, where each step validates the output of the previous one before passing data to the next. Think of it like an assembly line where each station completes its part, checks quality, then hands it off to the next station.
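Below is a minimal sketch of the pattern in Python, assuming a hypothetical `call_llm(prompt)` helper in place of a real model API; each step validates its output (retrying a couple of times) before handing it to the next.

```python
# Minimal prompt-chaining sketch. `call_llm` is a hypothetical stand-in for
# whichever model API you use; replace it with a real client call.
def call_llm(prompt: str) -> str:
    return f"[placeholder model output for: {prompt[:40]}...]"

def run_step(prompt: str, validate, max_retries: int = 2) -> str:
    """Run one chain step, retrying until its output passes validation."""
    for _ in range(max_retries + 1):
        output = call_llm(prompt)
        if validate(output):
            return output
    raise RuntimeError(f"Step failed validation after {max_retries + 1} attempts")

def summarize_then_translate(document: str) -> str:
    # Step 1: summarize, and require a non-empty result before moving on.
    summary = run_step(
        f"Summarize the key points of:\n{document}",
        validate=lambda out: bool(out.strip()),
    )
    # Step 2: translate the validated summary; sanity-check the length.
    return run_step(
        f"Translate this summary into French:\n{summary}",
        validate=lambda out: 0 < len(out) < 10 * len(summary) + 100,
    )

print(summarize_then_translate("Agents work best when big jobs become small steps."))
```

In practice the validators would be richer (schema checks, structured-output parsing, unit tests), but the shape of the chain stays the same.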
When to Use
- Complex multi-step processes: When you need to break down a complex task into discrete, manageable steps
- Data transformation pipelines: When information needs to be processed through multiple stages with different requirements
- Quality-critical workflows: When each step's output must meet specific criteria before proceeding
- Debugging requirements: When you need clear visibility into each stage of processing
- Mixed tool/AI operations: When combining LLM calls with API calls, database queries, or other tools
Where It Fits
- Document processing pipelines: Research → Analysis → Writing → Editing → Publishing
- Data ETL workflows: Extract → Transform → Validate → Load
- Customer service flows: Intent recognition → Information gathering → Solution generation → Response formatting
- Code generation: Requirements → Design → Implementation → Testing → Documentation
- Content creation: Ideation → Outline → Draft → Review → Finalization
How It Works
```mermaid
graph TD
Start[User Request] --> Break[Break Down Into Small Steps]
Break --> Define[Set Rules for Each Step]
Define --> Task1[Do First Task]
Task1 --> Check1{Is Output Good?}
Check1 -->|Yes| Task2[Do Second Task]
Check1 -->|No| Retry1[Try Again]
Retry1 --> Task1
Task2 --> Check2{Is Output Good?}
Check2 -->|Yes| Task3[Do Third Task with Tools]
Check2 -->|No| Retry2[Try Again]
Retry2 --> Task2
Task3 --> Check3{Is Output Good?}
Check3 -->|Yes| Combine[Combine All Results]
Check3 -->|No| Retry3[Try Again]
Retry3 --> Task3
Combine --> Build[Build Final Answer]
Build --> Save[Save Work and Notes]
Save --> End[Deliver Complete Result]
style Start fill:#6366f1
style End fill:#10b981
style Check1 fill:#3E92CC
style Check2 fill:#3E92CC
style Check3 fill:#3E92CC
style Task3 fill:#a855f7
style Retry1 fill:#D8315B
style Retry2 fill:#D8315B
style Retry3 fill:#D8315B
```
Pros
- Modularity: Each step can be developed, tested, and optimized independently
- Debuggability: Clear visibility into where failures occur in the chain
- Reliability: Structured data contracts ensure consistent handoffs between steps
- Reusability: Individual chain components can be reused in different workflows
- Error handling: Each step can have specific retry logic and fallback strategies
- Incremental progress: Partial results can be saved and resumed if interrupted
- Parallel development: Different team members can work on different chain segments
Cons
- Latency accumulation: Each step adds processing time, leading to longer total execution
- Context limitations: Information might be lost or compressed between steps
- Error propagation: Mistakes early in the chain can cascade through subsequent steps
- Complexity overhead: Simple tasks might become over-engineered with unnecessary steps
- Cost multiplication: Each LLM call incurs costs, which accumulate across the chain
- Rigid structure: May be inflexible for tasks requiring dynamic adaptation
- State management: Requires careful handling of intermediate results and context
Real-World Examples
Legal Document Analysis
- Step 1: Extract key clauses from contracts
- Step 2: Identify potential risks and obligations
- Step 3: Compare against standard templates
- Step 4: Generate executive summary with recommendations
E-commerce Product Descriptions
- Step 1: Extract product features from manufacturer data
- Step 2: Research competitor descriptions and pricing
- Step 3: Generate SEO-optimized description
- Step 4: Create variations for different platforms
- Step 5: Validate against brand guidelines
Academic Research Assistant
- Step 1: Parse research question and identify key concepts
- Step 2: Search and retrieve relevant papers
- Step 3: Extract and summarize findings
- Step 4: Identify gaps and contradictions
- Step 5: Generate literature review with citations
Software Bug Analysis
- Step 1: Parse error logs and stack traces
- Step 2: Identify affected components
- Step 3: Search for similar issues in knowledge base
- Step 4: Generate potential solutions
- Step 5: Create detailed bug report with reproduction steps
Financial Report Generation
- Step 1: Collect data from multiple sources
- Step 2: Perform calculations and analysis
- Step 3: Identify trends and anomalies
- Step 4: Generate narrative explanations
- Step 5: Format into regulatory-compliant report
Chapter 2: Routing
TL;DR: Analyzing incoming requests and sending them to the right specialist agent based on what they need. Think of it like a smart receptionist or operator who listens to what you need and directs you to the right person or department: tech support, accounting, and so on. Crucially, if the operator is unsure, they go back and ask clarifying questions to better understand where to route the request.
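A minimal sketch of confidence-based routing, with a placeholder classifier and hypothetical handler names standing in for real specialist agents:

```python
# Minimal routing sketch. The classifier and handlers are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class RoutingDecision:
    route: str          # e.g. "tech", "sales", "account", "general"
    confidence: float   # 0.0 - 1.0, as judged by the classifier

def classify(request: str) -> RoutingDecision:
    """Placeholder intent classifier; in practice an LLM call or trained model."""
    if "password" in request.lower():
        return RoutingDecision(route="account", confidence=0.9)
    return RoutingDecision(route="general", confidence=0.4)

HANDLERS = {
    "tech": lambda r: f"[tech support handles: {r}]",
    "sales": lambda r: f"[sales team handles: {r}]",
    "account": lambda r: f"[account help handles: {r}]",
    "general": lambda r: f"[general assistant handles: {r}]",
}

def route_request(request: str, min_confidence: float = 0.6) -> str:
    decision = classify(request)
    if decision.confidence < min_confidence:
        # Low confidence: ask a clarifying question instead of guessing.
        return "Could you tell me a bit more about what you need help with?"
    return HANDLERS[decision.route](request)

print(route_request("I can't reset my password"))
print(route_request("hi there"))
```

In a real system the low-confidence branch would feed the user's answer back into the classifier rather than stop, mirroring the "Ask for More Details" loop in the diagram below.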
When to Use
- Multi-domain systems: When handling diverse request types requiring different expertise
- Dynamic workflow selection: When the appropriate process depends on input characteristics
- Resource optimization: When different requests require different computational resources
- Specialized tool access: When specific tools or APIs are needed based on request type
- Confidence-based processing: When you need to handle ambiguous requests differently
- Load balancing: When distributing work across multiple specialized agents
Where It Fits
- Customer service platforms: Routing inquiries to appropriate department agents
- Multi-modal AI systems: Directing requests to text, image, or code processing pipelines
- Enterprise automation: Routing tasks to appropriate business process workflows
- Content moderation: Directing content to appropriate review pipelines
- Healthcare triage: Routing patient queries to appropriate medical specialists
How It Works
```mermaid
graph TD
Start[Customer Request] --> Look[Look at What They Need]
Look --> Decide{Which Expert Should Handle This?}
Decide -->|Technical Problem| Tech[Send to Tech Support]
Decide -->|Want to Buy| Sales[Send to Sales Team]
Decide -->|Account Question| Account[Send to Account Help]
Decide -->|General Question| General[Send to General Assistant]
Decide -->|Not Sure| Ask[Ask for More Details]
Tech --> TechTools[Give Access to Tech Guides]
Sales --> SalesTools[Give Access to Product Info]
Account --> AccountTools[Give Access to User Account]
General --> GeneralTools[Give Access to FAQ]
TechTools --> Work1[Work on Tech Problem]
SalesTools --> Work2[Work on Sales Question]
AccountTools --> Work3[Work on Account Issue]
GeneralTools --> Work4[Work on General Question]
Ask --> Understand[Better Understand Request]
Work1 --> Check{Is Answer Good?}
Work2 --> Check
Work3 --> Check
Work4 --> Check
Understand --> Decide
Check -->|Yes| Answer[Send Answer to Customer]
Check -->|No| Backup[Get Human Help]
Answer --> Record[Record What Happened]
Backup --> Record
Record --> End[Complete]
style Start fill:#6366f1
style Decide fill:#3E92CC
style Ask fill:#D8315B
style Check fill:#3E92CC
style End fill:#10b981
```
Pros
- Specialization: Each route can be optimized for specific task types
- Scalability: Easy to add new routes without affecting existing ones
- Efficiency: Requests are handled by the most appropriate resources
- Flexibility: Dynamic routing based on context and confidence
- Clarity: Clear separation of concerns between different workflows
- Performance: Avoid unnecessary processing for simple requests
- Maintainability: Each route can be updated independently
Cons
- Router complexity: The routing logic itself can become a bottleneck
- Misrouting risks: Incorrect routing decisions can lead to poor outcomes
- Latency overhead: Additional step for routing decision adds delay
- Training requirements: Router needs continuous improvement based on feedback
- Edge cases: Ambiguous requests may not fit cleanly into categories
- Coordination overhead: Managing multiple specialized agents increases complexity
- Monitoring complexity: Need to track performance across multiple paths
Real-World Examples
AI Customer Service Hub
- Technical issues → Technical Support Agent with access to documentation
- Billing questions → Finance Agent with access to payment systems
- Product inquiries → Sales Agent with catalog access
- Complaints → Escalation Agent with CRM integration
- General questions → FAQ Agent with knowledge base
Content Creation Platform
- Blog posts → Long-form Writing Agent
- Social media → Short-form Content Agent
- Technical documentation → Technical Writing Agent
- Marketing copy → Copywriting Agent
- Translations → Localization Agent
Code Assistant Router
- Bug fixes → Debugging Agent with error analysis tools
- New features → Development Agent with design patterns
- Refactoring → Code Quality Agent with best practices
- Testing → Test Generation Agent with coverage tools
- Documentation → Documentation Agent with template library
Financial Services Router
- Trading requests → Trading Agent with market data
- Risk assessment → Risk Analysis Agent with models
- Compliance checks → Compliance Agent with regulations
- Reporting → Report Generation Agent with templates
- Fraud detection → Security Agent with pattern detection
Educational Platform Router
- Math problems → Mathematical Reasoning Agent
- Language learning → Language Tutor Agent
- Science questions → Science Expert Agent
- History queries → Historical Research Agent
- Study planning → Learning Strategy Agent
Chapter 3: Parallelization
TL;DR: Splitting a large job into independent chunks that can be processed at the same time by multiple workers; "workers" here is shorthand for agents. Think of it like having 10 people each read different chapters of a book simultaneously, then combining all the summaries at the end.
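A minimal fan-out/fan-in sketch with asyncio; `summarize` is a hypothetical stand-in for whatever per-chunk work (an LLM call, an API request, a scrape) your workers actually do:

```python
# Minimal parallelization sketch: split the job, fan out the pieces to
# concurrent workers, then combine the results in order.
import asyncio

async def summarize(chunk: str) -> str:
    await asyncio.sleep(0.1)              # placeholder for real I/O-bound work
    return f"summary of {chunk!r}"

async def summarize_book(chapters: list[str]) -> str:
    # Fan out: one task per chapter, all running concurrently.
    summaries = await asyncio.gather(*(summarize(ch) for ch in chapters))
    # Fan in: combine partial results in their original order.
    return "\n".join(summaries)

chapters = [f"chapter {i}" for i in range(1, 11)]
print(asyncio.run(summarize_book(chapters)))
```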
When to Use
- Large-scale data processing: When processing multiple documents, records, or data sources
- Time-sensitive operations: When results are needed quickly and tasks are independent
- Batch operations: When performing the same operation on multiple items
- Web scraping/crawling: When gathering data from multiple sources simultaneously
- Multi-document analysis: When analyzing multiple files or documents independently
- API aggregation: When calling multiple APIs that don't depend on each other
Where It Fits
- Document processing pipelines: Analyzing multiple PDFs or reports simultaneously
- Data enrichment workflows: Enhancing records from multiple data sources
- Content generation: Creating multiple variations or translations in parallel
- Research automation: Searching multiple databases or sources concurrently
- Testing frameworks: Running multiple test scenarios simultaneously
How It Works
```mermaid
graph TD
Start[Big Job to Do] --> Look[Look at the Work]
Look --> Split[Split Into Smaller Pieces]
Split --> Check{Do We Have Resources?}
Check -->|Yes| Start_Workers[Start Multiple Workers]
Check -->|Limited| Queue[Take Turns with Limited Workers]
Start_Workers --> W1[Worker 1: Do Piece A]
Start_Workers --> W2[Worker 2: Do Piece B]
Start_Workers --> W3[Worker 3: Do Piece C]
Start_Workers --> W4[Worker 4: Do Piece D]
Queue --> Batch[Work in Small Groups]
W1 --> Try1{Did It Work?}
W2 --> Try2{Did It Work?}
W3 --> Try3{Did It Work?}
W4 --> Try4{Did It Work?}
Batch --> Try5{Did It Work?}
Try1 -->|Yes| Collect[Collect All Results]
Try1 -->|No| Wait1[Wait and Try Again]
Try2 -->|Yes| Collect
Try2 -->|No| Wait2[Wait and Try Again]
Try3 -->|Yes| Collect
Try3 -->|No| Wait3[Wait and Try Again]
Try4 -->|Yes| Collect
Try4 -->|No| Wait4[Wait and Try Again]
Try5 -->|Yes| Collect
Try5 -->|No| Wait5[Wait and Try Again]
Wait1 --> W1
Wait2 --> W2
Wait3 --> W3
Wait4 --> W4
Wait5 --> Batch
Collect --> Organize[Organize Results]
Organize --> Combine[Combine Everything]
Combine --> Final[Create Final Result]
Final --> Summary[Create Summary Report]
Summary --> End[Job Complete]
style Start fill:#6366f1
style Start_Workers fill:#3E92CC
style Check fill:#3E92CC
style Try1 fill:#3E92CC
style Try2 fill:#3E92CC
style Try3 fill:#3E92CC
style Try4 fill:#3E92CC
style Try5 fill:#3E92CC
style Wait1 fill:#D8315B
style Wait2 fill:#D8315B
style Wait3 fill:#D8315B
style Wait4 fill:#D8315B
style Wait5 fill:#D8315B
style Collect fill:#a855f7
style End fill:#10b981
```
Pros
- Speed improvement: Dramatic reduction in total processing time
- Resource utilization: Better use of available computational resources
- Scalability: Easy to scale up or down based on workload
- Fault isolation: Failure in one worker doesn't affect others
- Progress tracking: Can show incremental progress as workers complete
- Flexibility: Can dynamically adjust worker count based on load
- Cost efficiency: Optimize resource usage and reduce idle time
Cons
- Complexity increase: Managing multiple concurrent processes is challenging
- Resource limits: API rate limits and quotas constrain parallelization
- Coordination overhead: Synchronization and result merging add complexity
- Debugging difficulty: Harder to trace issues in parallel execution
- Cost multiplication: Multiple simultaneous API calls increase costs
- Memory usage: Holding multiple results in memory can be resource-intensive
- Ordering challenges: Maintaining sequence when needed requires extra logic
Real-World Examples
News Aggregation Service
- Simultaneously fetch articles from 50+ news sources
- Each worker processes one news source
- Rate limit to 10 concurrent API calls (see the sketch after this list)
- Merge and deduplicate results
- Sort by relevance and timestamp
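A sketch of the concurrency cap and retry behaviour described above, with a hypothetical `fetch_articles` coroutine standing in for real news-source clients:

```python
# Rate-limited fan-out with retry and backoff, as a sketch of the workflow above.
import asyncio, random

async def fetch_articles(source: str) -> list[str]:
    await asyncio.sleep(0.05)                        # placeholder for a real HTTP call
    if random.random() < 0.2:                        # simulate occasional failures
        raise ConnectionError(f"{source} timed out")
    return [f"headline from {source}"]

async def fetch_with_retry(source: str, limiter: asyncio.Semaphore,
                           retries: int = 3) -> list[str]:
    for attempt in range(retries):
        async with limiter:                          # at most 10 calls in flight
            try:
                return await fetch_articles(source)
            except ConnectionError:
                await asyncio.sleep(0.5 * 2 ** attempt)   # exponential backoff
    return []                                        # give up on this source

async def aggregate(sources: list[str]) -> list[str]:
    limiter = asyncio.Semaphore(10)
    batches = await asyncio.gather(*(fetch_with_retry(s, limiter) for s in sources))
    return sorted({headline for batch in batches for headline in batch})  # dedupe

print(asyncio.run(aggregate([f"source-{i}" for i in range(50)])))
```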
E-commerce Price Monitoring
- Monitor prices across 100+ competitor sites
- Parallel workers scrape product pages
- Handle retry logic for failed requests
- Aggregate pricing data into comparison matrix
- Generate price change alerts
Document Intelligence System
- Process 1000+ page legal document set
- Split into 50-page chunks for parallel analysis
- Each worker extracts entities and clauses
- Merge findings into comprehensive report
- Track document provenance for each finding
Code Repository Analysis
- Scan entire codebase for security vulnerabilities
- Parallel workers analyze different directories
- Each worker runs different security checks
- Collect and prioritize all findings
- Generate comprehensive security report
Multi-language Translation Project
- Translate documentation into 15 languages
- Parallel workers for each language pair
- Maintain consistency with translation memory
- Quality check each translation
- Compile into multi-language documentation set
Chapter 4: Reflection
TL;DR: Generating a first draft, then having a critic review it against quality standards, revising based on the feedback, and repeating until the output meets your quality bar. Think of it like writing an essay, having a teacher review it, and making improvements until you finally get a passing grade.
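A minimal generate-critique-revise loop, again assuming a placeholder `call_llm(prompt)` helper; the generator and the critic are just two differently prompted calls to the same model:

```python
# Minimal reflection loop sketch. `call_llm` is a hypothetical stand-in for a
# real model API; swap it for an actual client call.
def call_llm(prompt: str) -> str:
    return f"[placeholder model output for: {prompt[:40]}...]"

def generate(task: str, feedback: str = "") -> str:
    prompt = f"Task: {task}\n"
    if feedback:
        prompt += f"Revise your previous draft. Address this feedback:\n{feedback}\n"
    return call_llm(prompt)

def critique(task: str, draft: str) -> tuple[bool, str]:
    """Ask a critic to grade the draft; returns (passes, structured feedback)."""
    review = call_llm(f"Review this draft for the task '{task}'. "
                      f"List concrete problems, or reply PASS.\nDraft:\n{draft}")
    return review.strip().upper().startswith("PASS"), review

def reflect(task: str, max_iterations: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_iterations):
        passed, feedback = critique(task, draft)
        if passed:
            return draft                     # meets the quality bar
        draft = generate(task, feedback)     # revise against the critic's feedback
    return draft                             # hit the iteration cap: use best effort

print(reflect("Write a product description for a solar lantern"))
```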
When to Use
- Quality-critical outputs: When high accuracy and quality are non-negotiable
- Complex reasoning tasks: When problems require iterative refinement
- Creative work: When content needs multiple rounds of improvement
- Learning systems: When you want to improve performance over time
- Error-prone domains: When initial attempts often have mistakes
- Compliance requirements: When outputs must meet specific standards
Where It Fits
- Content creation: Blog posts, reports, and documentation requiring polish
- Code generation: Producing bug-free, optimized code
- Legal document drafting: Ensuring accuracy and completeness
- Academic writing: Research papers needing fact-checking and citations
- Product descriptions: E-commerce content requiring SEO and accuracy
How It Works
```mermaid
graph TD
Start[Initial Request] --> Generate[Generate First Draft]
Generate --> Output1[Initial Output]
Output1 --> Critic{Critic Agent Review}
Critic --> Rubric[Apply Quality Rubrics]
Critic --> Tests[Run Unit Tests]
Critic --> Check[Grammar & Logic Check]
Rubric --> Score1{Quality Score}
Tests --> Score2{Test Results}
Check --> Score3{Check Results}
Score1 --> Evaluate{Meets Criteria?}
Score2 --> Evaluate
Score3 --> Evaluate
Evaluate -->|No| Feedback[Generate Structured Feedback]
Evaluate -->|Yes| Accept[Accept Output]
Feedback --> Revise[Revision Agent]
Revise --> Address[Address Each Issue]
Address --> Output2[Revised Output]
Output2 --> Counter{Iteration Count}
Counter -->|< Max| Critic
Counter -->|>= Max| Converge[Use Best Version]
Accept --> Record[Record Success Patterns]
Converge --> Record
Record --> Learn[Update Prompts/Rules]
Learn --> Final[Final Output]
Final --> End[Deliver Result]
style Start fill:#6366f1
style Critic fill:#3E92CC
style Evaluate fill:#3E92CC
style Accept fill:#10b981
style Converge fill:#D8315B
style End fill:#10b981
```
Pros
- Quality improvement: Systematic enhancement through multiple iterations
- Error reduction: Catches and fixes mistakes before final delivery
- Objectivity: Separation of generation and critique roles
- Learning capability: System improves over time from patterns
- Transparency: Clear feedback trail for improvements
- Flexibility: Can adjust critique criteria for different use cases
- Consistency: Applies same quality standards uniformly
Cons
- Increased latency: Multiple iterations multiply processing time
- Higher costs: Each reflection cycle incurs additional API calls
- Context window limits: Long documents may exceed token limits
- Diminishing returns: Later iterations may provide minimal improvement
- Over-optimization: Risk of making content generic or losing voice
- Complexity: Requires careful tuning of critique criteria
- API throttling: Multiple rapid calls may hit rate limits
Real-World Examples
Technical Blog Post Creation
- Initial draft generation
- Technical accuracy review
- Code example validation
- SEO optimization check
- Readability improvements
- Final grammar and style polish
Contract Generation System
- Draft initial contract terms
- Legal compliance review
- Risk assessment critique
- Clarity and ambiguity check
- Client-specific customization
- Final legal review
Educational Content Development
- Create lesson content
- Pedagogical effectiveness review
- Factual accuracy verification
- Age-appropriateness check
- Engagement factor assessment
- Accessibility improvements
Software Documentation
- Generate API documentation
- Technical accuracy review
- Code example testing
- Completeness check
- Clarity improvements
- Version consistency validation
Marketing Copy Refinement
- Initial copy generation
- Brand voice alignment check
- Persuasiveness assessment
- Fact and claim verification
- SEO keyword optimization
- A/B test variant creation
Research Report Writing
- Draft research findings
- Methodology critique
- Statistical validation
- Citation verification
- Logical flow improvement
- Executive summary refinement
Chapter 5: Tool Use
TL;DR: When the AI needs external information or actions, it discovers the available tools, checks permissions, and calls the right tool with the proper parameters. Think of it like a chef who needs an ingredient: they check what's in the pantry, verify they're allowed to use it, then retrieve it and use it in the recipe.
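A minimal sketch of a tool registry with a permission check before execution; the tool names and the allow-list are illustrative only:

```python
# Minimal tool-use sketch: a registry of callable tools plus a safety check
# before execution. Errors are normalized into strings the LLM can read.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {
    # Toy arithmetic evaluator; never eval untrusted input in production.
    "calculator": lambda expression: str(eval(expression, {"__builtins__": {}})),
    "web_search": lambda query: f"[search results for {query!r}]",  # placeholder
}

ALLOWED = {"calculator", "web_search"}   # tools this agent is permitted to call

def call_tool(name: str, **arguments) -> str:
    if name not in TOOLS:
        return f"Error: unknown tool '{name}'"
    if name not in ALLOWED:
        return f"Error: access to '{name}' denied"        # safety check failed
    try:
        return TOOLS[name](**arguments)                    # execute with arguments
    except Exception as exc:                               # tool failure
        return f"Error: tool '{name}' failed ({exc})"

# The LLM would emit a structured tool call; here we execute one directly.
print(call_tool("calculator", expression="17 * 23"))
print(call_tool("web_search", query="current EUR to USD rate"))
```

A production agent would typically hand the model JSON tool schemas and validate the arguments it emits before executing anything, but the discover/permit/call/normalize loop is the same.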
When to Use
- External data access: When agents need real-time or dynamic information
- System integration: When connecting to databases, APIs, or services
- Computational tasks: When precise calculations or data processing is needed
- File operations: When reading, writing, or manipulating files
- Action execution: When agents need to perform concrete actions
- Multi-step workflows: When combining AI reasoning with tool execution
Where It Fits
- Research assistants: Web search, document retrieval, fact-checking
- Data analysis workflows: Database queries, calculations, visualizations
- DevOps automation: System commands, deployment tools, monitoring
- Customer service: CRM access, ticket management, knowledge base queries
- Content management: File operations, publishing tools, asset management
How It Works
```mermaid
graph TD
Start[User Request] --> Analyze[Analyze Task Requirements]
Analyze --> Discover[Discover Available Tools]
Discover --> Catalog[Tool Catalog]
Catalog --> API1[Web Search API]
Catalog --> API2[Database Query Tool]
Catalog --> API3[Calculator Function]
Catalog --> API4[File System Access]
Catalog --> API5[External Service API]
Catalog --> Select{Select Appropriate Tool}
Select --> Match[Match Capabilities to Need]
Match --> Safety{Safety Check}
Safety -->|Pass| Prepare[Prepare Tool Call]
Safety -->|Fail| Deny[Deny Access with Reason]
Prepare --> Validate[Validate Input Parameters]
Validate --> Call[Execute Tool with Arguments]
Call --> Handle{Handle Response}
Handle -->|Success| Parse[Parse Tool Output]
Handle -->|Error| ErrorHandle[Error Recovery]
Handle -->|Timeout| Retry[Retry with Backoff]
ErrorHandle --> Fallback[Use Fallback Method]
Retry --> Call
Parse --> Normalize[Normalize for LLM]
Fallback --> Normalize
Normalize --> Process[Process with Context]
Process --> Decision{Next Action?}
Decision -->|More Tools| Select
Decision -->|Complete| Audit[Audit Tool Usage]
Deny --> Log[Log Security Event]
Audit --> Redact[Redact Sensitive Data]
Log --> Redact
Redact --> Result[Generate Final Response]
Result --> End[Return to User]
style Start fill:#6366f1
style Select fill:#3E92CC
style Safety fill:#D8315B
style Handle fill:#3E92CC
style End fill:#10b981
```
Pros
- Capability extension: Agents can perform actions beyond text generation
- Real-time data: Access to current information not in training data
- Precision: Exact calculations and deterministic operations
- Integration: Seamless connection to existing systems and services
- Automation: Complete end-to-end workflows without human intervention
- Flexibility: Dynamic tool selection based on task requirements
- Auditability: Clear log of all tool usage and parameters
Cons
- Security risks: Tool access must be carefully controlled
- Error propagation: Tool failures can break entire workflows
- Latency addition: Each tool call adds processing time
- Cost accumulation: External API calls may incur charges
- Complexity: Managing tool schemas and error handling
- Dependency risks: Reliance on external services availability
- Data sensitivity: Need careful handling of credentials and private data
Real-World Examples
Financial Analysis Assistant
- Stock price API for real-time quotes
- Calculator for portfolio calculations
- Database queries for historical data
- Chart generation tools for visualizations
- Email API for report distribution
Code Development Helper
- File system access for reading/writing code
- Compiler/interpreter for code execution
- Git commands for version control
- Testing frameworks for validation
- Documentation generators
E-commerce Order Management
- Inventory database queries
- Payment processing APIs
- Shipping service integrations
- Email/SMS notification tools
- CRM system updates
Research Paper Assistant
- Academic database searches (PubMed, arXiv)
- Citation management tools
- PDF parsing and extraction
- Reference formatting tools
- Plagiarism checking APIs
Smart Home Controller
- IoT device APIs (lights, thermostats)
- Weather service integration
- Calendar access for scheduling
- Energy monitoring tools
- Security system controls
HR Recruitment System
- Resume parsing tools
- LinkedIn and job board APIs
- Calendar scheduling tools
- Email automation
- Background check services
- Video interview platforms