Navigation
This guide is split into 4 parts for better performance:
- Part 1: Chapters 1-5 - Prompt Chaining, Routing, Parallelization, Reflection, Tool Use
- Part 2: Chapters 6-10 - Planning, Multi-Agent Collaboration, Memory Management, Learning and Adaptation, Goal Setting and Monitoring
- Part 3: Chapters 11-15 - Exception Handling and Recovery, Human in the Loop, Knowledge Retrieval (RAG), Inter-Agent Communication, Resource-Aware Optimization
- Part 4: Chapters 16-20 - Reasoning Techniques, Evaluation and Monitoring, Guardrails and Safety Patterns, Prioritization, Exploration and Discovery
Introduction
This guide was originally a video talking about agentic systems.
You can help out the author, who broke down the 400-page manual published by the Google engineer, here.
Chapter 11: Exception Handling and Recovery
TL;DR: This is just the way you catch errors in your agentic workflows. It is an agentic pattern that helps catch issues in your other agentic patterns. Essentially, you do something, you add safety checks, and you make the call to these services or tools or both. Then you assess whether or not it worked. If it didn't work, you catch that error, and then you assess and classify what kind of error it is.
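A minimal sketch of that loop in Python, assuming hypothetical call_tool and fallback placeholders: classify the error, retry transient failures with exponential backoff, and fall back once the retry budget is spent.

```python
import random
import time

class TransientError(Exception):
    """Temporary failure (timeout, rate limit); worth retrying."""

class PermanentError(Exception):
    """Unrecoverable failure (bad input, bad auth); go straight to the backup plan."""

def call_tool():
    """Placeholder for the real tool or service call."""
    raise TransientError("service timed out")

def fallback():
    """Placeholder backup plan: cached data, a simpler method, or a human."""
    return "default answer"

def run_with_recovery(max_tries: int = 3, base_delay: float = 1.0):
    for attempt in range(1, max_tries + 1):
        try:
            return call_tool()                        # make the call
        except PermanentError:
            return fallback()                         # retrying will not help
        except TransientError:
            if attempt == max_tries:                  # too many tries
                return fallback()
            delay = base_delay * 2 ** (attempt - 1)   # wait longer each time
            time.sleep(delay + random.uniform(0, 0.5))

print(run_with_recovery())
```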
When to Use
- Production environments: Any system requiring high reliability
- External dependencies: When relying on APIs or services
- Critical operations: Tasks that must not fail completely
- Unpredictable inputs: Handling edge cases and anomalies
- Network operations: Managing connectivity issues
- Resource constraints: Dealing with limits and quotas
Where It Fits
- API integrations: Handling service outages and rate limits
- Data pipelines: Managing corrupt data and processing failures
- User-facing systems: Maintaining service availability
- Financial transactions: Ensuring transaction integrity
- IoT systems: Handling device failures and connectivity issues
How It Works
```mermaid
graph TD
Start[Try to Do Something] --> Wrap[Add Safety Checks]
Wrap --> Call[Make the Call]
Call --> External[Call External Service]
External --> Tool[Use a Tool]
External --> Service[Use a Service]
Tool --> Result{Did It Work?}
Service --> Result
Result -->|Success| Process[Use the Result]
Result -->|Error| Catch[Catch the Error]
Catch --> WhatKind{What Kind of Error?}
WhatKind -->|Temporary| Retry[Try Again]
WhatKind -->|Permanent| Backup[Use Backup Plan]
WhatKind -->|Critical| Emergency[Emergency Response]
Retry --> Wait[Wait a Bit]
Wait --> AddTime[Wait Longer Each Time]
AddTime --> Count{How Many Tries?}
Count -->|Less Than Max| Call
Count -->|Too Many| Backup
Backup --> Options{Backup Options}
Options --> Simple[Use Simpler Method]
Options --> Saved[Use Saved Data]
Options --> Default[Use Default Answer]
Options --> Human[Get Human Help]
Simple --> Recover[Start Recovery]
Saved --> Recover
Default --> Recover
Human --> Recover
Emergency --> SaveWork[Save Current Work]
SaveWork --> Alert[Alert the Team]
Alert --> Safety{Is It Safe to Continue?}
Safety -->|Over Limit| Stop[Emergency Stop]
Safety -->|OK| Resume[Pick Up Where We Left Off]
Recover --> Record[Record What Happened]
Resume --> Record
Stop --> Record
Record --> Track[Track Error Patterns]
Track --> Learn[Learn From Errors]
Learn --> Improve[Improve for Next Time]
Process --> Success[Task Completed]
Improve --> End[Continue Working]
Success --> End
style Start fill:#6366f1
style WhatKind fill:#3E92CC
style Options fill:#3E92CC
style Safety fill:#D8315B
style End fill:#10b981
style Emergency fill:#D8315B
```
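When retries are exhausted, the diagram's backup options take over. A minimal sketch, with each strategy a hypothetical placeholder: try progressively simpler fallbacks in order until one produces a usable result.

```python
def simpler_method():
    raise RuntimeError("the simpler method failed too")

def saved_data():
    return "last known good value"        # e.g. cached or checkpointed data

def default_answer():
    return "sensible default"

def ask_human():
    return input("Agent is stuck, please advise: ")

def recover():
    """Walk the backup plan: simpler method -> saved data -> default -> human."""
    for strategy in (simpler_method, saved_data, default_answer, ask_human):
        try:
            return strategy()
        except Exception:
            continue                      # this fallback failed too; try the next one
    raise RuntimeError("all recovery options exhausted")

print(recover())                          # -> "last known good value"
```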
Pros
- Reliability: System continues operating despite failures
- Graceful degradation: Provides partial functionality when full service unavailable
- Self-healing: Automatic recovery from transient issues
- User experience: Minimizes disruption to users
- Debugging support: Comprehensive error logging
- Learning capability: Improves handling over time
- State preservation: Can resume after interruptions
Cons
- Complexity increase: Error handling adds code complexity
- Performance overhead: Try/catch and retries add latency
- False positives: May retry when unnecessary
- Resource consumption: Retries and fallbacks use resources
- Cascading failures: Poor handling can worsen problems
- Testing difficulty: Hard to test all failure scenarios
- Maintenance burden: Error handling code needs updates
Real-World Examples
Payment Processing System
- Retry failed transactions with backoff
- Fallback to alternative payment gateways
- Save transaction state for manual review
- Notify finance team of critical failures
- Automatic refund on persistent failures
Data Integration Pipeline
- Handle malformed data gracefully
- Retry failed API calls with jitter
- Use cached data when services unavailable
- Checkpoint progress for resume capability
- Alert on data quality issues
Chatbot Customer Service
- Fallback to simpler responses on errors
- Escalate to human agents when stuck
- Save conversation state for handoff
- Retry knowledge base queries
- Default to FAQ responses
Content Delivery Network
- Retry failed origin fetches
- Serve stale content when origin down
- Route to backup servers
- Implement circuit breakers
- Geographic failover strategies
Machine Learning Pipeline
- Handle model loading failures
- Fallback to simpler models
- Retry failed predictions
- Cache frequent predictions
- Graceful degradation of features
IoT Device Management
- Retry failed device commands
- Queue commands for offline devices
- Use last known state as fallback
- Implement watchdog timers
- Automatic device reboot protocols
Chapter 12: Human in the Loop
TL;DR: Adding a human in the loop where there is low to high risk depending on the situation, and most importantly for edge cases. You have some form of agent processing, you have a decision point, and one of those decisions could be that a review is needed or that you need to actually step in and intervene. A good tactical example: imagine you're using some form of agentic browser or agent mode in ChatGPT. At some point it will realize it needs you to step in to add your credentials to log in to your email, to Upwork, or to whatever service it is.
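A minimal sketch of that decision point in Python, assuming a hypothetical risk_score field and threshold: low-risk output passes through automatically, anything above the threshold waits in a queue for a human to approve, edit, or reject.

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class AgentOutput:
    content: str
    risk_score: float          # 0.0 (routine) to 1.0 (high stakes), from the agent or a classifier
    decision: str = "pending"  # approve / edit / reject once a decision is made

review_queue: Queue = Queue()

def gate(output: AgentOutput, threshold: float = 0.4) -> AgentOutput:
    """Route low-risk output straight through; queue everything else for review."""
    if output.risk_score < threshold:
        output.decision = "approve"        # auto-approved, no human needed
    else:
        review_queue.put(output)           # a human reviews it later, in priority order
    return output

def record_human_decision(output: AgentOutput, decision: str, reason: str = "") -> None:
    """Capture approve/reject/edit so rejection and edit patterns can train the agent."""
    output.decision = decision
    # In a real system, (output, decision, reason) would go to a feedback store.

gate(AgentOutput("Reply to routine support ticket", risk_score=0.1))
gate(AgentOutput("Issue a $5,000 refund", risk_score=0.9))
print(review_queue.qsize())                # -> 1 item waiting for a human
```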
When to Use
- High-stakes decisions: When errors have significant consequences
- Regulatory compliance: Required human oversight for legal reasons
- Quality assurance: Ensuring output meets standards
- Edge cases: Handling unusual or ambiguous situations
- Training data generation: Using human feedback to improve
- Trust building: Gradual automation with human validation
Where It Fits
- Content moderation: Reviewing sensitive or borderline content
- Medical diagnosis: Physician verification of AI recommendations
- Financial approvals: Human authorization for large transactions
- Legal document review: Attorney oversight of contracts
- Hiring decisions: Human review of AI-screened candidates
How It Works
```mermaid
graph TD
Start[Agent Processing] --> Identify[Identify Decision Points]
Identify --> Gates{Decision Gates}
Gates --> Approve[Approval Required]
Gates --> Review[Review Needed]
Gates --> Edit[Editing Checkpoint]
Gates --> Complex[Complex Case]
Approve --> Queue[Add to Review Queue]
Review --> Queue
Edit --> Queue
Complex --> Queue
Queue --> Batch[Batch Similar Items]
Batch --> Priority[Prioritize by Urgency]
Priority --> UI[Present in UI]
UI --> Context[Show Full Context]
Context --> Diff[Display Differences]
Diff --> SLA[Show SLA Timer]
SLA --> Human{Human Decision}
Human -->|Approve| Accept[Accept Agent Output]
Human -->|Deny| Reject[Reject with Reason]
Human -->|Edit| Modify[Human Edits Content]
Human -->|Takeover| Manual[Full Manual Control]
Accept --> Continue[Continue Workflow]
Reject --> Learn1[Capture Rejection Pattern]
Modify --> Learn2[Record Edit Changes]
Manual --> Learn3[Log Takeover Reason]
Learn1 --> Update[Update Agent Training]
Learn2 --> Update
Learn3 --> Update
Update --> Improve[Improve Future Decisions]
Continue --> Track[Track Decision Metrics]
Improve --> Track
Track --> Fatigue{Monitor Fatigue}
Fatigue -->|High| Reduce[Reduce Human Load]
Fatigue -->|Normal| Maintain[Maintain Current Flow]
Reduce --> Automate[Increase Automation]
Maintain --> Report[Generate Reports]
Automate --> Report
Report --> End[Process Complete]
style Start fill:#6366f1
style Gates fill:#3E92CC
style Human fill:#a855f7
style Fatigue fill:#3E92CC
style End fill:#10b981
style Reject fill:#D8315B
```
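The queueing steps in the diagram (batch similar items, prioritize by urgency, watch the SLA timer) can be sketched briefly; the field names and urgency scale below are assumptions.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class ReviewItem:
    category: str       # used to batch similar items together
    urgency: int        # 1 (low) to 5 (critical)
    sla_minutes: int    # time left before the SLA timer expires
    content: str

def build_review_batches(items: list[ReviewItem]) -> list[list[ReviewItem]]:
    """Group similar items, then order batches so urgent or near-SLA work comes first."""
    items = sorted(items, key=lambda i: i.category)
    batches = [list(group) for _, group in groupby(items, key=lambda i: i.category)]
    return sorted(batches, key=lambda b: (-max(i.urgency for i in b),
                                          min(i.sla_minutes for i in b)))
```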
Pros
- Quality assurance: Human judgment catches AI errors
- Compliance: Meets regulatory requirements
- Learning source: Human feedback improves system
- Trust: Users confident in human oversight
- Flexibility: Humans handle edge cases well
- Accountability: Clear responsibility chain
- Risk mitigation: Prevents costly mistakes
Cons
- Scalability limits: Human bandwidth constrains throughput
- Cost increase: Human reviewers are expensive
- Latency addition: Waiting for human response delays process
- Inconsistency: Different humans make different decisions
- Fatigue effects: Quality degrades with reviewer tiredness
- Training requirements: Reviewers need domain expertise
- Availability issues: 24/7 coverage is challenging
Real-World Examples
Content Moderation Platform
- AI flags potentially problematic content
- Human reviewers make final decisions
- Complex cases escalated to senior moderators
- Reviewer feedback trains AI models
- Fatigue monitoring and rotation schedules
Loan Approval System
- AI assesses credit risk
- Human reviews borderline applications
- Large loans require manual approval
- Explanations provided for denials
- Audit trail for compliance
Medical Imaging Analysis
- AI detects potential abnormalities
- Radiologist confirms diagnoses
- Critical findings prioritized for review
- Second opinions for complex cases
- Continuous learning from corrections
Resume Screening
- AI filters initial applications
- HR reviews shortlisted candidates
- Diversity checks by humans
- Feedback improves screening algorithms
- Final interviews always human-led
Translation Quality Control
- AI performs initial translation
- Human linguists review and edit
- Cultural sensitivity checks
- Technical terminology verification
- Style consistency enforcement
Autonomous Vehicle Monitoring
- AI handles normal driving
- Remote operators handle edge cases
- Safety driver takeover capability
- Incident review and analysis
- Continuous improvement from interventions
Chapter 13: Knowledge Retrieval (RAG)
TL;DR: Indexing documents by parsing, chunking, and creating searchable embeddings. Literally RAG. It's like having a librarian: you want to categorize or index a series of information and systems. This one is pretty straightforward: you have a user query and some sources that you've ingested. You've parsed those documents, categorized them, and embedded them, which in plain English means you take words, turn them into vectors, and store the vectors in a library. When you ask a question, you try to match the vector of the question to the vectors in your library with the closest match.
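A minimal sketch of that matching step in Python; embed_text is a hypothetical stand-in for whatever embedding model you actually use. Index time turns each chunk into a vector, and query time embeds the question and returns the closest chunks by cosine similarity.

```python
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Hypothetical embedding call; in practice this would be an embedding model or API."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

# Index time: turn each chunk into a vector and keep it in the "library".
chunks = ["Refunds are processed within 5 business days.", "Support hours are 9am to 5pm."]
library = np.stack([embed_text(c) for c in chunks])

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Query time: embed the question and return the closest chunks."""
    q = embed_text(question)
    scores = library @ q / (np.linalg.norm(library, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:top_k]      # highest cosine similarity first
    return [chunks[i] for i in best]

print(retrieve("How long do refunds take?"))
```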
When to Use
- Dynamic knowledge needs: Accessing up-to-date information
- Large document collections: Querying extensive knowledge bases
- Domain-specific applications: Specialized knowledge integration
- Factual accuracy requirements: Grounding responses in sources
- Citation requirements: Providing verifiable references
- Reducing hallucinations: Ensuring factual responses
Where It Fits
- Enterprise search: Internal document retrieval systems
- Customer support: Knowledge base querying
- Research assistants: Academic paper retrieval
- Legal research: Case law and statute searching
- Technical documentation: API and product documentation access
How It Works
```mermaid
graph TD
Start[Documents to Search] --> Read[Read Documents]
Read --> Parse[Extract the Text]
Parse --> GetInfo[Get Document Info]
GetInfo --> AddTags[Add Tags and Labels]
AddTags --> Split{How to Split Text?}
Split --> Fixed[Equal Size Chunks]
Split --> Smart[Natural Breaks]
Split --> Context[Keep Related Parts Together]
Fixed --> Process[Process Each Chunk]
Smart --> Process
Context --> Process
Process --> Convert[Convert to Searchable Format]
Convert --> Store[Store in Search Database]
Store --> Ready[System Ready to Search]
Ready --> Question[User Asks Question]
Question --> Improve[Make Question Better]
Improve --> Expand[Add Related Words]
Expand --> Search[Search Database]
Search --> Find[Find Matching Chunks]
Find --> Filter[Remove Irrelevant Ones]
Filter --> Rank{Rank by Relevance}
Rank --> Score[Give Each a Score]
Score --> Sort[Sort Best to Worst]
Sort --> Pick[Pick Top Matches]
Pick --> Verify[Check Sources are Good]
Verify --> Use[Use Sources for Answer]
Use --> Generate[Create Answer]
Generate --> Cite[Add Source References]
Cite --> Quality{Is Answer Good?}
Quality -->|Yes| Deliver[Give Answer to User]
Quality -->|No| Redo[Try Different Search]
Redo --> Adjust[Change Search Settings]
Adjust --> Search
Deliver --> Track[Track How Well It Worked]
Track --> Measure[Measure Success]
Measure --> Accuracy[How Accurate?]
Measure --> Coverage[How Complete?]
Accuracy --> Improve_System[Make System Better]
Coverage --> Improve_System
Improve_System --> End[Search Complete]
style Start fill:#6366f1
style Split fill:#3E92CC
style Rank fill:#3E92CC
style Quality fill:#a855f7
style End fill:#10b981
style Redo fill:#D8315B
```
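The "How to Split Text" choice in the diagram is worth a concrete look. A minimal sketch comparing the two simplest options; the 200-character chunk size is an arbitrary assumption.

```python
import re

def fixed_chunks(text: str, size: int = 200) -> list[str]:
    """Equal-size chunks: simple and predictable, but may cut sentences in half."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text: str, size: int = 200) -> list[str]:
    """Natural breaks: pack whole sentences into each chunk up to the size limit."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > size:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```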
Pros
- Accuracy: Responses grounded in real sources
- Verifiability: Citations enable fact-checking
- Scalability: Handle vast document collections
- Currency: Access to latest information
- Domain expertise: Specialized knowledge integration
- Reduced hallucination: Less fabrication of facts
- Flexibility: Easy to update knowledge base
Cons
- Infrastructure needs: Requires vector databases and storage
- Processing overhead: Embedding and indexing costs
- Retrieval quality: Dependent on chunking and matching
- Context limitations: Retrieved chunks may lack context
- Latency: Additional retrieval step adds delay
- Maintenance: Knowledge base needs regular updates
- Relevance challenges: May retrieve irrelevant information
Real-World Examples
Enterprise Knowledge Management
- Index company policies and procedures
- Retrieve relevant HR guidelines
- Search technical documentation
- Access historical project data
- Provide sourced answers to employees
Legal Research Platform
- Index case law and statutes
- Retrieve relevant precedents
- Search legal commentary
- Find similar cases
- Generate briefs with citations
Medical Information System
- Index medical literature
- Retrieve treatment guidelines
- Search drug interactions
- Access clinical trials data
- Provide evidence-based recommendations
Academic Research Assistant
- Index research papers
- Retrieve relevant studies
- Search across disciplines
- Find citation networks
- Generate literature reviews
Technical Support System
- Index product documentation
- Retrieve troubleshooting guides
- Search error code databases
- Access configuration examples
- Provide solution steps with references
News Aggregation Service
- Index news articles in real-time
- Retrieve relevant coverage
- Search historical archives
- Find related stories
- Generate summaries with sources
Chapter 14: Inter-Agent Communication
TL;DR: Agents communicate through a structured messaging system with defined protocols. Messages include IDs for tracking, expiration times, and security checks. It's like an office email system with read receipts, security clearances, and spam filters that prevent reply-all disasters. This is where you have language models talking to other language models. From a system perspective, you could have multiple AI agents speak to one another, and then you have to decide how they should communicate. Either they have one boss, one agent that manages all the others, which is sometimes really helpful because you have a single point of control that everything can report to. Or everyone is equal, meaning everyone has a say at the table: a pure agentic democracy, which sounds great but in practice is really hard to dial in because you're always dealing with the risk of hallucination and misfiring.
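A minimal sketch of such a message envelope in Python; the field names and the in-memory post office are assumptions rather than any specific protocol. Every message carries a tracking ID and an expiration time, and a central mailbox delivers it to the right agent.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)         # tracking number
    expires_at: float = field(default_factory=lambda: time.time() + 60)   # time limit

class Mailbox:
    """Central message system: one place where agents drop off and pick up messages."""
    def __init__(self) -> None:
        self.queues: dict[str, list[Message]] = {}

    def send(self, msg: Message) -> None:
        self.queues.setdefault(msg.recipient, []).append(msg)

    def receive(self, agent: str) -> list[Message]:
        inbox = self.queues.pop(agent, [])
        return [m for m in inbox if m.expires_at > time.time()]   # drop expired messages

bus = Mailbox()
bus.send(Message("writer_agent", "editor_agent", "Draft ready for review"))
print([m.body for m in bus.receive("editor_agent")])
```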
When to Use
- Complex workflows: Tasks requiring multiple specialized agents
- Modular systems: Building composable agent architectures
- Distributed processing: Agents running in different locations
- Scalable architectures: Systems that need to grow
- Collaborative tasks: Agents working together on problems
- Service-oriented design: Agents as microservices
Where It Fits
- Enterprise automation: Coordinating business process agents
- Research systems: Agents collaborating on analysis
- Content production: Pipeline of content creation agents
- Trading systems: Agents coordinating financial decisions
- Smart city systems: IoT and service agents communicating
How It Works
```mermaid
graph TD
Start[Multiple AI Agents Need to Talk] --> Choose{How Should They Communicate?}
Choose -->|One Boss| Manager[One Agent Manages Others]
Choose -->|Everyone Equal| Direct[Agents Talk Directly]
Choose -->|Post Office| Mailbox[Central Message System]
Manager --> Setup[Set Up Communication Rules]
Direct --> Setup
Mailbox --> Setup
Setup --> Rules[Message Rules]
Rules --> Track[Tracking Number for Each Message]
Rules --> Expire[Messages Expire After Time Limit]
Rules --> Important[Mark Important Messages]
Track --> Check{Check Who Can Talk}
Expire --> Check
Important --> Check
Check --> Verify[Verify Agent Identity]
Verify --> Permission[Check What They Can Do]
Permission --> Allow[Allow Communication]
Allow --> Send[Send Message]
Send --> Deliver[Deliver to Right Agent]
Deliver --> Receive[Agent Gets Message]
Receive --> Process[Process Message]
Process --> Reply{Need to Reply?}
Reply -->|Yes| Answer[Send Answer Back]
Reply -->|No| Log[Record Message Received]
Answer --> Watch[Monitor Conversation]
Log --> Watch
Watch --> Problems{Any Problems?}
Problems -->|Endless Loop| Stop[Stop the Loop]
Problems -->|Stuck| Fix[Unstick the Agents]
Problems -->|Too Long| Timeout[Cancel Old Messages]
Problems -->|All Good| Continue[Keep Going]
Stop --> Alert[Alert Human]
Fix --> Alert
Timeout --> Alert
Continue --> Record[Save Conversation History]
Alert --> Recover[Fix the Problem]
Record --> Report[Create Activity Report]
Recover --> End[Communication Complete]
Report --> End
style Start fill:#6366f1
style Choose fill:#3E92CC
style Check fill:#3E92CC
style Problems fill:#a855f7
style End fill:#10b981
style Stop fill:#D8315B
style Fix fill:#D8315B
```
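The monitoring branch in the diagram (endless loops, stuck agents, stale conversations) can be sketched as a simple watchdog; the thresholds are arbitrary assumptions.

```python
from collections import Counter

def check_conversation(history: list[str], max_turns: int = 50, max_repeats: int = 3) -> str:
    """Return 'too_long', 'loop', or 'ok' for the messages exchanged so far."""
    if len(history) > max_turns:
        return "too_long"                          # cancel conversations that drag on
    counts = Counter(history)
    if counts and max(counts.values()) >= max_repeats:
        return "loop"                              # the same message keeps bouncing around
    return "ok"

print(check_conversation(["ping", "pong", "ping", "pong", "ping"]))   # -> loop
```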
Pros
- Modularity: Clear separation of agent responsibilities
- Scalability: Easy to add new agents to the system
- Flexibility: Different communication patterns available
- Fault isolation: Agent failures don't crash system
- Reusability: Agents can be reused in different workflows
- Debugging support: Message tracing aids troubleshooting
- Parallel processing: Agents can work simultaneously
Cons
- Complexity overhead: Communication protocols add complexity
- Latency accumulation: Message passing adds delays
- Coordination challenges: Managing agent interactions
- Debugging difficulty: Tracing distributed conversations
- State management: Maintaining consistency across agents
- Network dependencies: Vulnerable to communication failures
- Security concerns: Inter-agent authentication needed
Real-World Examples
E-commerce Order Processing
- Inventory Agent checks stock availability
- Pricing Agent calculates total costs
- Payment Agent processes transactions
- Shipping Agent arranges delivery
- Notification Agent updates customer
- Orchestrator coordinates entire flow
News Production Pipeline
- Crawler Agent gathers news sources
- Fact-Check Agent verifies information
- Writer Agent creates articles
- Editor Agent reviews content
- Publisher Agent posts to CMS
- Analytics Agent tracks performance
Financial Analysis Platform
- Data Agent collects market information
- Technical Agent performs chart analysis
- Fundamental Agent analyzes financials
- Risk Agent assesses portfolio exposure
- Report Agent generates recommendations
- Compliance Agent ensures regulations
Smart Manufacturing System
- Sensor Agents monitor equipment
- Quality Agents check production
- Maintenance Agents schedule repairs
- Inventory Agents manage supplies
- Planning Agents optimize schedules
- Control Agent coordinates operations
Healthcare Coordination
- Triage Agent assesses symptoms
- Diagnostic Agent suggests tests
- Specialist Agents provide expertise
- Treatment Agent recommends therapy
- Pharmacy Agent manages medications
- Scheduler Agent books appointments
Research Collaboration Platform
- Literature Agent searches papers
- Data Agent manages datasets
- Analysis Agent runs experiments
- Visualization Agent creates charts
- Writing Agent drafts reports
- Review Agent checks quality
Chapter 15: Resource-Aware Optimization
TL;DR: Analyzing a task's complexity and then routing it to appropriate resources. Simple tasks use cheap, fast models, while complex tasks use powerful but expensive models. Think of something like GPT-5, where there was a huge uproar because we lost all of our models and instead got quick thinking, kind-of thinking, hard thinking, or professional thinking; each of those would route your request in ChatGPT to the model it thought was best suited for that particular outcome. The analogy here is a playful one: it's like choosing between walking, a bus, or a taxi depending on the distance, the urgency, and the budget.
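A minimal sketch of such a router in Python; the model names, the word-count heuristic, and the per-call prices are made-up assumptions. Classify the request's complexity, then pick the cheapest model that should still handle it, downgrading if the budget does not allow it.

```python
# Hypothetical model tiers: name and an assumed cost per call in dollars.
TIERS = {
    "simple": ("small-model", 0.001),
    "medium": ("standard-model", 0.01),
    "complex": ("advanced-model", 0.10),
}

def classify(task: str) -> str:
    """Crude complexity heuristic; a real router might use a small classifier model."""
    words = len(task.split())
    if words < 20:
        return "simple"
    if words < 200:
        return "medium"
    return "complex"

def route(task: str, budget: float) -> str:
    """Pick the model tier for the task, downgrading if it would blow the budget."""
    model, cost = TIERS[classify(task)]
    if cost > budget:
        model, cost = TIERS["simple"]      # over budget: fall back to the cheap model
    return model

print(route("What is our refund policy?", budget=0.05))   # -> small-model
```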
When to Use
- Cost-sensitive operations: When managing API or compute costs
- High-volume processing: Optimizing large-scale operations
- Variable workloads: Different tasks need different resources
- Budget constraints: Operating within financial limits
- Performance requirements: Balancing speed vs cost
- Multi-tenant systems: Fair resource allocation across users
Where It Fits
- SaaS platforms: Managing per-customer resource usage
- Batch processing: Optimizing large data processing jobs
- Real-time systems: Balancing latency and cost
- Development environments: Using cheaper models for testing
- Production systems: Optimizing operational costs
How It Works
```mermaid
graph TD
Start[Task Request] --> Analyze[Analyze Complexity]
Analyze --> Budget{Set Budgets}
Budget --> Token[Token Limits]
Budget --> Time[Time Constraints]
Budget --> Cost[Money Budget]
Token --> Router[Router Agent]
Time --> Router
Cost --> Router
Router --> Classify{Classify Complexity}
Classify -->|Simple| Cheap[Use Small Model]
Classify -->|Medium| Standard[Use Standard Model]
Classify -->|Complex| Premium[Use Advanced Model]
Classify -->|Unknown| Test[Run Quick Test]
Test --> Confidence{Check Confidence}
Confidence -->|Low| Escalate[Escalate to Better Model]
Confidence -->|High| Proceed[Continue with Current]
Cheap --> Execute[Execute Task]
Standard --> Execute
Premium --> Execute
Escalate --> Execute
Proceed --> Execute
Execute --> Monitor[Monitor Resources]
Monitor --> Track{Track Usage}
Track --> Tokens[Token Count]
Track --> Latency[Response Time]
Track --> Costs[API Costs]
Tokens --> Check{Within Limits?}
Latency --> Check
Costs --> Check
Check -->|Yes| Continue[Continue Processing]
Check -->|No| Optimize[Optimization Needed]
Optimize --> Prune[Prune Context]
Optimize --> Cache[Use Cached Results]
Optimize --> Downgrade[Switch to Cheaper Model]
Prune --> Retry[Retry Operation]
Cache --> Retry
Downgrade --> Retry
Continue --> Complete[Task Complete]
Retry --> Monitor
Complete --> Measure[Measure Quality/Cost]
Measure --> Delta[Calculate Delta]
Delta --> Tune[Tune Thresholds]
Tune --> Learn[Update Router Logic]
Learn --> Report[Generate Report]
Report --> End[Optimized Execution]
style Start fill:#6366f1
style Classify fill:#3E92CC
style Check fill:#3E92CC
style Delta fill:#a855f7
style End fill:#10b981
style Optimize fill:#D8315B
```
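The "Run Quick Test" branch in the diagram deserves its own sketch; the model calls and the confidence value are hypothetical. Try the cheap model first, and only escalate to the expensive one when the cheap answer looks unsure.

```python
def cheap_model(task: str) -> tuple[str, float]:
    """Hypothetical small model: returns an answer and a self-reported confidence (0-1)."""
    return "draft answer", 0.55

def premium_model(task: str) -> str:
    """Hypothetical large model: slower and pricier, used only when needed."""
    return "careful answer"

def answer(task: str, min_confidence: float = 0.7) -> str:
    draft, confidence = cheap_model(task)       # run the quick, cheap test first
    if confidence >= min_confidence:
        return draft                            # good enough; keep the savings
    return premium_model(task)                  # low confidence: escalate to the better model

print(answer("Summarize this contract clause"))   # -> careful answer (0.55 < 0.7)
```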
Pros
- Cost reduction: Significant savings on API and compute costs
- Performance optimization: Right-sized resources for each task
- Scalability: Efficient resource use enables growth
- Flexibility: Dynamic adjustment to workload changes
- Budget control: Predictable operational costs
- Quality preservation: Maintains output quality where needed
- Automatic optimization: Self-tuning based on patterns
Cons
- Complexity increase: Resource management adds overhead
- Quality variations: Different models produce different results
- Routing overhead: Classification step adds latency
- Monitoring requirements: Need comprehensive tracking
- Tuning challenges: Finding optimal thresholds takes time
- Cache management: Maintaining cache coherency
- User experience: Inconsistent response times
Real-World Examples
Customer Support Platform
- Simple FAQs use lightweight models
- Complex issues use advanced models
- Cache common question responses
- Prioritize premium customers
- Track cost per ticket resolution
Content Generation Service
- Short social posts use fast models
- Long articles use quality models
- Reuse templates for common requests
- Batch similar requests together
- Monitor cost per content piece
Code Assistant Tool
- Syntax fixes use simple models
- Architecture design uses advanced models
- Cache common code patterns
- Prioritize based on project importance
- Track cost per developer action
Translation Platform
- Common languages use basic models
- Rare languages use specialized models
- Cache frequent translations
- Batch document processing
- Optimize cost per word translated
Data Analysis System
- Simple aggregations use basic compute
- Complex ML uses premium resources
- Cache intermediate results
- Schedule heavy jobs off-peak
- Monitor cost per analysis
Educational Platform
- Basic Q&A uses lightweight models
- Complex tutoring uses advanced models
- Cache common explanations
- Allocate resources by subscription tier
- Track cost per student interaction