Navigation
This guide is split into 4 parts for better performance:
- Part 1: Chapters 1-5 - Prompt Chaining, Routing, Parallelization, Reflection, Tool Use
- Part 2: Chapters 6-10 - Planning, Multi-Agent Collaboration, Memory Management, Learning and Adaptation, Goal Setting and Monitoring
- Part 3: Chapters 11-15 - Exception Handling and Recovery, Human in the Loop, Knowledge Retrieval (RAG), Inter-Agent Communication, Resource-Aware Optimization
- Part 4: Chapters 16-20 - Reasoning Techniques, Evaluation and Monitoring, Guardrails and Safety Patterns, Prioritization, Exploration and Discovery
Introduction
This guide was originally a video talking about agentic systems.
You can support the author, who broke down the 400-page manual published by the Google engineer, here.
Chapter 1: Prompt Chaining
TL;DR: Breaking big tasks into smaller sequential steps, where each step validates the output of the previous one before passing data to the next. Think of it like an assembly line where each station completes its part, checks quality, then hands it off to the next station.
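Below is a minimal sketch of the pattern in Python, assuming a hypothetical `call_llm(prompt)` helper in place of a real model API; each step validates its output (retrying a couple of times) before handing it to the next.

```python
# Minimal prompt-chaining sketch. `call_llm` is a hypothetical stand-in for
# whichever model API you use; replace it with a real client call.
def call_llm(prompt: str) -> str:
    return f"[placeholder model output for: {prompt[:40]}...]"

def run_step(prompt: str, validate, max_retries: int = 2) -> str:
    """Run one chain step, retrying until its output passes validation."""
    for _ in range(max_retries + 1):
        output = call_llm(prompt)
        if validate(output):
            return output
    raise RuntimeError(f"Step failed validation after {max_retries + 1} attempts")

def summarize_then_translate(document: str) -> str:
    # Step 1: summarize, and require a non-empty result before moving on.
    summary = run_step(
        f"Summarize the key points of:\n{document}",
        validate=lambda out: bool(out.strip()),
    )
    # Step 2: translate the validated summary; sanity-check the length.
    return run_step(
        f"Translate this summary into French:\n{summary}",
        validate=lambda out: 0 < len(out) < 10 * len(summary) + 100,
    )

print(summarize_then_translate("Agents work best when big jobs become small steps."))
```

In practice the validators would be richer (schema checks, structured-output parsing, unit tests), but the shape of the chain stays the same.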
When to Use
- Complex multi-step processes: When you need to break down a complex task into discrete, manageable steps
- Data transformation pipelines: When information needs to be processed through multiple stages with different requirements
- Quality-critical workflows: When each step's output must meet specific criteria before proceeding
- Debugging requirements: When you need clear visibility into each stage of processing
- Mixed tool/AI operations: When combining LLM calls with API calls, database queries, or other tools
Where It Fits
- Document processing pipelines: Research → Analysis → Writing → Editing → Publishing
- Data ETL workflows: Extract → Transform → Validate → Load
- Customer service flows: Intent recognition → Information gathering → Solution generation → Response formatting
- Code generation: Requirements → Design → Implementation → Testing → Documentation
- Content creation: Ideation → Outline → Draft → Review → Finalization
How It Works
```mermaid
graph TD
Start[User Request] --> Break[Break Down Into Small Steps]
Break --> Define[Set Rules for Each Step]
Define --> Task1[Do First Task]
Task1 --> Check1{Is Output Good?}
Check1 -->|Yes| Task2[Do Second Task]
Check1 -->|No| Retry1[Try Again]
Retry1 --> Task1
Task2 --> Check2{Is Output Good?}
Check2 -->|Yes| Task3[Do Third Task with Tools]
Check2 -->|No| Retry2[Try Again]
Retry2 --> Task2
Task3 --> Check3{Is Output Good?}
Check3 -->|Yes| Combine[Combine All Results]
Check3 -->|No| Retry3[Try Again]
Retry3 --> Task3
Combine --> Build[Build Final Answer]
Build --> Save[Save Work and Notes]
Save --> End[Deliver Complete Result]
style Start fill:#6366f1
style End fill:#10b981
style Check1 fill:#3E92CC
style Check2 fill:#3E92CC
style Check3 fill:#3E92CC
style Task3 fill:#a855f7
style Retry1 fill:#D8315B
style Retry2 fill:#D8315B
style Retry3 fill:#D8315B
```
Pros
- Modularity: Each step can be developed, tested, and optimized independently
- Debuggability: Clear visibility into where failures occur in the chain
- Reliability: Structured data contracts ensure consistent handoffs between steps
- Reusability: Individual chain components can be reused in different workflows
- Error handling: Each step can have specific retry logic and fallback strategies
- Incremental progress: Partial results can be saved and resumed if interrupted
- Parallel development: Different team members can work on different chain segments
Cons
- Latency accumulation: Each step adds processing time, leading to longer total execution
- Context limitations: Information might be lost or compressed between steps
- Error propagation: Mistakes early in the chain can cascade through subsequent steps
- Complexity overhead: Simple tasks might become over-engineered with unnecessary steps
- Cost multiplication: Each LLM call incurs costs, which accumulate across the chain
- Rigid structure: May be inflexible for tasks requiring dynamic adaptation
- State management: Requires careful handling of intermediate results and context
Real-World Examples
Legal Document Analysis
- Step 1: Extract key clauses from contracts
- Step 2: Identify potential risks and obligations
- Step 3: Compare against standard templates
- Step 4: Generate executive summary with recommendations
E-commerce Product Descriptions
- Step 1: Extract product features from manufacturer data
- Step 2: Research competitor descriptions and pricing
- Step 3: Generate SEO-optimized description
- Step 4: Create variations for different platforms
- Step 5: Validate against brand guidelines
Academic Research Assistant
- Step 1: Parse research question and identify key concepts
- Step 2: Search and retrieve relevant papers
- Step 3: Extract and summarize findings
- Step 4: Identify gaps and contradictions
- Step 5: Generate literature review with citations
Software Bug Analysis
- Step 1: Parse error logs and stack traces
- Step 2: Identify affected components
- Step 3: Search for similar issues in knowledge base
- Step 4: Generate potential solutions
- Step 5: Create detailed bug report with reproduction steps
Financial Report Generation
- Step 1: Collect data from multiple sources
- Step 2: Perform calculations and analysis
- Step 3: Identify trends and anomalies
- Step 4: Generate narrative explanations
- Step 5: Format into regulatory-compliant report
Chapter 2: Routing
TL;DR: Analyzing incoming requests and sending them to the right specialist agent based on what they need. Think of it like a smart receptionist or operator who listens to what you need and directs you to the right person or department: tech support, accounting, and so on. Crucially, if the operator is unsure, they go back and ask clarifying questions to better understand where to route the request.
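A minimal sketch of confidence-based routing, with a placeholder classifier and hypothetical handler names standing in for real specialist agents:

```python
# Minimal routing sketch. The classifier and handlers are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class RoutingDecision:
    route: str          # e.g. "tech", "sales", "account", "general"
    confidence: float   # 0.0 - 1.0, as judged by the classifier

def classify(request: str) -> RoutingDecision:
    """Placeholder intent classifier; in practice an LLM call or trained model."""
    if "password" in request.lower():
        return RoutingDecision(route="account", confidence=0.9)
    return RoutingDecision(route="general", confidence=0.4)

HANDLERS = {
    "tech": lambda r: f"[tech support handles: {r}]",
    "sales": lambda r: f"[sales team handles: {r}]",
    "account": lambda r: f"[account help handles: {r}]",
    "general": lambda r: f"[general assistant handles: {r}]",
}

def route_request(request: str, min_confidence: float = 0.6) -> str:
    decision = classify(request)
    if decision.confidence < min_confidence:
        # Low confidence: ask a clarifying question instead of guessing.
        return "Could you tell me a bit more about what you need help with?"
    return HANDLERS[decision.route](request)

print(route_request("I can't reset my password"))
print(route_request("hi there"))
```

In a real system the low-confidence branch would feed the user's answer back into the classifier rather than stop, mirroring the "Ask for More Details" loop in the diagram below.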
When to Use
- Multi-domain systems: When handling diverse request types requiring different expertise
- Dynamic workflow selection: When the appropriate process depends on input characteristics
- Resource optimization: When different requests require different computational resources
- Specialized tool access: When specific tools or APIs are needed based on request type
- Confidence-based processing: When you need to handle ambiguous requests differently
- Load balancing: When distributing work across multiple specialized agents
Where It Fits
- Customer service platforms: Routing inquiries to appropriate department agents
- Multi-modal AI systems: Directing requests to text, image, or code processing pipelines
- Enterprise automation: Routing tasks to appropriate business process workflows
- Content moderation: Directing content to appropriate review pipelines
- Healthcare triage: Routing patient queries to appropriate medical specialists
How It Works
```mermaid
graph TD
Start[Customer Request] --> Look[Look at What They Need]
Look --> Decide{Which Expert Should Handle This?}
Decide -->|Technical Problem| Tech[Send to Tech Support]
Decide -->|Want to Buy| Sales[Send to Sales Team]
Decide -->|Account Question| Account[Send to Account Help]
Decide -->|General Question| General[Send to General Assistant]
Decide -->|Not Sure| Ask[Ask for More Details]
Tech --> TechTools[Give Access to Tech Guides]
Sales --> SalesTools[Give Access to Product Info]
Account --> AccountTools[Give Access to User Account]
General --> GeneralTools[Give Access to FAQ]
TechTools --> Work1[Work on Tech Problem]
SalesTools --> Work2[Work on Sales Question]
AccountTools --> Work3[Work on Account Issue]
GeneralTools --> Work4[Work on General Question]
Ask --> Understand[Better Understand Request]
Work1 --> Check{Is Answer Good?}
Work2 --> Check
Work3 --> Check
Work4 --> Check
Understand --> Decide
Check -->|Yes| Answer[Send Answer to Customer]
Check -->|No| Backup[Get Human Help]
Answer --> Record[Record What Happened]
Backup --> Record
Record --> End[Complete]
style Start fill:#6366f1
style Decide fill:#3E92CC
style Ask fill:#D8315B
style Check fill:#3E92CC
style End fill:#10b981
```
Pros
- Specialization: Each route can be optimized for specific task types
- Scalability: Easy to add new routes without affecting existing ones
- Efficiency: Requests are handled by the most appropriate resources
- Flexibility: Dynamic routing based on context and confidence
- Clarity: Clear separation of concerns between different workflows
- Performance: Avoid unnecessary processing for simple requests
- Maintainability: Each route can be updated independently
Cons
- Router complexity: The routing logic itself can become a bottleneck
- Misrouting risks: Incorrect routing decisions can lead to poor outcomes
- Latency overhead: Additional step for routing decision adds delay
- Training requirements: Router needs continuous improvement based on feedback
- Edge cases: Ambiguous requests may not fit cleanly into categories
- Coordination overhead: Managing multiple specialized agents increases complexity
- Monitoring complexity: Need to track performance across multiple paths
Real-World Examples
AI Customer Service Hub
- Technical issues → Technical Support Agent with access to documentation
- Billing questions → Finance Agent with access to payment systems
- Product inquiries → Sales Agent with catalog access
- Complaints → Escalation Agent with CRM integration
- General questions → FAQ Agent with knowledge base
Content Creation Platform
- Blog posts → Long-form Writing Agent
- Social media → Short-form Content Agent
- Technical documentation → Technical Writing Agent
- Marketing copy → Copywriting Agent
- Translations → Localization Agent
Code Assistant Router
- Bug fixes → Debugging Agent with error analysis tools
- New features → Development Agent with design patterns
- Refactoring → Code Quality Agent with best practices
- Testing → Test Generation Agent with coverage tools
- Documentation → Documentation Agent with template library
Financial Services Router
- Trading requests → Trading Agent with market data
- Risk assessment → Risk Analysis Agent with models
- Compliance checks → Compliance Agent with regulations
- Reporting → Report Generation Agent with templates
- Fraud detection → Security Agent with pattern detection
Educational Platform Router
- Math problems → Mathematical Reasoning Agent
- Language learning → Language Tutor Agent
- Science questions → Science Expert Agent
- History queries → Historical Research Agent
- Study planning → Learning Strategy Agent
Chapter 3: Parallelization
TL;DR: Splitting a large job into independent chunks that can be processed at the same time by multiple workers; "workers" here is shorthand for agents. Think of it like having 10 people each read different chapters of a book simultaneously, then combining all the summaries at the end.
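A minimal fan-out/fan-in sketch with asyncio; `summarize` is a hypothetical stand-in for whatever per-chunk work (an LLM call, an API request, a scrape) your workers actually do:

```python
# Minimal parallelization sketch: split the job, fan out the pieces to
# concurrent workers, then combine the results in order.
import asyncio

async def summarize(chunk: str) -> str:
    await asyncio.sleep(0.1)              # placeholder for real I/O-bound work
    return f"summary of {chunk!r}"

async def summarize_book(chapters: list[str]) -> str:
    # Fan out: one task per chapter, all running concurrently.
    summaries = await asyncio.gather(*(summarize(ch) for ch in chapters))
    # Fan in: combine partial results in their original order.
    return "\n".join(summaries)

chapters = [f"chapter {i}" for i in range(1, 11)]
print(asyncio.run(summarize_book(chapters)))
```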
When to Use
- Large-scale data processing: When processing multiple documents, records, or data sources
- Time-sensitive operations: When results are needed quickly and tasks are independent
- Batch operations: When performing the same operation on multiple items
- Web scraping/crawling: When gathering data from multiple sources simultaneously
- Multi-document analysis: When analyzing multiple files or documents independently
- API aggregation: When calling multiple APIs that don't depend on each other
Where It Fits
- Document processing pipelines: Analyzing multiple PDFs or reports simultaneously
- Data enrichment workflows: Enhancing records from multiple data sources
- Content generation: Creating multiple variations or translations in parallel
- Research automation: Searching multiple databases or sources concurrently
- Testing frameworks: Running multiple test scenarios simultaneously
How It Works
```mermaid
graph TD
Start[Big Job to Do] --> Look[Look at the Work]
Look --> Split[Split Into Smaller Pieces]
Split --> Check{Do We Have Resources?}
Check -->|Yes| Start_Workers[Start Multiple Workers]
Check -->|Limited| Queue[Take Turns with Limited Workers]
Start_Workers --> W1[Worker 1: Do Piece A]
Start_Workers --> W2[Worker 2: Do Piece B]
Start_Workers --> W3[Worker 3: Do Piece C]
Start_Workers --> W4[Worker 4: Do Piece D]
Queue --> Batch[Work in Small Groups]
W1 --> Try1{Did It Work?}
W2 --> Try2{Did It Work?}
W3 --> Try3{Did It Work?}
W4 --> Try4{Did It Work?}
Batch --> Try5{Did It Work?}
Try1 -->|Yes| Collect[Collect All Results]
Try1 -->|No| Wait1[Wait and Try Again]
Try2 -->|Yes| Collect
Try2 -->|No| Wait2[Wait and Try Again]
Try3 -->|Yes| Collect
Try3 -->|No| Wait3[Wait and Try Again]
Try4 -->|Yes| Collect
Try4 -->|No| Wait4[Wait and Try Again]
Try5 -->|Yes| Collect
Try5 -->|No| Wait5[Wait and Try Again]
Wait1 --> W1
Wait2 --> W2
Wait3 --> W3
Wait4 --> W4
Wait5 --> Batch
Collect --> Organize[Organize Results]
Organize --> Combine[Combine Everything]
Combine --> Final[Create Final Result]
Final --> Summary[Create Summary Report]
Summary --> End[Job Complete]
style Start fill:#6366f1
style Start_Workers fill:#3E92CC
style Check fill:#3E92CC
style Try1 fill:#3E92CC
style Try2 fill:#3E92CC
style Try3 fill:#3E92CC
style Try4 fill:#3E92CC
style Try5 fill:#3E92CC
style Wait1 fill:#D8315B
style Wait2 fill:#D8315B
style Wait3 fill:#D8315B
style Wait4 fill:#D8315B
style Wait5 fill:#D8315B
style Collect fill:#a855f7
style End fill:#10b981
```
Pros
- Speed improvement: Dramatic reduction in total processing time
- Resource utilization: Better use of available computational resources
- Scalability: Easy to scale up or down based on workload
- Fault isolation: Failure in one worker doesn't affect others
- Progress tracking: Can show incremental progress as workers complete
- Flexibility: Can dynamically adjust worker count based on load
- Cost efficiency: Optimize resource usage and reduce idle time
Cons
- Complexity increase: Managing multiple concurrent processes is challenging
- Resource limits: API rate limits and quotas constrain parallelization
- Coordination overhead: Synchronization and result merging add complexity
- Debugging difficulty: Harder to trace issues in parallel execution
- Cost multiplication: Multiple simultaneous API calls increase costs
- Memory usage: Holding multiple results in memory can be resource-intensive
- Ordering challenges: Maintaining sequence when needed requires extra logic
Real-World Examples
News Aggregation Service
- Simultaneously fetch articles from 50+ news sources
- Each worker processes one news source
- Rate limit to 10 concurrent API calls (see the sketch after this list)
- Merge and deduplicate results
- Sort by relevance and timestamp
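A sketch of the concurrency cap and retry behaviour described above, with a hypothetical `fetch_articles` coroutine standing in for real news-source clients:

```python
# Rate-limited fan-out with retry and backoff, as a sketch of the workflow above.
import asyncio, random

async def fetch_articles(source: str) -> list[str]:
    await asyncio.sleep(0.05)                        # placeholder for a real HTTP call
    if random.random() < 0.2:                        # simulate occasional failures
        raise ConnectionError(f"{source} timed out")
    return [f"headline from {source}"]

async def fetch_with_retry(source: str, limiter: asyncio.Semaphore,
                           retries: int = 3) -> list[str]:
    for attempt in range(retries):
        async with limiter:                          # at most 10 calls in flight
            try:
                return await fetch_articles(source)
            except ConnectionError:
                await asyncio.sleep(0.5 * 2 ** attempt)   # exponential backoff
    return []                                        # give up on this source

async def aggregate(sources: list[str]) -> list[str]:
    limiter = asyncio.Semaphore(10)
    batches = await asyncio.gather(*(fetch_with_retry(s, limiter) for s in sources))
    return sorted({headline for batch in batches for headline in batch})  # dedupe

print(asyncio.run(aggregate([f"source-{i}" for i in range(50)])))
```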
E-commerce Price Monitoring
- Monitor prices across 100+ competitor sites
- Parallel workers scrape product pages
- Handle retry logic for failed requests
- Aggregate pricing data into comparison matrix
- Generate price change alerts
Document Intelligence System
- Process 1000+ page legal document set
- Split into 50-page chunks for parallel analysis
- Each worker extracts entities and clauses
- Merge findings into comprehensive report
- Track document provenance for each finding
Code Repository Analysis
- Scan entire codebase for security vulnerabilities
- Parallel workers analyze different directories
- Each worker runs different security checks
- Collect and prioritize all findings
- Generate comprehensive security report
Multi-language Translation Project
- Translate documentation into 15 languages
- Parallel workers for each language pair
- Maintain consistency with translation memory
- Quality check each translation
- Compile into multi-language documentation set
Chapter 4: Reflection
TL;DR: Generating a first draft, then having a critic review it against quality standards, revising based on the feedback, and repeating until the output meets your quality bar. Think of it like writing an essay, having a teacher review it, and making improvements until you finally get a passing grade.
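A minimal generate-critique-revise loop, again assuming a placeholder `call_llm(prompt)` helper; the generator and the critic are just two differently prompted calls to the same model:

```python
# Minimal reflection loop sketch. `call_llm` is a hypothetical stand-in for a
# real model API; swap it for an actual client call.
def call_llm(prompt: str) -> str:
    return f"[placeholder model output for: {prompt[:40]}...]"

def generate(task: str, feedback: str = "") -> str:
    prompt = f"Task: {task}\n"
    if feedback:
        prompt += f"Revise your previous draft. Address this feedback:\n{feedback}\n"
    return call_llm(prompt)

def critique(task: str, draft: str) -> tuple[bool, str]:
    """Ask a critic to grade the draft; returns (passes, structured feedback)."""
    review = call_llm(f"Review this draft for the task '{task}'. "
                      f"List concrete problems, or reply PASS.\nDraft:\n{draft}")
    return review.strip().upper().startswith("PASS"), review

def reflect(task: str, max_iterations: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_iterations):
        passed, feedback = critique(task, draft)
        if passed:
            return draft                     # meets the quality bar
        draft = generate(task, feedback)     # revise against the critic's feedback
    return draft                             # hit the iteration cap: use best effort

print(reflect("Write a product description for a solar lantern"))
```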
When to Use
- Quality-critical outputs: When high accuracy and quality are non-negotiable
- Complex reasoning tasks: When problems require iterative refinement
- Creative work: When content needs multiple rounds of improvement
- Learning systems: When you want to improve performance over time
- Error-prone domains: When initial attempts often have mistakes
- Compliance requirements: When outputs must meet specific standards
Where It Fits
- Content creation: Blog posts, reports, and documentation requiring polish
- Code generation: Producing bug-free, optimized code
- Legal document drafting: Ensuring accuracy and completeness
- Academic writing: Research papers needing fact-checking and citations
- Product descriptions: E-commerce content requiring SEO and accuracy
How It Works
```mermaid
graph TD
Start[Initial Request] --> Generate[Generate First Draft]
Generate --> Output1[Initial Output]
Output1 --> Critic{Critic Agent Review}
Critic --> Rubric[Apply Quality Rubrics]
Critic --> Tests[Run Unit Tests]
Critic --> Check[Grammar & Logic Check]
Rubric --> Score1{Quality Score}
Tests --> Score2{Test Results}
Check --> Score3{Check Results}
Score1 --> Evaluate{Meets Criteria?}
Score2 --> Evaluate
Score3 --> Evaluate
Evaluate -->|No| Feedback[Generate Structured Feedback]
Evaluate -->|Yes| Accept[Accept Output]
Feedback --> Revise[Revision Agent]
Revise --> Address[Address Each Issue]
Address --> Output2[Revised Output]
Output2 --> Counter{Iteration Count}
Counter -->|< Max| Critic
Counter -->|>= Max| Converge[Use Best Version]
Accept --> Record[Record Success Patterns]
Converge --> Record
Record --> Learn[Update Prompts/Rules]
Learn --> Final[Final Output]
Final --> End[Deliver Result]
style Start fill:#6366f1
style Critic fill:#3E92CC
style Evaluate fill:#3E92CC
style Accept fill:#10b981
style Converge fill:#D8315B
style End fill:#10b981
```
Pros
- Quality improvement: Systematic enhancement through multiple iterations
- Error reduction: Catches and fixes mistakes before final delivery
- Objectivity: Separation of generation and critique roles
- Learning capability: System improves over time from patterns
- Transparency: Clear feedback trail for improvements
- Flexibility: Can adjust critique criteria for different use cases
- Consistency: Applies same quality standards uniformly
Cons
- Increased latency: Multiple iterations multiply processing time
- Higher costs: Each reflection cycle incurs additional API calls
- Context window limits: Long documents may exceed token limits
- Diminishing returns: Later iterations may provide minimal improvement
- Over-optimization: Risk of making content generic or losing voice
- Complexity: Requires careful tuning of critique criteria
- API throttling: Multiple rapid calls may hit rate limits
Real-World Examples
Technical Blog Post Creation
- Initial draft generation
- Technical accuracy review
- Code example validation
- SEO optimization check
- Readability improvements
- Final grammar and style polish
Contract Generation System
- Draft initial contract terms
- Legal compliance review
- Risk assessment critique
- Clarity and ambiguity check
- Client-specific customization
- Final legal review
Educational Content Development
- Create lesson content
- Pedagogical effectiveness review
- Factual accuracy verification
- Age-appropriateness check
- Engagement factor assessment
- Accessibility improvements
Software Documentation
- Generate API documentation
- Technical accuracy review
- Code example testing
- Completeness check
- Clarity improvements
- Version consistency validation
Marketing Copy Refinement
- Initial copy generation
- Brand voice alignment check
- Persuasiveness assessment
- Fact and claim verification
- SEO keyword optimization
- A/B test variant creation
Research Report Writing
- Draft research findings
- Methodology critique
- Statistical validation
- Citation verification
- Logical flow improvement
- Executive summary refinement
Chapter 5: Tool Use
TL;DR: When the AI needs external information or actions, it discovers the available tools, checks permissions, and calls the right tool with the proper parameters. Think of it like a chef who needs an ingredient: they check what's in the pantry, verify they're allowed to use it, then retrieve it and use it in the recipe.
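A minimal sketch of a tool registry with a permission check before execution; the tool names and the allow-list are illustrative only:

```python
# Minimal tool-use sketch: a registry of callable tools plus a safety check
# before execution. Errors are normalized into strings the LLM can read.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {
    # Toy arithmetic evaluator; never eval untrusted input in production.
    "calculator": lambda expression: str(eval(expression, {"__builtins__": {}})),
    "web_search": lambda query: f"[search results for {query!r}]",  # placeholder
}

ALLOWED = {"calculator", "web_search"}   # tools this agent is permitted to call

def call_tool(name: str, **arguments) -> str:
    if name not in TOOLS:
        return f"Error: unknown tool '{name}'"
    if name not in ALLOWED:
        return f"Error: access to '{name}' denied"        # safety check failed
    try:
        return TOOLS[name](**arguments)                    # execute with arguments
    except Exception as exc:                               # tool failure
        return f"Error: tool '{name}' failed ({exc})"

# The LLM would emit a structured tool call; here we execute one directly.
print(call_tool("calculator", expression="17 * 23"))
print(call_tool("web_search", query="current EUR to USD rate"))
```

A production agent would typically hand the model JSON tool schemas and validate the arguments it emits before executing anything, but the discover/permit/call/normalize loop is the same.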
When to Use
- External data access: When agents need real-time or dynamic information
- System integration: When connecting to databases, APIs, or services
- Computational tasks: When precise calculations or data processing is needed
- File operations: When reading, writing, or manipulating files
- Action execution: When agents need to perform concrete actions
- Multi-step workflows: When combining AI reasoning with tool execution
Where It Fits
- Research assistants: Web search, document retrieval, fact-checking
- Data analysis workflows: Database queries, calculations, visualizations
- DevOps automation: System commands, deployment tools, monitoring
- Customer service: CRM access, ticket management, knowledge base queries
- Content management: File operations, publishing tools, asset management
How It Works
```mermaid
graph TD
Start[User Request] --> Analyze[Analyze Task Requirements]
Analyze --> Discover[Discover Available Tools]
Discover --> Catalog[Tool Catalog]
Catalog --> API1[Web Search API]
Catalog --> API2[Database Query Tool]
Catalog --> API3[Calculator Function]
Catalog --> API4[File System Access]
Catalog --> API5[External Service API]
Catalog --> Select{Select Appropriate Tool}
Select --> Match[Match Capabilities to Need]
Match --> Safety{Safety Check}
Safety -->|Pass| Prepare[Prepare Tool Call]
Safety -->|Fail| Deny[Deny Access with Reason]
Prepare --> Validate[Validate Input Parameters]
Validate --> Call[Execute Tool with Arguments]
Call --> Handle{Handle Response}
Handle -->|Success| Parse[Parse Tool Output]
Handle -->|Error| ErrorHandle[Error Recovery]
Handle -->|Timeout| Retry[Retry with Backoff]
ErrorHandle --> Fallback[Use Fallback Method]
Retry --> Call
Parse --> Normalize[Normalize for LLM]
Fallback --> Normalize
Normalize --> Process[Process with Context]
Process --> Decision{Next Action?}
Decision -->|More Tools| Select
Decision -->|Complete| Audit[Audit Tool Usage]
Deny --> Log[Log Security Event]
Audit --> Redact[Redact Sensitive Data]
Log --> Redact
Redact --> Result[Generate Final Response]
Result --> End[Return to User]
style Start fill:#6366f1
style Select fill:#3E92CC
style Safety fill:#D8315B
style Handle fill:#3E92CC
style End fill:#10b981
```
Pros
- Capability extension: Agents can perform actions beyond text generation
- Real-time data: Access to current information not in training data
- Precision: Exact calculations and deterministic operations
- Integration: Seamless connection to existing systems and services
- Automation: Complete end-to-end workflows without human intervention
- Flexibility: Dynamic tool selection based on task requirements
- Auditability: Clear log of all tool usage and parameters
Cons
- Security risks: Tool access must be carefully controlled
- Error propagation: Tool failures can break entire workflows
- Latency addition: Each tool call adds processing time
- Cost accumulation: External API calls may incur charges
- Complexity: Managing tool schemas and error handling
- Dependency risks: Reliance on external services availability
- Data sensitivity: Need careful handling of credentials and private data
Real-World Examples
Financial Analysis Assistant
- Stock price API for real-time quotes
- Calculator for portfolio calculations
- Database queries for historical data
- Chart generation tools for visualizations
- Email API for report distribution
Code Development Helper
- File system access for reading/writing code
- Compiler/interpreter for code execution
- Git commands for version control
- Testing frameworks for validation
- Documentation generators
E-commerce Order Management
- Inventory database queries
- Payment processing APIs
- Shipping service integrations
- Email/SMS notification tools
- CRM system updates
Research Paper Assistant
- Academic database searches (PubMed, arXiv)
- Citation management tools
- PDF parsing and extraction
- Reference formatting tools
- Plagiarism checking APIs
Smart Home Controller
- IoT device APIs (lights, thermostats)
- Weather service integration
- Calendar access for scheduling
- Energy monitoring tools
- Security system controls
HR Recruitment System
- Resume parsing tools
- LinkedIn and job board APIs
- Calendar scheduling tools
- Email automation
- Background check services
- Video interview platforms