Free Practice Questions for Anthropic Claude Certified Architect – Foundations Certification

    🔄 Last checked for updates March 15th, 2026

    Study with 360 exam-style practice questions designed to help you prepare for the Anthropic Claude Certified Architect – Foundations exam.

    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Exam Information

    Exam Details

    Key information about Anthropic Claude Certified Architect – Foundations

    Official study guide

    Question formats CertSafari offers
    • Multiple choice

    Exam format:

    Multiple choice, scenario-based questions

    Passing score:

    720 out of 1000

    Target audience:

    Solution architects with 6+ months of practical experience who design and implement production applications with Claude

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Agentic Architecture & Orchestration

    Subdomain 1.1: Design and implement agentic loops for autonomous task execution

    Knowledge of:

    - The agentic loop lifecycle: sending requests to Claude, inspecting stop_reason ("tool_use" vs "end_turn"), executing requested tools, and returning results for the next iteration
    - How tool results are appended to conversation history so the model can reason about the next action
    - The distinction between model-driven decision-making (Claude reasons about which tool to call next based on context) and pre-configured decision trees or tool sequences

    Skills in:

    - Implementing agentic loop control flow that continues when stop_reason is "tool_use" and terminates when stop_reason is "end_turn"
    - Adding tool results to conversation context between iterations so the model can incorporate new information into its reasoning
    - Avoiding anti-patterns such as parsing natural language signals to determine loop termination, setting arbitrary iteration caps as the primary stopping mechanism, or checking for assistant text content as a completion indicator
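
    The loop described above can be sketched as follows. The request shapes follow the Anthropic Messages API, but treat this as an illustrative sketch: the execute_tool helper is hypothetical, and the client is injected rather than constructed here.

```python
def run_agent_loop(client, model, tools, messages, execute_tool):
    """Drive the agentic loop: call the model, run any requested tools,
    feed results back, and stop when the model returns end_turn."""
    while True:
        response = client.messages.create(
            model=model, max_tokens=1024, tools=tools, messages=messages)
        # Append the assistant turn so the model sees its own tool calls.
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # "end_turn": the model decided it is done. We do not parse
            # text for completion phrases or rely on an iteration cap.
            return response
        # Execute every requested tool and return results for the next turn.
        results = [
            {"type": "tool_result",
             "tool_use_id": block.id,
             "content": execute_tool(block.name, block.input)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```

    Note that stop_reason, not the presence of assistant text, is the sole termination signal, matching the anti-pattern guidance above.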

    Subdomain 1.2: Orchestrate multi-agent systems with coordinator-subagent patterns

    Knowledge of:

    - Hub-and-spoke architecture where a coordinator agent manages all inter-subagent communication, error handling, and information routing
    - How subagents operate with isolated context; they do not inherit the coordinator's conversation history automatically
    - The role of the coordinator in task decomposition, delegation, result aggregation, and deciding which subagents to invoke based on query complexity
    - Risks of overly narrow task decomposition by the coordinator, leading to incomplete coverage of broad research topics

    Skills in:

    - Designing coordinator agents that analyze query requirements and dynamically select which subagents to invoke rather than always routing through the full pipeline
    - Partitioning research scope across subagents to minimize duplication (e.g., assigning distinct subtopics or source types to each agent)
    - Implementing iterative refinement loops where the coordinator evaluates synthesis output for gaps, re-delegates to search and analysis subagents with targeted queries, and re-invokes synthesis until coverage is sufficient
    - Routing all subagent communication through the coordinator for observability, consistent error handling, and controlled information flow
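
    Dynamic subagent selection can be sketched as coordinator-side routing logic. The subagent names and analysis fields below are hypothetical; the point is that the coordinator invokes only what the query needs rather than the full pipeline.

```python
def select_subagents(query_analysis: dict) -> list:
    """Pick which subagents to invoke for this query.
    Names (web_search, document_analysis, synthesis) are illustrative."""
    chosen = []
    if query_analysis.get("needs_fresh_sources"):
        chosen.append("web_search")
    if query_analysis.get("documents"):
        chosen.append("document_analysis")
    # Synthesis only pays off when multiple evidence streams must merge,
    # or when a broad topic needs gap-checking and aggregation.
    if len(chosen) > 1 or query_analysis.get("broad_topic"):
        chosen.append("synthesis")
    return chosen
```

    A simple factual lookup would route to a single search subagent, while a broad research question fans out and then converges on synthesis.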

    Subdomain 1.3: Configure subagent invocation, context passing, and spawning

    Knowledge of:

    - The Task tool as the mechanism for spawning subagents, and the requirement that allowedTools must include "Task" for a coordinator to invoke subagents
    - That subagent context must be explicitly provided in the prompt; subagents do not automatically inherit parent context or share memory between invocations
    - The AgentDefinition configuration including descriptions, system prompts, and tool restrictions for each subagent type
    - Fork-based session management for exploring divergent approaches from a shared analysis baseline

    Skills in:

    - Including complete findings from prior agents directly in the subagent's prompt (e.g., passing web search results and document analysis outputs to the synthesis subagent)
    - Using structured data formats to separate content from metadata (source URLs, document names, page numbers) when passing context between agents to preserve attribution
    - Spawning parallel subagents by emitting multiple Task tool calls in a single coordinator response rather than across separate turns
    - Designing coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions, to enable subagent adaptability

    Subdomain 1.4: Implement multi-step workflows with enforcement and handoff patterns

    Knowledge of:

    - The difference between programmatic enforcement (hooks, prerequisite gates) and prompt-based guidance for workflow ordering
    - Why prompt instructions alone have a non-zero failure rate when deterministic compliance is required (e.g., identity verification before financial operations)
    - Structured handoff protocols for mid-process escalation that include customer details, root cause analysis, and recommended actions

    Skills in:

    - Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed (e.g., blocking process_refund until get_customer has returned a verified customer ID)
    - Decomposing multi-concern customer requests into distinct items, then investigating each in parallel using shared context before synthesizing a unified resolution
    - Compiling structured handoff summaries (customer ID, root cause, refund amount, recommended action) when escalating to human agents who lack access to the conversation transcript
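
    A minimal sketch of a prerequisite gate. The tool names (get_customer, process_refund) come from the example above; the gate class itself and its return shape are hypothetical.

```python
class PrerequisiteGate:
    """Block downstream tool calls until their prerequisites have run.
    This is programmatic enforcement: unlike a prompt instruction, the
    refusal is deterministic."""

    PREREQUISITES = {"process_refund": ["get_customer"]}

    def __init__(self):
        self.completed = set()

    def check(self, tool_name: str) -> dict:
        missing = [p for p in self.PREREQUISITES.get(tool_name, [])
                   if p not in self.completed]
        if missing:
            # Tell the agent exactly what must run first.
            return {"allowed": False,
                    "error": f"Run {', '.join(missing)} before {tool_name}."}
        return {"allowed": True}

    def record(self, tool_name: str):
        """Call after a tool completes successfully."""
        self.completed.add(tool_name)
```

    The agent framework would call check() before executing each tool and record() after success, so process_refund cannot run until identity verification has actually happened.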

    Subdomain 1.5: Apply Agent SDK hooks for tool call interception and data normalization

    Knowledge of:

    - Hook patterns (e.g., PostToolUse) that intercept tool results for transformation before the model processes them
    - Hook patterns that intercept outgoing tool calls to enforce compliance rules (e.g., blocking refunds above a threshold)
    - The distinction between using hooks for deterministic guarantees versus relying on prompt instructions for probabilistic compliance

    Skills in:

    - Implementing PostToolUse hooks to normalize heterogeneous data formats (Unix timestamps, ISO 8601, numeric status codes) from different MCP tools before the agent processes them
    - Implementing tool call interception hooks that block policy-violating actions (e.g., refunds exceeding $500) and redirect to alternative workflows (e.g., human escalation)
    - Choosing hooks over prompt-based enforcement when business rules require guaranteed compliance
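
    The normalization itself can be a plain function of the kind a PostToolUse hook would wrap. Exact hook signatures vary by SDK version, so this sketch only shows the transformation, assuming tool results arrive as dicts and that the numeric status mapping is known.

```python
from datetime import datetime, timezone

STATUS_CODES = {0: "success", 1: "pending", 2: "failed"}  # assumed mapping

def normalize_tool_result(result: dict) -> dict:
    """Normalize heterogeneous MCP tool output before the model sees it:
    Unix timestamps and ISO 8601 strings both become ISO 8601 UTC, and
    numeric status codes become readable labels."""
    out = dict(result)
    ts = out.get("timestamp")
    if isinstance(ts, (int, float)):  # Unix epoch seconds
        out["timestamp"] = datetime.fromtimestamp(
            ts, tz=timezone.utc).isoformat()
    elif isinstance(ts, str):  # already ISO 8601; canonicalize the offset
        out["timestamp"] = datetime.fromisoformat(
            ts.replace("Z", "+00:00")).isoformat()
    status = out.get("status")
    if isinstance(status, int):
        out["status"] = STATUS_CODES.get(status, f"unknown({status})")
    return out
```

    Running this in a hook rather than asking the model to "treat all timestamps as UTC" makes the normalization deterministic instead of probabilistic.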

    Subdomain 1.6: Design task decomposition strategies for complex workflows

    Knowledge of:

    - When to use fixed sequential pipelines (prompt chaining) versus dynamic adaptive decomposition based on intermediate findings
    - Prompt chaining patterns that break reviews into sequential steps (e.g., analyze each file individually, then run a cross-file integration pass)
    - The value of adaptive investigation plans that generate subtasks based on what is discovered at each step

    Skills in:

    - Selecting task decomposition patterns appropriate to the workflow: prompt chaining for predictable multi-aspect reviews, dynamic decomposition for open-ended investigation tasks
    - Splitting large code reviews into per-file local analysis passes plus a separate cross-file integration pass to avoid attention dilution
    - Decomposing open-ended tasks (e.g., "add comprehensive tests to a legacy codebase") by first mapping structure, identifying high-impact areas, then creating a prioritized plan that adapts as dependencies are discovered

    Subdomain 1.7: Manage session state, resumption, and forking

    Knowledge of:

    - Named session resumption using --resume <session-name> to continue a specific prior conversation
    - fork_session for creating independent branches from a shared analysis baseline to explore divergent approaches
    - The importance of informing the agent about changes to previously analyzed files when resuming sessions after code modifications
    - Why starting a new session with a structured summary is more reliable than resuming with stale tool results

    Skills in:

    - Using --resume with session names to continue named investigation sessions across work sessions
    - Using fork_session to create parallel exploration branches (e.g., comparing two testing strategies or refactoring approaches from a shared codebase analysis)
    - Choosing between session resumption (when prior context is mostly valid) and starting fresh with injected summaries (when prior tool results are stale)
    - Informing a resumed session about specific file changes for targeted re-analysis rather than requiring full re-exploration

    Domain 2: Tool Design & MCP Integration

    Subdomain 2.1: Design effective tool interfaces with clear descriptions and boundaries

    Knowledge of:

    - Tool descriptions as the primary mechanism LLMs use for tool selection; minimal descriptions lead to unreliable selection among similar tools
    - The importance of including input formats, example queries, edge cases, and boundary explanations in tool descriptions
    - How ambiguous or overlapping tool descriptions cause misrouting (e.g., analyze_content vs analyze_document with near-identical descriptions)
    - The impact of system prompt wording on tool selection: keyword-sensitive instructions can create unintended tool associations

    Skills in:

    - Writing tool descriptions that clearly differentiate each tool's purpose, expected inputs, outputs, and when to use it versus similar alternatives
    - Renaming tools and updating descriptions to eliminate functional overlap (e.g., renaming analyze_content to extract_web_results with a web-specific description)
    - Splitting generic tools into purpose-specific tools with defined input/output contracts (e.g., splitting a generic analyze_document into extract_data_points, summarize_content, and verify_claim_against_source)
    - Reviewing system prompts for keyword-sensitive instructions that might override well-written tool descriptions

    Subdomain 2.2: Implement structured error responses for MCP tools

    Knowledge of:

    - The MCP isError flag pattern for communicating tool failures back to the agent
    - The distinction between transient errors (timeouts, service unavailability), validation errors (invalid input), business errors (policy violations), and permission errors
    - Why uniform error responses (generic "Operation failed") prevent the agent from making appropriate recovery decisions
    - The difference between retryable and non-retryable errors, and how returning structured metadata prevents wasted retry attempts

    Skills in:

    - Returning structured error metadata including errorCategory (transient/validation/permission), isRetryable boolean, and human-readable descriptions
    - Including retriable: false flags and customer-friendly explanations for business rule violations so the agent can communicate appropriately
    - Implementing local error recovery within subagents for transient failures, propagating to the coordinator only errors that cannot be resolved locally along with partial results and what was attempted
    - Distinguishing between access failures (needing retry decisions) and valid empty results (representing successful queries with no matches)
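
    The error shape described above can be sketched as a small builder. The isError flag and content list follow the MCP tool-result convention; the metadata field names (errorCategory, isRetryable, partialResults) follow this study guide's wording, and the helper itself is hypothetical.

```python
def error_result(category: str, message: str, retryable: bool,
                 partial_results=None) -> dict:
    """Build an MCP-style tool result that tells the agent what failed,
    whether a retry could help, and what was recovered so far."""
    assert category in {"transient", "validation", "business", "permission"}
    return {
        "isError": True,
        "content": [{"type": "text", "text": message}],
        "metadata": {
            "errorCategory": category,
            "isRetryable": retryable,
            "partialResults": partial_results or [],
        },
    }

# A transient failure the agent may reasonably retry:
timeout = error_result("transient", "Search timed out after 30s.", True)

# A business-rule violation that no retry can fix; the message doubles
# as a customer-friendly explanation:
policy = error_result("business",
                      "Refunds over $500 require human approval.", False)
```

    Contrast this with a generic "Operation failed" string, which gives the agent no basis for choosing between retrying, escalating, or explaining the limit to the customer.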

    Subdomain 2.3: Distribute tools appropriately across agents and configure tool choice

    Knowledge of:

    - The principle that giving an agent access to too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability by increasing decision complexity
    - Why agents with tools outside their specialization tend to misuse them (e.g., a synthesis agent attempting web searches)
    - Scoped tool access: giving agents only the tools needed for their role, with limited cross-role tools for specific high-frequency needs
    - tool_choice configuration options: "auto", "any", and forced tool selection ({"type": "tool", "name": "..."})

    Skills in:

    - Restricting each subagent's tool set to those relevant to its role, preventing cross-specialization misuse
    - Replacing generic tools with constrained alternatives (e.g., replacing fetch_url with load_document that validates document URLs)
    - Providing scoped cross-role tools for high-frequency needs (e.g., a verify_fact tool for the synthesis agent) while routing complex cases through the coordinator
    - Using tool_choice forced selection to ensure a specific tool is called first (e.g., forcing extract_metadata before enrichment tools), then processing subsequent steps in follow-up turns
    - Setting tool_choice: "any" to guarantee the model calls a tool rather than returning conversational text
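
    The three tool_choice modes look like this as request fragments. The shapes follow the Messages API; the tool definition and model name are illustrative.

```python
# "auto": the model may answer in text or call a tool (the default).
auto_choice = {"type": "auto"}

# "any": the model must call some tool, but chooses which one.
any_choice = {"type": "any"}

# Forced selection: the model must call this specific tool, e.g. to
# guarantee extract_metadata runs before any enrichment step.
forced_choice = {"type": "tool", "name": "extract_metadata"}

request = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 1024,
    "tool_choice": forced_choice,
    "tools": [{
        "name": "extract_metadata",
        "description": "Extract title, author, and date from a document.",
        "input_schema": {
            "type": "object",
            "properties": {"document": {"type": "string"}},
            "required": ["document"],
        },
    }],
    "messages": [{"role": "user", "content": "Process this document."}],
}
```

    Because a forced tool call consumes the turn, any subsequent enrichment steps run in follow-up requests, as noted in the skills list above.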

    Subdomain 2.4: Integrate MCP servers into Claude Code and agent workflows

    Knowledge of:

    - MCP server scoping: project-level (.mcp.json) for shared team tooling vs user-level (~/.claude.json) for personal/experimental servers
    - Environment variable expansion in .mcp.json (e.g., ${GITHUB_TOKEN}) for credential management without committing secrets
    - That tools from all configured MCP servers are discovered at connection time and available simultaneously to the agent
    - MCP resources as a mechanism for exposing content catalogs (e.g., issue summaries, documentation hierarchies, database schemas) to reduce exploratory tool calls

    Skills in:

    - Configuring shared MCP servers in project-scoped .mcp.json with environment variable expansion for authentication tokens
    - Configuring personal/experimental MCP servers in user-scoped ~/.claude.json
    - Enhancing MCP tool descriptions to explain capabilities and outputs in detail, preventing the agent from preferring built-in tools (like Grep) over more capable MCP tools
    - Choosing existing community MCP servers over custom implementations for standard integrations (e.g., Jira), reserving custom servers for team-specific workflows
    - Exposing content catalogs as MCP resources to give agents visibility into available data without requiring exploratory tool calls

    Subdomain 2.5: Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

    Knowledge of:

    - Grep for content search (searching file contents for patterns like function names, error messages, or import statements)
    - Glob for file path pattern matching (finding files by name or extension patterns)
    - Read/Write for full file operations; Edit for targeted modifications using unique text matching
    - When Edit fails due to non-unique text matches, using Read + Write as a fallback for reliable file modifications

    Skills in:

    - Selecting Grep for searching code content across a codebase (e.g., finding all callers of a function, locating error messages)
    - Selecting Glob for finding files matching naming patterns (e.g., **/*.test.tsx)
    - Using Read to load full file contents followed by Write when Edit cannot find unique anchor text
    - Building codebase understanding incrementally: starting with Grep to find entry points, then using Read to follow imports and trace flows, rather than reading all files upfront
    - Tracing function usage across wrapper modules by first identifying all exported names, then searching for each name across the codebase

    Domain 3: Claude Code Configuration & Workflows

    Subdomain 3.1: Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization

    Knowledge of:

    - The CLAUDE.md configuration hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), and directory-level (subdirectory CLAUDE.md files)
    - That user-level settings apply only to that user; instructions in ~/.claude/CLAUDE.md are not shared with teammates via version control
    - The @import syntax for referencing external files to keep CLAUDE.md modular (e.g., importing specific standards files relevant to each package)
    - .claude/rules/ directory for organizing topic-specific rule files as an alternative to a monolithic CLAUDE.md

    Skills in:

    - Diagnosing configuration hierarchy issues (e.g., a new team member not receiving instructions because they're in user-level rather than project-level configuration)
    - Using @import to selectively include relevant standards files in each package's CLAUDE.md based on maintainer domain knowledge
    - Splitting large CLAUDE.md files into focused topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md, deployment.md)
    - Using the /memory command to verify which memory files are loaded and diagnose inconsistent behavior across sessions

    Subdomain 3.2: Create and configure custom slash commands and skills

    Knowledge of:

    - Project-scoped commands in .claude/commands/ (shared via version control) vs user-scoped commands in ~/.claude/commands/ (personal)
    - Skills in .claude/skills/ with SKILL.md files that support frontmatter configuration including context: fork, allowed-tools, and argument-hint
    - The context: fork frontmatter option for running skills in an isolated sub-agent context, preventing skill outputs from polluting the main conversation
    - Personal skill customization: creating personal variants in ~/.claude/skills/ with different names to avoid affecting teammates

    Skills in:

    - Creating project-scoped slash commands in .claude/commands/ for team-wide availability via version control
    - Using context: fork to isolate skills that produce verbose output (e.g., codebase analysis) or exploratory context (e.g., brainstorming alternatives) from the main session
    - Configuring allowed-tools in skill frontmatter to restrict tool access during skill execution (e.g., limiting to file write operations to prevent destructive actions)
    - Using argument-hint frontmatter to prompt developers for required parameters when they invoke the skill without arguments
    - Choosing between skills (on-demand invocation for task-specific workflows) and CLAUDE.md (always-loaded universal standards)

    Subdomain 3.3: Apply path-specific rules for conditional convention loading

    Knowledge of:

    - .claude/rules/ files with YAML frontmatter paths fields containing glob patterns for conditional rule activation
    - How path-scoped rules load only when editing matching files, reducing irrelevant context and token usage
    - The advantage of glob-pattern rules over directory-level CLAUDE.md files for conventions that span multiple directories (e.g., test files spread throughout a codebase)

    Skills in:

    - Creating .claude/rules/ files with YAML frontmatter path scoping (e.g., paths: ["terraform/**/*"]) so rules load only when editing matching files
    - Using glob patterns in path-specific rules to apply conventions to files by type regardless of directory location (e.g., **/*.test.tsx for all test files)
    - Choosing path-specific rules over subdirectory CLAUDE.md files when conventions must apply to files spread across the codebase

    Subdomain 3.4: Determine when to use plan mode vs direct execution

    Knowledge of:

    - Plan mode is designed for complex tasks involving large-scale changes, multiple valid approaches, architectural decisions, and multi-file modifications
    - Direct execution is appropriate for simple, well-scoped changes (e.g., adding a single validation check to one function)
    - Plan mode enables safe codebase exploration and design before committing to changes, preventing costly rework
    - The Explore subagent for isolating verbose discovery output and returning summaries to preserve main conversation context

    Skills in:

    - Selecting plan mode for tasks with architectural implications (e.g., microservice restructuring, library migrations affecting 45+ files, choosing between integration approaches with different infrastructure requirements)
    - Selecting direct execution for well-understood changes with clear scope (e.g., a single-file bug fix with a clear stack trace, adding a date validation conditional)
    - Using the Explore subagent for verbose discovery phases to prevent context window exhaustion during multi-phase tasks
    - Combining plan mode for investigation with direct execution for implementation (e.g., planning a library migration, then executing the planned approach)

    Subdomain 3.5: Apply iterative refinement techniques for progressive improvement

    Knowledge of:

    - Concrete input/output examples as the most effective way to communicate expected transformations when prose descriptions are interpreted inconsistently
    - Test-driven iteration: writing test suites first, then iterating by sharing test failures to guide progressive improvement
    - The interview pattern: having Claude ask questions to surface considerations the developer may not have anticipated before implementing
    - When to provide all issues in a single message (interacting problems) versus fixing them sequentially (independent problems)

    Skills in:

    - Providing 2-3 concrete input/output examples to clarify transformation requirements when natural language descriptions produce inconsistent results
    - Writing test suites covering expected behavior, edge cases, and performance requirements before implementation, then iterating by sharing test failures
    - Using the interview pattern to surface design considerations (e.g., cache invalidation strategies, failure modes) before implementing solutions in unfamiliar domains
    - Providing specific test cases with example input and expected output to fix edge case handling (e.g., null values in migration scripts)
    - Addressing multiple interacting issues in a single detailed message when fixes interact, versus sequential iteration for independent issues

    Subdomain 3.6: Integrate Claude Code into CI/CD pipelines

    Knowledge of:

    - The -p (or --print) flag for running Claude Code in non-interactive mode in automated pipelines
    - --output-format json and --json-schema CLI flags for enforcing structured output in CI contexts
    - CLAUDE.md as the mechanism for providing project context (testing standards, fixture conventions, review criteria) to CI-invoked Claude Code
    - Session context isolation: why the same Claude session that generated code is less effective at reviewing its own changes compared to an independent review instance

    Skills in:

    - Running Claude Code in CI with the -p flag to prevent interactive input hangs
    - Using --output-format json with --json-schema to produce machine-parseable structured findings for automated posting as inline PR comments
    - Including prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues to avoid duplicate comments
    - Providing existing test files in context so test generation avoids suggesting duplicate scenarios already covered by the test suite
    - Documenting testing standards, valuable test criteria, and available fixtures in CLAUDE.md to improve test generation quality and reduce low-value test output

    Domain 4: Prompt Engineering & Structured Output

    Subdomain 4.1: Design prompts with explicit criteria to improve precision and reduce false positives

    Knowledge of:

    - The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate")
    - How general instructions like "be conservative" or "only report high-confidence findings" fail to improve precision compared to specific categorical criteria
    - The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories

    Skills in:

    - Writing specific review criteria that define which issues to report (bugs, security) versus skip (minor style, local patterns) rather than relying on confidence-based filtering
    - Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
    - Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification

    Subdomain 4.2: Apply few-shot prompting to improve output consistency and quality

    Knowledge of:

    - Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results
    - The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps)
    - How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases
    - The effectiveness of few-shot examples for reducing hallucination in extraction tasks (e.g., handling informal measurements, varied document structures)

    Skills in:

    - Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives
    - Including few-shot examples that demonstrate specific desired output format (location, issue, severity, suggested fix) to achieve consistency
    - Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization
    - Using few-shot examples to demonstrate correct handling of varied document structures (inline citations vs bibliographies, methodology sections vs embedded details)
    - Adding few-shot examples showing correct extraction from documents with varied formats to address empty/null extraction of required fields

    Subdomain 4.3: Enforce structured output using tool use and JSON schemas

    Knowledge of:

    - Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
    - The distinction between tool_choice: "auto" (model may return text instead of calling a tool), "any" (model must call a tool but can choose which), and forced tool selection (model must call a specific named tool)
    - That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total, values in wrong fields)
    - Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories

    Skills in:

    - Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response
    - Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
    - Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
    - Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values to satisfy required fields
    - Adding enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
    - Including format normalization rules in prompts alongside strict output schemas to handle inconsistent source formatting
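
    Schema-enforced extraction can be sketched as: define the tool, force or require it via tool_choice, then read the tool_use block's input as the structured result. The invoice tool, its fields, and the helper below are illustrative; note the nullable field and the "other" + detail pattern from the list above.

```python
invoice_tool = {
    "name": "record_invoice",
    "description": "Record structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            # Nullable: some invoices have no PO number; allowing null
            # beats forcing the model to fabricate one.
            "po_number": {"type": ["string", "null"]},
            "category": {"enum": ["hardware", "software", "services", "other"]},
            "category_detail": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "po_number", "category", "total"],
    },
}

def extract_structured(response) -> dict:
    """Pull the structured payload out of the tool call. The schema
    guarantees syntax, not semantics: totals may still need validation."""
    for block in response.content:
        if block.type == "tool_use" and block.name == "record_invoice":
            return block.input
    raise ValueError("model returned no tool_use block")
```

    With tool_choice set to "any" or forced to record_invoice, the ValueError branch becomes unreachable in practice; with "auto" it is a real failure mode to handle.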

    Subdomain 4.4: Implement validation, retry, and feedback loops for extraction quality

    Knowledge of:

    - Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction
    - The limits of retry: retries are ineffective when the required information is simply absent from the source document (vs format or structural errors)
    - Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns
    - The difference between semantic validation errors (values don't sum, wrong field placement) and schema syntax errors (eliminated by tool use)

    Skills in:

    - Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction
    - Identifying when retries will be ineffective (e.g., information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors)
    - Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings
    - Designing self-correction validation flows: extracting "calculated_total" alongside "stated_total" to flag discrepancies, adding "conflict_detected" booleans for inconsistent source data
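
    The retry-with-error-feedback loop can be sketched as follows. The model call and the validator are injected so the control flow is the point: concrete validation errors are appended to the prompt on each retry, rather than a bare "try again".

```python
def extract_with_feedback(call_model, document, validate, max_attempts=3):
    """Retry extraction, feeding the validator's specific errors back so
    the model can correct format/structure mistakes.

    call_model(prompt) -> dict extraction
    validate(result)   -> list of error strings (empty means valid)
    """
    prompt = f"Extract the fields from this document:\n{document}"
    result, errors = None, []
    for _ in range(max_attempts):
        result = call_model(prompt)
        errors = validate(result)
        if not errors:
            return result
        # Append the concrete failures; the original document stays in
        # the prompt so the model can re-derive the correct values.
        prompt += ("\nYour previous extraction failed validation:\n- "
                   + "\n- ".join(errors)
                   + "\nReturn a corrected extraction.")
    raise ValueError(
        f"extraction still invalid after {max_attempts} attempts: {errors}")
```

    Note the limit stated above: this loop only helps with format and structural errors. If the information is absent from the document, every retry will fail the same way, which is why nullable fields matter.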

    Subdomain 4.5: Design efficient batch processing strategies

    Knowledge of:

    - The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA
    - Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation) and inappropriate for blocking workflows (pre-merge checks)
    - The batch API does not support multi-turn tool calling within a single request (cannot execute tools mid-request and return results)
    - custom_id fields for correlating batch request/response pairs

    Skills in:

    - Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis
    - Calculating batch submission frequency based on SLA constraints (e.g., 4-hour windows to guarantee 30-hour SLA with 24-hour batch processing)
    - Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications (e.g., chunking documents that exceeded context limits)
    - Using prompt refinement on a sample set before batch-processing large volumes to maximize first-pass success rates and reduce iterative resubmission costs
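
    A sketch of the custom_id correlation pattern: build one request per document, then use custom_id to resubmit only what failed. The request/result shapes follow the Message Batches API; the doc-id scheme and model name are illustrative.

```python
def build_batch_requests(documents: dict, model="claude-sonnet-4-5") -> list:
    """One batch request per document. custom_id ties each result back to
    its source document so failures can be resubmitted individually."""
    return [
        {"custom_id": f"doc-{doc_id}",
         "params": {"model": model, "max_tokens": 1024,
                    "messages": [{"role": "user",
                                  "content": f"Summarize:\n{text}"}]}}
        for doc_id, text in documents.items()
    ]

def failed_doc_ids(results: list) -> list:
    """Collect custom_ids whose result did not succeed, for resubmission
    (possibly with modifications, e.g. chunking an oversized document)."""
    return [r["custom_id"] for r in results
            if r["result"]["type"] != "succeeded"]
```

    The full batch would be submitted once; on completion, only the ids returned by failed_doc_ids go into a follow-up batch instead of reprocessing everything at full cost.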

    Subdomain 4.6: Design multi-instance and multi-pass review architectures

    Knowledge of:

    - Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session
    - Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking
    - Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution and contradictory findings

    Skills in:

    - Using a second independent Claude instance to review generated code without the generator's reasoning context
    - Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis
    - Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing

    Domain 5: Context Management & Reliability

    Subdomain 5.1: Manage conversation context to preserve critical information across long interactions

    Knowledge of:

    - Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries
    - The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections
    - How tool results accumulate in context and consume tokens disproportionately to their relevance (e.g., 40+ fields per order lookup when only 5 are relevant)
    - The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence

    Skills in:

    - Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history
    - Extracting and persisting structured issue data (order IDs, amounts, statuses) into a separate context layer for multi-issue sessions
    - Trimming verbose tool outputs to only relevant fields before they accumulate in context (e.g., keeping only return-relevant fields from order lookups)
    - Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects
    - Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis
    - Modifying upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains when downstream agents have limited context budgets
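
    Two of these techniques can be sketched directly: trimming verbose tool output before it enters context, and rendering a persistent case-facts block. The field names and block wording are illustrative.

```python
# Illustrative subset of a 40+ field order record that matters for returns.
RETURN_RELEVANT_FIELDS = {"order_id", "status", "amount", "order_date",
                          "return_eligible"}

def trim_order_lookup(raw_order: dict) -> dict:
    """Drop irrelevant fields from a verbose order lookup before the
    result is appended to conversation context."""
    return {k: v for k, v in raw_order.items()
            if k in RETURN_RELEVANT_FIELDS}

def case_facts_block(facts: dict) -> str:
    """Render persistent transactional facts as a block prepended to each
    prompt, outside any summarized history, so amounts and dates survive
    progressive summarization."""
    lines = [f"- {key}: {value}" for key, value in facts.items()]
    return "CASE FACTS (authoritative, do not paraphrase):\n" + "\n".join(lines)
```

    The case-facts block sits at the start of every request, which also plays to the position effects noted above: critical values live at the beginning of the input rather than buried mid-history.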

    Subdomain 5.2: Design effective escalation and ambiguity resolution patterns

    Knowledge of:

    - Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps (not just complex cases), and inability to make meaningful progress
    - The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward
    - Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity
    - How multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection

    Skills in:

    - Adding explicit escalation criteria with few-shot examples to the system prompt demonstrating when to escalate versus resolve autonomously
    - Honoring explicit customer requests for human agents immediately without first attempting investigation
    - Acknowledging frustration while offering resolution when the issue is within the agent's capability, escalating only if the customer reiterates their preference
    - Escalating when policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching when policy only addresses own-site adjustments)
    - Instructing the agent to ask for additional identifiers when tool results return multiple matches, rather than selecting based on heuristics
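
    The first skill above, explicit escalation criteria plus few-shot examples in the system prompt, might look like the sketch below. The criteria wording and both examples are illustrative assumptions; the point is that the decision is left to the model via explicit instructions, not to keyword or sentiment heuristics:

```python
# Sketch: embed explicit escalation criteria and few-shot demonstrations in
# the system prompt so the model decides when to escalate vs. resolve.

ESCALATION_CRITERIA = """\
Escalate to a human when:
1. The customer explicitly asks for a human agent (escalate immediately,
   without attempting investigation first).
2. Policy is ambiguous or silent on the request (e.g., competitor price
   matching when policy only covers own-site adjustments).
3. You cannot make meaningful progress after using your available tools.
Otherwise, acknowledge frustration and offer to resolve the issue yourself;
escalate only if the customer reiterates their preference."""

FEW_SHOT = """\
Example 1:
Customer: "Let me talk to a person, now."
Action: escalate (explicit request for a human).

Example 2:
Customer: "This is ridiculous, my refund is late!"
Action: acknowledge frustration, look up the refund, offer to resolve."""

def build_system_prompt(base: str) -> str:
    return f"{base}\n\n{ESCALATION_CRITERIA}\n\n{FEW_SHOT}"

prompt = build_system_prompt("You are a support agent for ExampleShop.")
```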

    Subdomain 5.3: Implement error propagation strategies across multi-agent systems

    Knowledge of:

    - Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions
    - The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches)
    - Why generic error statuses ("search unavailable") hide valuable context from the coordinator
    - Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns

    Skills in:

    - Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery
    - Distinguishing access failures from valid empty results in error reporting so the coordinator can make appropriate decisions
    - Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve, including what was attempted and partial results
    - Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources
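
    A minimal sketch of the propagation pattern above: the subagent retries transient failures locally, reports a valid empty result as success, and only escalates a structured error report it cannot resolve. `TimeoutError` and the report's field names stand in for whatever the real tool raises and whatever schema the coordinator expects:

```python
# Sketch: local retry for transient failures, structured error context on
# exhaustion, and explicit success for empty-but-valid results.

def run_search(query: str, fetch, max_retries: int = 2) -> dict:
    for attempt in range(max_retries + 1):
        try:
            results = fetch(query)
            # An empty list is a *successful* query with no matches,
            # not an error to suppress or inflate.
            return {"status": "success", "query": query, "results": results}
        except TimeoutError:
            if attempt == max_retries:
                return {
                    "status": "error",
                    "failure_type": "access_failure",  # vs. valid "no matches"
                    "attempted": query,
                    "partial_results": [],
                    "alternatives": ["retry later", "query a secondary index"],
                }
    raise AssertionError("unreachable")

def always_timeout(query):
    raise TimeoutError("search index unreachable")

empty = run_search("refund policy 2019", lambda q: [])
failed = run_search("refund policy 2019", always_timeout)
```

    Given these two report shapes, a coordinator can route an `access_failure` to a retry or fallback source while treating an empty success as a genuine negative finding.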

    Subdomain 5.4: Manage context effectively in large codebase exploration

    Knowledge of:

    - Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
    - The role of scratchpad files for persisting key findings across context boundaries
    - Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding
    - Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume

    Skills in:

    - Spawning subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while the main agent preserves high-level coordination
    - Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation
    - Summarizing key findings from one exploration phase before spawning subagents for the next phase, injecting summaries into initial context
    - Designing crash recovery using structured agent state exports (manifests) that the coordinator loads on resume and injects into agent prompts
    - Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output
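
    The manifest-based crash recovery described above can be sketched as below. The file layout, manifest schema, and function names are illustrative assumptions: each agent writes its state to a known location and registers it in a manifest, and on resume the coordinator loads everything back for re-injection into agent prompts:

```python
# Sketch: structured agent state export plus manifest-driven resume.
import json
import tempfile
from pathlib import Path

def export_state(run_dir: Path, agent_id: str, state: dict) -> None:
    """Write one agent's state and register it in the shared manifest."""
    run_dir.mkdir(parents=True, exist_ok=True)
    state_file = run_dir / f"{agent_id}.json"
    state_file.write_text(json.dumps(state))
    manifest_path = run_dir / "manifest.json"
    manifest = (json.loads(manifest_path.read_text())
                if manifest_path.exists() else {})
    manifest[agent_id] = state_file.name
    manifest_path.write_text(json.dumps(manifest))

def resume(run_dir: Path) -> dict:
    """Load every agent's last exported state for prompt re-injection."""
    manifest = json.loads((run_dir / "manifest.json").read_text())
    return {aid: json.loads((run_dir / fname).read_text())
            for aid, fname in manifest.items()}

with tempfile.TemporaryDirectory() as d:
    run_dir = Path(d)
    export_state(run_dir, "explorer",
                 {"key_findings": ["RefundService calls LedgerClient"]})
    export_state(run_dir, "tester", {"test_files_found": 14})
    recovered = resume(run_dir)
```

    After a crash, the coordinator only needs the manifest path to rebuild each agent's working context instead of re-running the exploration.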

    Subdomain 5.5: Design human review workflows and confidence calibration

    Knowledge of:

    - The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields
    - Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns
    - Field-level confidence scores calibrated using labeled validation sets for routing review attention
    - The importance of validating accuracy by document type and field segment before automating high-confidence extractions

    Skills in:

    - Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection
    - Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review
    - Having models output field-level confidence scores, then calibrating review thresholds using labeled validation sets
    - Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity
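
    Calibrating a review threshold from a labeled validation set, as described above, reduces to a small search: pick the lowest confidence cutoff at which extractions above the cutoff meet a target precision, and route everything below it to human review. The data and target value here are synthetic:

```python
# Sketch: choose the lowest confidence threshold whose above-threshold
# precision on a labeled validation set meets the target, maximizing
# automation while protecting accuracy.

def calibrate_threshold(validation, target_precision):
    """validation: list of (confidence, was_correct) pairs."""
    candidates = sorted({conf for conf, _ in validation})
    for threshold in candidates:  # lowest first -> most automation
        above = [ok for conf, ok in validation if conf >= threshold]
        if above and sum(above) / len(above) >= target_precision:
            return threshold
    return None  # no threshold meets the target; review everything

validation = [(0.95, True), (0.92, True), (0.90, True),
              (0.80, False), (0.75, True), (0.60, False)]
threshold = calibrate_threshold(validation, target_precision=0.99)
```

    In this synthetic set, only the extractions at confidence 0.90 and above are all correct, so 0.90 becomes the cutoff. In practice this calibration should be repeated per document type and field, since an aggregate threshold can hide weak segments.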

    Subdomain 5.6: Preserve information provenance and handle uncertainty in multi-source synthesis

    Knowledge of:

    - How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings
    - The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings
    - How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value
    - Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions

    Skills in:

    - Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis
    - Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations and methodological context
    - Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis
    - Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation
    - Rendering different content types appropriately in synthesis outputs (financial data as tables, news as prose, technical findings as structured lists) rather than converting everything to a uniform format
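
    The claim-source mapping and conflict-annotation skills above can be sketched together. The field names (`metric`, `source`, `as_of`) are illustrative assumptions for whatever structured output schema the subagents use; the point is that conflicting values are merged with full attribution and dates rather than one value being silently chosen:

```python
# Sketch: merge structured claim-source mappings from multiple subagents,
# flagging conflicts and preserving attribution plus collection dates so
# temporal differences are not misread as contradictions.

def merge_claims(findings):
    """findings: list of {"metric", "value", "source", "as_of"} dicts."""
    by_metric: dict = {}
    for f in findings:
        by_metric.setdefault(f["metric"], []).append(f)
    merged = []
    for metric, claims in by_metric.items():
        values = {c["value"] for c in claims}
        merged.append({
            "metric": metric,
            "conflict": len(values) > 1,  # keep all values, with sources
            "claims": [{"value": c["value"], "source": c["source"],
                        "as_of": c["as_of"]} for c in claims],
        })
    return merged

report = merge_claims([
    {"metric": "market_share", "value": "31%",
     "source": "report-a.pdf", "as_of": "2024-06"},
    {"metric": "market_share", "value": "27%",
     "source": "report-b.pdf", "as_of": "2023-11"},
])
```

    A downstream synthesis agent receiving this structure can present the conflict explicitly ("31% as of 2024-06 per report-a.pdf; 27% as of 2023-11 per report-b.pdf") instead of arbitrarily selecting one figure.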

    Techniques & products

    Claude Code
    Claude Agent SDK
    Claude API
    Model Context Protocol (MCP)
    Agentic loops
    Multi-agent systems
    Subagents
    Task tool
    Agent SDK hooks
    Tool interfaces
    Built-in tools (Read, Write, Edit, Bash, Grep, Glob)
    CLAUDE.md
    Slash commands
    Skills
    Path-specific rules
    Plan mode
    Direct execution
    Iterative refinement
    CI/CD pipelines
    Prompt engineering
    Few-shot prompting
    Structured output
    JSON schemas
    Validation
    Retry loops
    Feedback loops
    Message Batches API
    Multi-instance review architectures
    Multi-pass review architectures
    Conversation context management
    Progressive summarization
    "Lost in the middle" effect
    Escalation patterns
    Ambiguity resolution patterns
    Error propagation strategies
    Scratchpad files
    Human review workflows
    Confidence calibration
    Information provenance
    Multi-source synthesis
    YAML frontmatter
    Glob patterns
    stop_reason
    isError flag
    tool_choice
    custom_id
    AgentDefinition
    PostToolUse hook
    fork_session
    --resume flag
    --print flag
    --output-format json flag
    --json-schema flag
    /memory command
    /compact command

    CertSafari is not affiliated with, endorsed by, or officially connected to Anthropic. Full disclaimer