Agents & Subagents
AgentTool
The most architecturally interesting part of Claude Code. Spawn isolated agents with separate token budgets, zero-cost cache sharing, and 3 isolation levels, all from a single AgentTool call.
- AgentTool spawns a full Claude instance with its own token budget. The child agent runs the same query() loop independently — including its own tool calls, compaction, and recovery.
- CacheSafeParams (Step 3 of spawn flow) freezes the system prompt bytes at fork time. If parent and child send identical bytes, the API returns a prompt cache hit, so the child's system prompt adds close to zero cost at spawn.
- 3 isolation modes: Local (in-process, shared state), Worktree (git-isolated branch), Remote (CCR server, full isolation). Pick the isolation level that matches the risk of the task.
- 6 forked background agents run automatically: speculation (pre-executes next steps), extractMemories, promptSuggestion, sessionMemory, compaction, and autoDream (after 24h + 5 sessions).
The Agent Spawning Flow
Every subagent follows this exact 6-step lifecycle. Understanding it unlocks why cache sharing is the key innovation.
CacheSafeParams freezes the system prompt bytes at fork time. If parent and subagent bytes match exactly, the API returns a prompt cache hit — making every forked agent nearly free.
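The byte-match condition above can be sketched as a simple equality check on a frozen snapshot. This is a minimal illustration, not Claude Code's implementation: the `CacheSafeParams` field types, `stableStringify`, `freezeParams`, and `wouldShareCache` are all hypothetical names and shapes chosen for the example, and string equality stands in for byte equality.

```typescript
// Hypothetical shape, loosely following the CacheSafeParams fields named in this document.
interface CacheSafeParams {
  systemPrompt: string;
  userContext: Record<string, string>;
  toolSchema: string[]; // serialized tool definitions, order-sensitive
  thinkingConfig: Record<string, string | number | boolean>;
}

// Serialize with sorted object keys so logically equal objects yield identical text.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Take the snapshot once, at fork time.
function freezeParams(params: CacheSafeParams): string {
  return stableStringify(params);
}

// A child can only reuse the parent's cached prefix if its snapshot is identical.
function wouldShareCache(parentFrozen: string, child: CacheSafeParams): boolean {
  return parentFrozen === stableStringify(child);
}

const parentParams: CacheSafeParams = {
  systemPrompt: "You are a coding agent.",
  userContext: { cwd: "/repo", platform: "linux" },
  toolSchema: ["Read", "Edit"],
  thinkingConfig: { enabled: true },
};
const parentFrozen = freezeParams(parentParams);

// Same content, different key order: still identical after stable serialization.
const sameChild: CacheSafeParams = {
  ...parentParams,
  userContext: { platform: "linux", cwd: "/repo" },
};
// A single extra space in the system prompt is enough to break the match.
const driftedChild: CacheSafeParams = {
  ...parentParams,
  systemPrompt: parentParams.systemPrompt + " ",
};
```

Here `wouldShareCache(parentFrozen, sameChild)` holds and `wouldShareCache(parentFrozen, driftedChild)` does not, which is why the snapshot is taken once at fork time rather than rebuilt per child.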
3 Execution Modes
Choose the right isolation level for the task. Default is always Local.
Local
In-process
- Memory extraction
- Prompt suggestions
- Auto-compaction
- Background annotations
Worktree
Git isolation
- Parallel feature development
- Risky refactors
- Experimental changes
- Code review agents
Remote
CCR
- True parallelism at scale
- Sensitive operations
- Long-running background work
- Multi-machine workflows
| Feature | Local | Worktree | Remote |
|---|---|---|---|
| Filesystem isolation | ✗ | ✓ | ✓ |
| Git branch isolation | ✗ | ✓ | ✓ |
| Background execution | optional | optional | ✓ |
| Shared AppState | ✓ | ✗ | ✗ |
| Cache sharing | ✓ | ✓ | ✗ |
| Startup cost | ~0 tokens | Low | High |
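As a rough sketch, the feature table can be encoded as data and queried for the least isolated mode that satisfies a task. The names `FEATURES`, `Feature`, and `pickMode` are hypothetical, and `alwaysBackground` is an interpretation of the "Background execution" row (optional for Local and Worktree, always on for Remote), not something the source defines.

```typescript
type IsolationMode = "local" | "worktree" | "remote";
type Feature =
  | "filesystemIsolation"
  | "gitBranchIsolation"
  | "sharedAppState"
  | "cacheSharing"
  | "alwaysBackground"; // assumption: only Remote always runs in the background

// Feature matrix transcribed from the table above.
const FEATURES: Record<IsolationMode, Record<Feature, boolean>> = {
  local: {
    filesystemIsolation: false,
    gitBranchIsolation: false,
    sharedAppState: true,
    cacheSharing: true,
    alwaysBackground: false,
  },
  worktree: {
    filesystemIsolation: true,
    gitBranchIsolation: true,
    sharedAppState: false,
    cacheSharing: true,
    alwaysBackground: false,
  },
  remote: {
    filesystemIsolation: true,
    gitBranchIsolation: true,
    sharedAppState: false,
    cacheSharing: false,
    alwaysBackground: true,
  },
};

// Ordered from cheapest to most isolated, so Local wins when nothing is required.
const MODES: IsolationMode[] = ["local", "worktree", "remote"];

// Return the first mode that offers every required feature, or null if none does.
function pickMode(required: Feature[]): IsolationMode | null {
  for (const mode of MODES) {
    if (required.every((feature) => FEATURES[mode][feature])) {
      return mode;
    }
  }
  return null;
}
```

With no requirements this returns `"local"`, matching the stated default. Asking for both `sharedAppState` and `gitBranchIsolation` returns `null`, since no column in the table offers both.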
Zero-Cost Cache Sharing
The key innovation that makes 6+ background agents per turn economically viable.
Without cache sharing, 6 agents × 10K prompt tokens each would cost 60K tokens per turn. With it, Claude Code pays for the prompt once and the 6 agents get cache hits. At scale, this is the difference between affordable and prohibitive.
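The arithmetic above can be worked through in a few lines. This is a sketch under stated assumptions: `promptCostPerTurn` is a hypothetical helper, the cache-read multiplier is a free parameter (0 reproduces the idealized "pay once" figure, and 0.1 is only an illustrative discounted rate), and any cache-write surcharge is ignored.

```typescript
interface PromptCost {
  withoutSharing: number; // token-equivalents billed when every agent pays in full
  withSharing: number; // token-equivalents billed when agents hit the cache
}

function promptCostPerTurn(
  promptTokens: number,
  agents: number,
  cacheReadMultiplier: number // assumed fraction of full price charged per cache hit
): PromptCost {
  // Every agent re-sends and pays for the whole prompt.
  const withoutSharing = promptTokens * agents;
  // The prompt is paid for once, then each agent pays only the cache-read fraction.
  const withSharing = promptTokens + Math.round(promptTokens * agents * cacheReadMultiplier);
  return { withoutSharing, withSharing };
}

// The document's numbers: a 10K-token prompt and 6 forked agents.
const idealized = promptCostPerTurn(10000, 6, 0); // 60000 vs 10000
const discounted = promptCostPerTurn(10000, 6, 0.1); // 60000 vs 16000
```

Even with a non-zero cache-read rate, the shared case stays well under the 60K-token baseline, which is the point the paragraph above makes.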
- systemPrompt: must match exactly, byte for byte
- userContext: working dir, platform, git state
- toolSchema: all available tool definitions
- thinkingConfig: thinking mode settings
// CacheSafeParams — frozen at fork time
{
systemPrompt: bytes // Must match exactly — any change = cache MISS
userContext: variables // Working dir, platform, git
systemContext: resources // Available tools, MCP servers
toolSchema: definitions // All tool definitions
thinkingConfig: config // Thinking mode settings
}
// Same bytes → prompt cache HIT → ~0 cost for subagent system prompt
// Changed bytes → cache MISS → pay full 10K+ tokens again
Forked Agents Timeline
Background agents users never see — but that make the product feel smarter. Each fires at a specific phase.
- speculation: pre-executes likely next steps in a /tmp overlay
- extractMemories: extracts facts to CLAUDE.md after each query
- promptSuggestion: generates 3 follow-up prompt ideas
- sessionMemory: periodic conversation notes across sessions
- compaction: conversation summary when context hits a threshold
- autoDream: memory consolidation after 24h + 5 sessions
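The autoDream trigger described above ("after 24h + 5 sessions") can be sketched as a gate over two counters. The source does not say whether both conditions must hold or how they are tracked, so this example assumes both are required. `DreamState` and `shouldAutoDream` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_SESSIONS = 5;

interface DreamState {
  lastConsolidatedAt: number; // epoch milliseconds of the previous consolidation
  sessionsSinceLast: number; // sessions completed since then
}

// Assumption: consolidation runs only when BOTH thresholds are met.
function shouldAutoDream(state: DreamState, now: number): boolean {
  const enoughTime = now - state.lastConsolidatedAt >= DAY_MS;
  const enoughSessions = state.sessionsSinceLast >= MIN_SESSIONS;
  return enoughTime && enoughSessions;
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the gate deterministic and easy to test.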
Multi-Agent Coordinator Pattern
The leader/worker pattern Claude Code encodes into its own system prompt. Synthesis stays with the leader; execution fans out.
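One way to make "synthesis stays with the leader" concrete is a lint-style check on the follow-up prompts a coordinator sends to workers. This sketch is an illustration only, not code from Claude Code. `checkWorkerPrompt`, the phrase list, and the file-path pattern are hypothetical, and the phrase checked is the one the coordinator rules in this section single out.

```typescript
interface PromptCheck {
  ok: boolean;
  problems: string[];
}

// Phrases that delegate synthesis to the worker instead of doing it in the leader.
const HAND_WAVE_PHRASES = ["based on your findings"];

// Rough heuristic for a concrete path such as src/auth/session.ts (illustrative only).
const FILE_PATH_PATTERN = /(?:[\w.-]+\/)+[\w.-]+\.\w+/;

function checkWorkerPrompt(prompt: string): PromptCheck {
  const problems: string[] = [];
  const lower = prompt.toLowerCase();
  for (const phrase of HAND_WAVE_PHRASES) {
    if (lower.indexOf(phrase) !== -1) {
      problems.push('hand-waving phrase: "' + phrase + '"');
    }
  }
  if (!FILE_PATH_PATTERN.test(prompt)) {
    problems.push("no concrete file path");
  }
  return { ok: problems.length === 0, problems };
}

const vaguePrompt = checkWorkerPrompt("Based on your findings, fix the bug.");
const concretePrompt = checkWorkerPrompt(
  "In src/auth/session.ts, make refreshToken() retry once on a 401 response."
);
```

The vague prompt is flagged twice (hand-waving phrase, no file path), while the concrete one passes. A real coordinator would rely on the model following its system prompt rather than a regex, so treat this purely as a way to visualize the rule.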
"The coordinator must read worker findings, understand them itself, and write follow-up prompts with concrete file paths and changes. It cannot hand-wave research away. Synthesis stays with the leader, execution fans out."
// coordinator/coordinatorMode.ts — key rules (~370 lines)
getCoordinatorUserContext(...)
→ enumerates worker tool access
→ injects MCP server names
→ may expose scratchpadDir for cross-worker knowledge
getCoordinatorSystemPrompt()
→ workers are async, launch independent work in parallel
→ don't use workers for trivial file/command reporting
→ don't say "based on your findings"
→ always synthesize findings before delegating
→ verification means proving behavior, not just code presence
→ phases: research → synthesis → implementation → verification
Built-in Agent Types
| Type | Purpose | Key Capability |
|---|---|---|
| general-purpose | Default for complex tasks | Full tool access |
| Explore | Fast codebase exploration | Search-focused, no write tools |
| Plan | Architecture planning | Design docs, no code changes |
| code-review | PR code review | Read-only analysis |
| simplicity-engineer | Over-engineering review | Complexity analysis |
| Custom (.claude/agents/) | User-defined agents | YAML/MD with frontmatter |
Drop a .md or .yaml file in .claude/agents/ with frontmatter (name, description, tools, model). The agent becomes available system-wide. Custom agents can restrict tools to a safe subset — e.g., a read-only reviewer.
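To illustrate what such a file might carry, here is a minimal sketch that reads the four frontmatter fields named above from a Markdown agent file. It is not Claude Code's loader: `parseAgentFile` and `AgentDefinition` are hypothetical, the comma-separated `tools` syntax is an assumption, and the sample tool and model values are placeholders.

```typescript
interface AgentDefinition {
  name: string;
  description: string;
  tools: string[]; // empty when the file does not restrict tools
  model?: string;
  body: string; // the prompt text after the frontmatter
}

function parseAgentFile(source: string): AgentDefinition {
  // Expect a leading block delimited by lines containing only "---".
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    throw new Error("missing frontmatter block");
  }
  // Collect simple "key: value" pairs; anything more complex is out of scope here.
  const fields: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  if (!fields.name || !fields.description) {
    throw new Error("name and description are required");
  }
  return {
    name: fields.name,
    description: fields.description,
    tools: fields.tools
      ? fields.tools.split(",").map((t) => t.trim()).filter((t) => t.length > 0)
      : [],
    model: fields.model,
    body: match[2].trim(),
  };
}

// A read-only reviewer, as in the example mentioned above (values are placeholders).
const sampleAgentFile = [
  "---",
  "name: reviewer",
  "description: Read-only code reviewer",
  "tools: Read, Grep, Glob",
  "model: sonnet",
  "---",
  "Review the diff and report issues. Do not modify files.",
].join("\n");

const reviewer = parseAgentFile(sampleAgentFile);
```

Restricting `tools` to read-only entries is what turns the definition into the "safe subset" reviewer the paragraph describes.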
Prompt caching makes subagents nearly free