Claude Code v2.1.88

Services

20+ services

Claude Code's service layer handles compaction, MCP integration, LSP, analytics, memory extraction, and more — running in parallel with the main query loop.

TL;DR — Key Takeaways
  • MCP is the largest single service at 470KB across 25 files — bigger than BashTool. External tool integration is a first-class architecture concern, not an afterthought.
  • Compaction has 4 escalating strategies (Microcompact → Snipping → Autocompact → Collapse) that activate from lightest to heaviest. Without them, sessions would hit the token wall every few hours.
  • The API layer handles streaming, retries, beta-flag management, and prompt cache control — all transparently. Most developers never touch it directly, but it fires on every single turn.
  • Bridge/Remote (35KB, 20 files) is a second top-level execution path alongside the local REPL — remote sessions dispatch work through the same QueryEngine/query() core.

Service Overview

Start here to see which responsibilities live outside the main query loop.

  • Compaction (~15K, 13 files): 4-level context window management
  • MCP (470KB, 25 files): external tool integration (4 transports)
  • LSP (~5K, 6 files): Language Server Protocol
  • Analytics (~8K, 6 files): Datadog + GrowthBook pipeline
  • Memory (~6K, 5 files): auto-extraction + session memory
  • API (~45K, 8 files): streaming client, retries, betas, prompt caching
  • Tools (~1K, 2 files): StreamingToolExecutor + orchestration
  • Plugins (~10K, 8 files): plugin install + marketplace
  • Tokens (~2K, 1 file): multi-provider token counting
  • Bridge/Remote (~35K, 20 files): remote sessions, work dispatch, reconnect logic

Compaction System

This section explains how Claude Code keeps long sessions alive instead of hitting a hard wall.

Multi-level context window management keeps conversations within token limits. Four strategies with increasing aggressiveness:

  • Microcompact (every API call): single-turn inline compression. Uses cached tool results. No extra API call.
  • History Snipping (feature-gated threshold): removes the oldest messages below a threshold. Less aggressive than autocompact.
  • Autocompact (token-threshold trigger): full conversation summary via a forked agent. Replaces old messages. Circuit breaker: max 3 failures.
  • Context Collapse (experimental): incremental context reduction. Builds a collapse store separately, projected at read time (non-destructive).

typescript
// Token budget calculation:
effective_window = model_context (e.g., 200K for opus)
  - max_output_tokens (e.g., 16K)
  - reserved_for_summary (20K)
  = ~164K effective

autocompact_threshold = effective_window - 13K buffer
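
The budget arithmetic above can be sketched as a small helper. The constants mirror the example numbers in the comment (200K context, 16K max output, 20K summary reserve, 13K buffer); they are illustrative defaults, not verified constants from the source:

```typescript
// Sketch of the token-budget math described above. Values are the
// illustrative defaults from the comment, not confirmed constants.
interface BudgetConfig {
  modelContext: number;       // e.g. 200_000 for opus
  maxOutputTokens: number;    // e.g. 16_000
  reservedForSummary: number; // e.g. 20_000
  autocompactBuffer: number;  // e.g. 13_000
}

function effectiveWindow(c: BudgetConfig): number {
  return c.modelContext - c.maxOutputTokens - c.reservedForSummary;
}

function autocompactThreshold(c: BudgetConfig): number {
  return effectiveWindow(c) - c.autocompactBuffer;
}

const opus: BudgetConfig = {
  modelContext: 200_000,
  maxOutputTokens: 16_000,
  reservedForSummary: 20_000,
  autocompactBuffer: 13_000,
};

console.log(effectiveWindow(opus));      // 164000
console.log(autocompactThreshold(opus)); // 151000
```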

MCP (Model Context Protocol)

Read this when you want to understand how Claude Code turns external MCP servers into first-class tools.

The MCP service is the largest in the codebase at 470KB across 25 files. It enables Claude Code to integrate external tools from any MCP-compatible server.

  • stdio: local process communication. Latency: lowest (direct pipe). Use: local tools, shell commands, filesystem access.
  • SSE: Server-Sent Events (HTTP streaming). Latency: low (persistent HTTP stream). Use: remote servers with streaming responses.
  • HTTP: standard HTTP requests. Latency: medium (per-request round trip). Use: REST APIs, web services, stateless tools.
  • WebSocket: full-duplex communication. Latency: low (persistent, bidirectional). Use: real-time tools, interactive services.
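
As a rough sketch, the four transports can be modeled as a registry plus a selection rule. The type names, profile fields, and pickTransport heuristic here are hypothetical, not the actual MCP client API:

```typescript
// Hypothetical registry of the four transports described above.
type McpTransport = "stdio" | "sse" | "http" | "websocket";

interface TransportProfile {
  latency: "lowest" | "low" | "medium";
  persistent: boolean;
  bidirectional: boolean;
}

const TRANSPORTS: Record<McpTransport, TransportProfile> = {
  stdio:     { latency: "lowest", persistent: true,  bidirectional: true },
  sse:       { latency: "low",    persistent: true,  bidirectional: false },
  http:      { latency: "medium", persistent: false, bidirectional: false },
  websocket: { latency: "low",    persistent: true,  bidirectional: true },
};

// A server config might pick a transport roughly like this (assumed rule):
function pickTransport(cfg: { command?: string; url?: string }): McpTransport {
  if (cfg.command) return "stdio";                   // local process: direct pipe
  if (cfg.url?.startsWith("ws")) return "websocket"; // ws:// or wss:// endpoint
  return "http";                                     // default remote fallback
}
```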
typescript
// How MCP tools work:
1. MCP server exposes tools via JSON schema
2. mcpClient.ts patches MCPTool definition at runtime:
   - Sets real tool name (e.g., "mcp_weather_get_current")
   - Injects actual input/output schemas
   - Wires up call() to invoke MCP server RPC
3. MCPTool uses passthrough schema (z.object({}).passthrough())
4. No validation at Claude Code layer — MCP server is responsible

// Key files:
client.ts    — 119KB — Protocol client orchestrator
config.ts    — 51KB  — Settings, env vars, server validation
auth.ts      — 88KB  — OAuth flow, token management
elicitationHandler.ts — User prompts during tool calls
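
The runtime-patching idea in steps 2–4 can be sketched without the real client. The McpToolDef shape and the rpc stand-in are assumptions; only the naming convention and the no-validation passthrough come from the text above:

```typescript
// Sketch of the "passthrough" pattern: the tool shell accepts any input
// shape and forwards it unvalidated, because the MCP server owns
// validation. The rpc function is a stand-in for the real protocol client.
interface McpToolDef {
  name: string;
  call: (input: Record<string, unknown>) => Promise<unknown>;
}

function patchMcpTool(
  serverName: string,
  toolName: string,
  rpc: (method: string, params: unknown) => Promise<unknown>,
): McpToolDef {
  return {
    // Real tool name is composed at runtime, e.g. "mcp_weather_get_current".
    name: `mcp_${serverName}_${toolName}`,
    // No client-side validation: input passes straight through to the server.
    call: (input) => rpc("tools/call", { name: toolName, arguments: input }),
  };
}
```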

LSP (Language Server Protocol)

FileEditTool → LSP integration flow:

  1. FileEditTool saves: file written to disk
  2. Notifies LSP: didChange/didSave event sent
  3. Triggers diagnostics: errors & warnings surfaced to the model
typescript
// LSP provides IDE-like features:
- Diagnostics (errors, warnings)
- Hover information
- Go-to-definition
- Code completions

// Architecture:
LSPServerManager (singleton)
  └─ LSPServerInstance[] (per language/framework)
       └─ LSPClient (protocol implementation)
            └─ LSPDiagnosticRegistry (collects diagnostics)

// Lifecycle:
initializeLspServerManager()  → Async init with generation counter
getLspServerManager()          → Get active manager (undefined if not ready)
getInitializationStatus()      → not-started | pending | success | failed

// Integration with tools:
FileEditTool → Notifies LSP of file changes → Triggers diagnostics
FileWriteTool → Same notification path
LSPTool → Direct query interface for the model
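
A minimal sketch of the generation-counter lifecycle implied above: a stale async initialization must not overwrite a newer one. The function names mirror the listing, but the bodies are assumptions:

```typescript
// Sketch of generation-counter init: only the most recent call to
// initializeLspServerManager may publish its result.
type InitStatus = "not-started" | "pending" | "success" | "failed";

let generation = 0;
let manager: { generation: number } | undefined;
let status: InitStatus = "not-started";

async function initializeLspServerManager(): Promise<void> {
  const myGen = ++generation; // claim a generation for this init attempt
  status = "pending";
  try {
    const created = { generation: myGen }; // stand-in for real server startup
    if (myGen === generation) {
      // Still the newest attempt: publish the manager.
      manager = created;
      status = "success";
    }
    // Otherwise a newer init superseded us; discard silently.
  } catch {
    if (myGen === generation) status = "failed";
  }
}

const getLspServerManager = () => manager; // undefined until ready
const getInitializationStatus = () => status;
```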

Analytics Pipeline

Best TypeScript type name in the codebase
AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS

This is a real TypeScript type. The name itself is the enforcement mechanism — every analytics event must be typed with this, forcing the developer to consciously confirm they aren't accidentally logging user code or file paths. It's the most creative use of a type name for PII safety we've ever seen.

typescript
// Event pipeline with queue-until-sink pattern:
logEvent(name, metadata)        → Sync event logging
logEventAsync(name, metadata)   → Async event logging
attachAnalyticsSink()           → Register backend (Datadog, 1P)

// The safety type — you must use this for every analytics event:
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = { ... }
// → Enforces: no file content, no user code in analytics

// PII handling:
_PROTO_* keys → PII-tagged columns (Anthropic 1P only)
stripProtoFields() → Removes PII before Datadog fanout

// GrowthBook integration (feature gates):
checkStatsigFeatureGate_CACHED_MAY_BE_STALE()
// → Cached gate values prevent blocking on init
// → User attributes: ID, session, platform, org, subscription
// → A/B experiment tracking with variation IDs
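
The _PROTO_ stripping rule can be sketched directly. The key prefix comes from the text above; the function body and metadata shape are assumptions:

```typescript
// Sketch of the PII fanout rule: keys prefixed with _PROTO_ are allowed
// only in Anthropic's first-party sink and must be removed before events
// fan out to Datadog.
type AnalyticsMetadata = Record<string, string | number | boolean>;

function stripProtoFields(metadata: AnalyticsMetadata): AnalyticsMetadata {
  return Object.fromEntries(
    Object.entries(metadata).filter(([key]) => !key.startsWith("_PROTO_")),
  );
}

const event: AnalyticsMetadata = {
  tool_name: "Bash",
  duration_ms: 412,
  _PROTO_org_id: "org_123", // PII-tagged: first-party only
};

console.log(stripProtoFields(event)); // { tool_name: "Bash", duration_ms: 412 }
```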

API Layer Is a Policy Engine

This is the deeper service section: the API layer decides caching, retries, betas, and stream normalization.

One of the easiest things to underestimate in Claude Code is the API client. claude.ts is not a thin transport wrapper: it decides which beta headers to send, how to split cacheable vs dynamic system prompt sections, how to retry recoverable failures, when to record quota/cost state, and how to normalize streamed content back into the internal message model.

typescript
// services/api/claude.ts
build request:
  → normalizeMessagesForAPI(...)
  → splitSysPromptPrefix(...) for prompt caching
  → choose beta headers (fast mode, effort, structured outputs, tool search)
  → attach attribution + client request IDs

stream response:
  → normalizeContentFromAPI(...)
  → ensureToolResultPairing(...)
  → capture usage deltas + request fingerprints
  → update quota/cost/session activity

failure path:
  → withRetry(...)
  → distinguish abort / timeout / 529 / fallback-triggered cases
  → emit assistant-visible API error messages
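
A minimal sketch of the retry leg, assuming a conventional exponential-backoff policy. The signature, retryable status list, and ApiError type are illustrative, not the real implementation:

```typescript
// Sketch of a withRetry helper in the spirit of the failure path above:
// retry recoverable statuses (e.g. 529 overloaded) with backoff,
// rethrow everything else immediately.
class ApiError extends Error {
  constructor(public status: number) {
    super(`API error ${status}`);
  }
}

const RETRYABLE = new Set([429, 500, 529]); // assumed policy

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const recoverable = err instanceof ApiError && RETRYABLE.has(err.status);
      if (!recoverable || attempt >= maxAttempts) throw err;
      // Exponential backoff between attempts: base, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Usage would look like `withRetry(() => client.streamRequest(req))`, with abort and timeout cases classified before the retry decision.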

Bridge & Remote Execution

Use this section to understand how Claude Code can operate as remote capacity, not only as a local CLI loop.

The newer repo has a substantial bridge/remote layer that the older analysis pages barely mentioned. bridgeMain.ts is effectively a miniature control plane: it polls for work, spawns or reconnects sessions, heartbeats active jobs, refreshes ingress tokens, manages worktrees, and tears sessions down safely.

typescript
// bridge/bridgeMain.ts
runBridgeLoop(config, environmentId, secret, api, spawner, logger, signal)
  → poll bridge API for work
  → spawn local session or reconnect existing session
  → send heartbeatWork() for active jobs
  → refresh ingress/JWT tokens
  → create/remove agent worktrees
  → wake capacity when sessions finish
  → stop or reconnect timed-out sessions

// related files:
sessionRunner.ts        // child session spawning
workSecret.ts           // SDK / worker registration secrets
bridgeApi.ts            // typed bridge API client
remote/*.ts             // session manager + websocket transport
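
One iteration of that control plane might look like the following. All types and the single-step decomposition are stand-ins for the real runBridgeLoop, which takes far more collaborators:

```typescript
// Simplified sketch of one bridge-loop iteration: heartbeat running
// jobs, then poll for new work and spawn a session for it.
interface WorkItem { id: string }

interface BridgeApi {
  pollWork(): Promise<WorkItem | null>;
  heartbeatWork(id: string): Promise<void>;
}

async function runBridgeLoopOnce(
  api: BridgeApi,
  spawn: (work: WorkItem) => Promise<void>,
  active: Set<string>,
): Promise<void> {
  // 1. Heartbeat jobs that are already running so they are not reclaimed.
  for (const id of active) await api.heartbeatWork(id);
  // 2. Pick up new work, if any, and spawn a session for it.
  const work = await api.pollWork();
  if (work) {
    active.add(work.id);
    await spawn(work);
  }
}
```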

Speculation & Prompt Suggestions

This explains the hidden background work Claude Code performs to make the next step feel faster.

Another service family worth studying is PromptSuggestion. It is no longer just a UI nicety: speculation.ts creates copy-on-write overlays under /tmp, forks a cheap background agent using cache-safe params, pre-executes likely next steps, and can copy successful writes back into the main working directory.

typescript
// services/PromptSuggestion/speculation.ts
getOverlayPath(id) → /tmp/.../speculation/<pid>/<id>
prepareMessagesForInjection(messages)
runForkedAgent(cacheSafeParams, ...)
copyOverlayToMain(overlayPath, writtenPaths, cwd)

guards:
- stop at write tools outside the overlay rules
- stop on denied tools or non-read-only bash
- cap to 20 turns / 100 messages
- log speculation outcome + time saved
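
The guard list can be sketched as a single predicate. The tool names, the 20-turn/100-message caps, and the overlay rule come from the text; the shapes and the naive read-only classifier are assumptions:

```typescript
// Sketch of the speculation guards: stop on out-of-overlay writes,
// non-read-only bash, or exceeding the turn/message caps.
interface SpeculationState { turns: number; messages: number }
interface ToolCall { name: string; command?: string; path?: string }

const MAX_TURNS = 20;
const MAX_MESSAGES = 100;

// Naive stand-in for a real read-only command classifier.
const isReadOnly = (cmd: string) => /^(ls|cat|git status|grep)\b/.test(cmd);

function shouldStopSpeculation(
  state: SpeculationState,
  call: ToolCall,
  overlayRoot: string,
): boolean {
  if (state.turns >= MAX_TURNS || state.messages >= MAX_MESSAGES) return true;
  // Writes must stay inside the copy-on-write overlay.
  if (call.name === "FileWriteTool" && !call.path?.startsWith(overlayRoot)) {
    return true;
  }
  // Only read-only bash commands may run speculatively.
  if (call.name === "BashTool" && !isReadOnly(call.command ?? "")) return true;
  return false;
}
```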

There are really two orchestration layers. toolOrchestration.ts handles already-buffered tool blocks in ordered batches; StreamingToolExecutor handles the earlier phase where tool_use blocks are still arriving over the wire and must be launched optimistically without breaking ordering guarantees.

typescript
// services/tools/ — Two key files:

// 1. toolOrchestration.ts (189 lines) — runTools() generator
//    Batch partitioning and serial/concurrent execution
//    Read-only batch → up to 10 parallel
//    Write batch → serial with context modifiers

// 2. StreamingToolExecutor.ts (226 lines)
//    Concurrent execution while model streams
//    addTool() → enqueue as tool_use blocks arrive
//    processQueue() → respect concurrency constraints
//    getCompletedResults() → yield finished results
//    discard() → cleanup on streaming fallback
//    siblingAbortController → kill sibling subprocesses on bash error
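
The batch-partitioning rule in toolOrchestration.ts can be sketched as follows. The readOnly flag and the cap of 10 come from the text; everything else is illustrative:

```typescript
// Sketch of batch partitioning: consecutive read-only tools share a
// batch (capped at 10, run in parallel); any write tool gets its own
// serial batch, preserving order.
interface ToolUse { id: string; readOnly: boolean }

const MAX_PARALLEL = 10;

function partitionIntoBatches(tools: ToolUse[]): ToolUse[][] {
  const batches: ToolUse[][] = [];
  for (const tool of tools) {
    const last = batches[batches.length - 1];
    const canJoin =
      last !== undefined &&
      tool.readOnly &&
      last.every((t) => t.readOnly) && // only read-only batches grow
      last.length < MAX_PARALLEL;
    if (canJoin) last.push(tool);
    else batches.push([tool]); // writes (and overflow) start a new batch
  }
  return batches;
}
```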

Memory extraction runs after every single query

After EVERY query, Claude runs extractMemories() in the background as a forked agent. After 24 hours + 5 sessions, autoDream() fires — a deeper memory consolidation pass. These agents are invisible to the user but silently make Claude smarter about your codebase over time.