Services Module

110 files · ~80K lines

The service layer: 8 background services that power everything else. They cover the API client, the MCP runtime (470KB), context compaction, LSP diagnostics, memory extraction, analytics, tool orchestration, and token management.

services/ · services/api/claude.ts · services/mcp/ · services/compact/

Service Catalogue — 8 Services

Each service is an independent subsystem with a clear input/output contract.

services/api/

30+ files · 3500+ lines

· Builds Anthropic API requests with the full system prompt, tools, and message history
· Handles streaming token events: text_delta, tool_use start/delta/stop
· Token budget management: enforces context limits before sending

Key fact: claude.ts is the largest single file in the entire codebase

utils/messages → api → LLM responses
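
The streaming bullet above can be illustrated with a small event reducer. This is a minimal sketch, not the code in claude.ts: the event shapes (`StreamEvent`, `partialJson`) and the helper names `createState` and `applyEvent` are hypothetical simplifications of the event names listed above.

```typescript
// Hypothetical, simplified event shapes based on the names in the text.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use_start"; id: string; name: string }
  | { type: "tool_use_delta"; id: string; partialJson: string }
  | { type: "tool_use_stop"; id: string };

interface ToolUseBlock {
  id: string;
  name: string;
  input: unknown;
}

interface StreamState {
  text: string; // assistant text accumulated so far
  pending: Map<string, { name: string; json: string }>; // tool_use blocks still streaming
  completed: ToolUseBlock[]; // tool_use blocks ready to be queued
}

function createState(): StreamState {
  return { text: "", pending: new Map(), completed: [] };
}

// Applies one event to the state as soon as it arrives.
function applyEvent(state: StreamState, event: StreamEvent): void {
  switch (event.type) {
    case "text_delta":
      state.text += event.text;
      break;
    case "tool_use_start":
      state.pending.set(event.id, { name: event.name, json: "" });
      break;
    case "tool_use_delta": {
      const block = state.pending.get(event.id);
      if (block) block.json += event.partialJson;
      break;
    }
    case "tool_use_stop": {
      const block = state.pending.get(event.id);
      if (!block) break;
      state.pending.delete(event.id);
      // The input JSON is only parseable once the block has fully arrived.
      state.completed.push({
        id: event.id,
        name: block.name,
        input: JSON.parse(block.json || "{}"),
      });
      break;
    }
  }
}
```

Text and tool input are accumulated separately, so a tool_use block can be handed off the moment its stop event arrives, even if more text follows.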

services/mcp/

20+ files · 470KB

· MCP protocol client supporting 4 transport types: stdio, SSE, HTTP, WebSocket
· Fetches tool definitions from connected servers at startup
· Patches MCP tools directly into Claude's tool namespace at runtime

Key fact: at 470KB, this is the single largest service. Bigger than most npm packages.

config (server list) → mcp → MCP tools
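
As a rough illustration of the patching step, the sketch below merges tool definitions fetched from one server into a shared registry. The `ToolDefinition` shape, the `patchMcpTools` helper, and the `mcp__<server>__<tool>` naming scheme are assumptions made for this example, not confirmed details of services/mcp/.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
}

// Adds one server's tools to the shared registry and returns the names added.
// Assumption: names are prefixed with the server name to avoid collisions.
function patchMcpTools(
  registry: Map<string, ToolDefinition>,
  serverName: string,
  serverTools: ToolDefinition[],
): string[] {
  const added: string[] = [];
  for (const tool of serverTools) {
    const qualified = `mcp__${serverName}__${tool.name}`;
    if (registry.has(qualified)) continue; // never overwrite an existing entry
    registry.set(qualified, { ...tool, name: qualified });
    added.push(qualified);
  }
  return added;
}
```

Because built-in and MCP tools end up in the same map, the rest of the system can look up either kind by name without knowing where it came from.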

services/compact/

13 files · ~15KB

· 4-strategy escalating compression pipeline, triggered when the context approaches its limit
· Strategies escalate from simple truncation to full AI summarization of the history
· Tracks compression ratios and chooses the least aggressive strategy that fits

Key fact: Strategy 4 (Full Context Replacement) is the nuclear option: it replaces everything except the last N turns

services/api (for summarization) → compact → compressed messages

services/lsp/

6 files · ~8KB

· Language Server Protocol integration for IDE-quality diagnostics
· Runs linters and type checkers as background processes connected via LSP
· Surfaces errors inline in tool results so Claude can fix them immediately

Key fact: Claude can see TypeScript errors before running the code, using the same data flow as your IDE

project root path → lsp → LSP diagnostics
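
One way to picture "errors inline in tool results" is a helper that appends diagnostics to a tool's output. This is a sketch under assumptions: the `Diagnostic` shape and the `appendDiagnostics` name are hypothetical, and the output format is invented for illustration.

```typescript
// Hypothetical, flattened diagnostic shape (real LSP diagnostics carry ranges).
interface Diagnostic {
  file: string;
  line: number;
  column: number;
  severity: "error" | "warning";
  message: string;
}

// Appends error diagnostics to a tool result; warnings are left out here.
function appendDiagnostics(toolOutput: string, diagnostics: Diagnostic[]): string {
  const errors = diagnostics.filter((d) => d.severity === "error");
  if (errors.length === 0) return toolOutput;
  const lines = errors.map((d) => `${d.file}:${d.line}:${d.column} ${d.message}`);
  const label = errors.length === 1 ? "1 error" : `${errors.length} errors`;
  return `${toolOutput}\n\nDiagnostics (${label}):\n${lines.join("\n")}`;
}
```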

services/extractMemories/

~5 files · ~15KB

· Background agent that runs after every conversation turn
· Sends recent messages to Claude: 'what facts are worth remembering?'
· Stores extracted memories as YAML in ~/.claude/memories/ for future sessions

Key fact: Uses Claude to build Claude's long-term memory, a recursive self-improvement of context

services/api (Claude call) → extractMemories → ~/.claude/memories/*.yaml

services/analytics/

~8 files · ~10KB

· Async event pipeline: usage events are queued and never block the main loop
· Drains to Datadog metrics and a first-party analytics endpoint
· Includes a PII safety type called SanitizedEventProperties to mark clean data

Key fact: SanitizedEventProperties encodes PII safety in the type system: data counts as safe to send only if it carries that type

no upstream dependency (fire and forget) → analytics → metrics/events
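
The two ideas above, a type that marks sanitized data and a queue that is drained separately from the caller, can be sketched as follows. Only the name SanitizedEventProperties comes from the text; its definition here (a branded type), the allow-list in `sanitize`, and the `AnalyticsQueue` class are assumptions for illustration.

```typescript
// Branded type: a plain record cannot be passed where this type is required,
// so the only way to obtain one is to go through sanitize().
declare const sanitizedBrand: unique symbol;
type SanitizedEventProperties = Record<string, string | number | boolean> & {
  readonly [sanitizedBrand]: true;
};

// Hypothetical allow-list of property names considered free of PII.
const ALLOWED_KEYS = new Set(["event", "durationMs", "toolName", "success"]);

function sanitize(raw: Record<string, unknown>): SanitizedEventProperties {
  const clean: Record<string, string | number | boolean> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!ALLOWED_KEYS.has(key)) continue; // drop anything not explicitly allowed
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      clean[key] = value;
    }
  }
  return clean as SanitizedEventProperties;
}

class AnalyticsQueue {
  private pending: SanitizedEventProperties[] = [];
  private readonly sink: (batch: SanitizedEventProperties[]) => void;

  constructor(sink: (batch: SanitizedEventProperties[]) => void) {
    this.sink = sink;
  }

  // Cheap and synchronous: callers only enqueue and never wait on the network.
  track(props: SanitizedEventProperties): void {
    this.pending.push(props);
  }

  // Sends everything queued so far; a real pipeline would call this from a timer.
  flush(): number {
    const batch = this.pending;
    this.pending = [];
    if (batch.length > 0) this.sink(batch);
    return batch.length;
  }
}
```

With this shape, `queue.track({ email: "..." })` fails to type-check, while `queue.track(sanitize({ ... }))` compiles. The guarantee is only as strong as the `sanitize` function that produces the type.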

services/tools/

~10 files · ~20KB

· StreamingToolExecutor: queues tool_use blocks as they arrive from the stream
· Concurrency guard: manages parallel vs sequential tool execution
· Routes each tool call through validate → checkPermissions → invoke → renderResult

Key fact: StreamingToolExecutor starts executing tool calls before the full response is received, giving parallelism at stream time

services/mcp (for MCP tools) → tools → tool results
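
The validate → checkPermissions → invoke → renderResult route can be sketched as a single function that stops at the first failing stage. The `Tool` interface, the `ToolResult` type, and `echoTool` are hypothetical, and the sketch is synchronous for brevity, whereas real tool invocation would be asynchronous.

```typescript
interface Tool<I> {
  name: string;
  validate(input: unknown): I | null; // null when the input is malformed
  checkPermissions(input: I): boolean;
  invoke(input: I): string;
  renderResult(output: string): string;
}

type ToolResult = { ok: true; content: string } | { ok: false; error: string };

// Runs the four stages in order; a failure at any stage short-circuits the rest.
function runTool<I>(tool: Tool<I>, rawInput: unknown): ToolResult {
  const input = tool.validate(rawInput);
  if (input === null) return { ok: false, error: `${tool.name}: invalid input` };
  if (!tool.checkPermissions(input)) {
    return { ok: false, error: `${tool.name}: permission denied` };
  }
  try {
    return { ok: true, content: tool.renderResult(tool.invoke(input)) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `${tool.name}: ${reason}` };
  }
}

// A toy tool used only to exercise the pipeline.
const echoTool: Tool<{ text: string }> = {
  name: "Echo",
  validate(input) {
    if (typeof input === "object" && input !== null && "text" in input) {
      const text = (input as { text: unknown }).text;
      if (typeof text === "string") return { text };
    }
    return null;
  },
  checkPermissions: (input) => !input.text.includes("secret"),
  invoke: (input) => input.text.toUpperCase(),
  renderResult: (output) => `Echo: ${output}`,
};
```

Returning a result value instead of throwing keeps failures in the same channel as successes, so a rejected call can be reported back to the model like any other tool result.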

services/tokens/

~5 files · ~8KB

· Tracks token consumption across the session for reporting and budget enforcement
· Projects the remaining token budget for the current context window
· Feeds into services/compact to decide when compaction should trigger

Key fact: Tokens are counted before sending, not after. The budget projection runs before every API call

utils/messages (for counting) → tokens → token counts + budgets
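
A budget projection of this kind might look like the sketch below. Everything here is an assumption for illustration: the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and `projectBudget`, `reservedForOutput`, and the 0.8 compaction threshold are invented names and values.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface BudgetProjection {
  used: number; // estimated tokens in the message history
  remaining: number; // tokens still available for input
  shouldCompact: boolean; // signal for the compaction service
}

// Assumption: roughly 4 characters per token. A real counter would use a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Runs before a request is sent, so the decision is made on projected usage.
function projectBudget(
  messages: Message[],
  contextLimit: number,
  reservedForOutput: number,
  compactThreshold = 0.8,
): BudgetProjection {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const usable = contextLimit - reservedForOutput;
  return {
    used,
    remaining: Math.max(0, usable - used),
    shouldCompact: used >= usable * compactThreshold,
  };
}
```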

API Call Lifecycle — 7 Steps

A single API call touches multiple services in sequence.

QueryEngine calls query()

The main query loop requests a new API call. It passes the full message history and system prompt.

services/api/claude.ts — buildRequest()

Constructs the Anthropic API request. Injects tools, handles model selection, applies token budget limits.

Anthropic API — streams tokens

Response arrives as a stream of events: text_delta, tool_use start/delta/stop. Each event is processed immediately.

StreamingToolExecutor — queues tool_use

As tool_use blocks complete, they are enqueued. The executor decides execution order and concurrency.

services/tools/ — runs tools

Each queued tool runs through the Tools module: validate → checkPermissions → invoke → renderResult.

services/analytics/ — async drain

Usage events are queued and drained asynchronously. Never blocks the main loop.

services/extractMemories/ — post-turn

After each turn completes, a background agent mines the conversation for facts worth persisting across sessions.
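
The StreamingToolExecutor step, where the executor "decides execution order and concurrency", can be illustrated with a small batching function. This is a sketch under assumptions: the `concurrencySafe` flag and the `planBatches` helper are hypothetical and not taken from services/tools/.

```typescript
interface QueuedCall {
  id: string;
  name: string;
  concurrencySafe: boolean; // assumption: tools declare whether they may run in parallel
}

// Groups consecutive concurrency-safe calls into one batch (run in parallel);
// every other call gets a batch of its own (run alone, in arrival order).
function planBatches(calls: QueuedCall[]): QueuedCall[][] {
  const batches: QueuedCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    const canJoin =
      call.concurrencySafe && last !== undefined && last.every((c) => c.concurrencySafe);
    if (canJoin) {
      last.push(call);
    } else {
      batches.push([call]);
    }
  }
  return batches;
}
```

For example, two reads followed by an edit and another read would yield three batches: the two reads together, then the edit alone, then the final read. Arrival order is preserved across batches, so a read queued after an edit sees the edited file.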

Compaction — 4 Escalating Strategies

When context gets large, compaction triggers. Each strategy is more aggressive than the last. Strategy 4 is the nuclear option.

services/compact/

1. Window Trim (aggressiveness: 20%)

Drop the oldest messages that exceed the context budget. A simple FIFO drop with no summarization yet.

2. Tool Result Truncation (aggressiveness: 40%)

Truncate large tool outputs (file contents, bash output) to their first N tokens. The tool call itself is preserved.

3. Conversation Summarization (aggressiveness: 65%)

Send older conversation turns to Claude for summarization. The compressed summary replaces the raw messages.

4. Full Context Replacement (aggressiveness: 90%)

Replace everything except the last N turns with a single summary. This is the nuclear option, used only when critically close to the limit.

Strategy 4, Full Context Replacement (also called Context Collapse), replaces everything except the last N turns with a single AI-generated summary. Information is lost, but the session continues.
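
The escalation logic, trying strategies in order and stopping at the first one that fits, can be sketched as below. This is an illustration under assumptions, not the code in services/compact/: the `compact` selector, the character-based token estimate, and the `summarize` callback (a stand-in for the Claude summarization call) are hypothetical, and only two of the four strategies are shown for brevity.

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface Strategy {
  name: string;
  apply(messages: Message[]): Message[];
}

interface CompactionResult {
  strategy: string | null; // null means no compaction was needed
  messages: Message[];
  fits: boolean;
}

// Assumption: roughly 4 characters per token.
function countTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Tries strategies from least to most aggressive and stops at the first that fits.
// If none fits, the result of the most aggressive strategy is returned with fits=false.
function compact(messages: Message[], budget: number, strategies: Strategy[]): CompactionResult {
  if (countTokens(messages) <= budget) return { strategy: null, messages, fits: true };
  let result: CompactionResult = { strategy: null, messages, fits: false };
  for (const strategy of strategies) {
    const candidate = strategy.apply(messages);
    const fits = countTokens(candidate) <= budget;
    result = { strategy: strategy.name, messages: candidate, fits };
    if (fits) break;
  }
  return result;
}

function truncateToolResults(maxChars: number): Strategy {
  return {
    name: "Tool Result Truncation",
    apply: (messages) =>
      messages.map((m) =>
        m.role === "tool" ? { ...m, content: m.content.slice(0, maxChars) } : m,
      ),
  };
}

function fullContextReplacement(
  keepRecent: number,
  summarize: (older: Message[]) => string,
): Strategy {
  return {
    name: "Full Context Replacement",
    apply: (messages) => {
      const cut = Math.max(0, messages.length - keepRecent);
      if (cut === 0) return messages; // nothing older than the kept turns
      const summary: Message = { role: "user", content: summarize(messages.slice(0, cut)) };
      return [summary, ...messages.slice(cut)];
    },
  };
}
```

Each strategy is applied to the original history rather than to the previous strategy's output, which keeps the selection rule simple: the first entry in the list that brings the history under budget wins.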

MCP — 4 Transport Types

470KB of MCP client code — all behind one unified interface. Different transports for different deployment contexts.

services/mcp/

stdio · Local subprocesses (npx servers) · ~1ms (local process)

Spawns a local subprocess and communicates over stdin/stdout using JSON-RPC. The most common transport for local tools.

SSE · Remote HTTP servers · ~50-200ms (network)

HTTP Server-Sent Events. The MCP server runs as an HTTP endpoint, and Claude receives a stream of events.

HTTP · Stateless REST APIs · ~50-200ms (network)

Plain HTTP REST. Each tool call is a POST request. Stateless: no persistent connection is required.

WebSocket · Bidirectional streaming · ~10-50ms (persistent)

Full-duplex WebSocket. The server can push updates to Claude mid-execution.

Patch-at-runtime pattern — MCP tool definitions are fetched from connected servers at startup and patched directly into the Claude tool namespace. Claude sees MCP tools as if they were native built-in tools.
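
The "one unified interface" over four transports could be modeled with a discriminated union, as in this sketch. The `McpServerConfig` shape, the `TransportPlan` type, and `planTransport` are assumptions for illustration. The persistence flags simply restate the transport descriptions above.

```typescript
// Hypothetical per-server configuration, one variant per transport type.
type McpServerConfig =
  | { transport: "stdio"; command: string; args: string[] }
  | { transport: "sse"; url: string }
  | { transport: "http"; url: string }
  | { transport: "websocket"; url: string };

interface TransportPlan {
  kind: McpServerConfig["transport"];
  target: string; // command line or URL to connect to
  persistentConnection: boolean;
}

// Maps a config entry to a connection plan; the switch is checked for exhaustiveness.
function planTransport(config: McpServerConfig): TransportPlan {
  switch (config.transport) {
    case "stdio":
      return {
        kind: "stdio",
        target: [config.command, ...config.args].join(" "),
        persistentConnection: true, // the subprocess stays alive between calls
      };
    case "sse":
      return { kind: "sse", target: config.url, persistentConnection: true };
    case "http":
      return { kind: "http", target: config.url, persistentConnection: false };
    case "websocket":
      return { kind: "websocket", target: config.url, persistentConnection: true };
    default: {
      // Compile-time guard: adding a fifth transport makes this assignment fail.
      const unreachable: never = config;
      throw new Error(`Unknown transport: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Callers only see `TransportPlan`, so code that lists or invokes tools does not need to branch on how a given server is reached.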

Query/Engine Module

When compaction triggers, what the loop looks like, and how API calls fit in the 7-phase cycle.

Tools Module

StreamingToolExecutor (services/tools) is the bridge between the query loop and the 43 tools.

Services Deep Dive

The main site's services page — more on MCP, compaction strategies, and memory.