flatten-mcp
Resume the exact same conversation at a lower token cost — without compacting it into a lossy summary.
flatten-mcp is a Model Context Protocol server for Claude Code. It shrinks a session's token footprint by moving bulky tool output (large file reads, command logs, base64 screenshots) out of the conversation and into a sidecar file — leaving a tiny, retrievable reference in its place. Your prompts and the chronological flow of the session are preserved verbatim — those lines are never rewritten. You resume the same raw conversation; it just costs less to carry.
See how 317,236 tokens turned into 182,287:
https://github.com/user-attachments/assets/4672b3cd-f78f-4146-97ba-e0077b655381
Why flatten instead of compact?
The standard answer to a full context window is compaction: the model reads the whole conversation and rewrites it into a shorter summary. That summary is lossy by construction — an interpretation of your history, and interpretations drift, smooth over the awkward parts, and quietly drop the detail you didn't know you'd need. But the history is exactly what's worth keeping verbatim: the words you typed at 2 a.m., the precise order of events, the dead ends and the decisions. A fuzzy, half-formed prompt carries more raw truth about your intent than any tidy paragraph written about it after the fact — and preserving it untouched is the foundation of trust in a coding agent.
Flattening is the opposite move. It changes nothing about what was said. In most sessions the model reads a lot — large files, long logs, multiple sources — and keeps every byte of it in context, even though it has nearly always already written down the conclusion in plain prose: the one line that mattered in a 2 MB log, the finding distilled from five files, the running tally of open tasks. The raw source has done its job. Flattening lifts those already-summarized blocks out and swaps each for a lightweight reference ID — so starting cold from a flattened session is usually smooth sailing, and on the rare occasion the raw bytes are needed, they're one retrieve_flattened call away.
What sits in the context window:
USER "fix the crash"
ASSISTANT reading the logs…
TOOL_RESULT ▓▓▓ 2 MB log dump ▓▓▓ ← bulk; already summarized in prose below
ASSISTANT "the OOM is at line 88,402 — the fix is …"
After flatten — same words, only the bulk set aside:
USER "fix the crash"
ASSISTANT reading the logs…
TOOL_RESULT [FLATTENED id=… → sidecar] ← one marker; fetch the full dump on demand
ASSISTANT "the OOM is at line 88,402 — the fix is …"
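The before/after picture above can be sketched in a few lines of TypeScript. This is only an in-memory illustration under stated assumptions: the `Entry` shape, the `flatten` and `retrieve` helpers, and the character-based `minSize` threshold are all hypothetical stand-ins, not flatten-mcp's real schema or API (the actual format is described in docs/ARCHITECTURE.md).

```typescript
// Hypothetical, simplified shapes. The real session JSONL schema is richer.
type Entry =
  | { role: "user" | "assistant"; text: string }
  | { role: "tool_result"; id: string; text: string };

// The sidecar maps a block id to its verbatim original content.
type Sidecar = Map<string, string>;

const MARKER_PREFIX = "[FLATTENED id=";

function flatten(entries: Entry[], sidecar: Sidecar, minSize: number): Entry[] {
  return entries.map((entry): Entry => {
    // Prompts and assistant prose pass through untouched.
    if (entry.role !== "tool_result") return entry;
    // Small results and already-flattened markers are left alone.
    if (entry.text.length < minSize || entry.text.startsWith(MARKER_PREFIX)) return entry;
    sidecar.set(entry.id, entry.text); // keep the original verbatim
    return { ...entry, text: `${MARKER_PREFIX}${entry.id} → sidecar]` };
  });
}

// Fetch one original block back by its id.
function retrieve(sidecar: Sidecar, id: string): string | undefined {
  return sidecar.get(id);
}

const session: Entry[] = [
  { role: "user", text: "fix the crash" },
  { role: "assistant", text: "reading the logs…" },
  { role: "tool_result", id: "log-1", text: "x".repeat(2000000) },
  { role: "assistant", text: "the OOM is at line 88,402" },
];
const sidecar: Sidecar = new Map();
const flattened = flatten(session, sidecar, 1024);
console.log(flattened.map((e) => e.text.slice(0, 40)));
```

Only the oversized tool result changes; every other entry is returned as the same object, which mirrors the claim that prompts and prose are never rewritten.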
What you'll actually save
Token reduction depends entirely on what the session did:
- Read-heavy sessions (lots of large files, logs, or screenshots in context) — expect reductions up to ~50%.
- Prose-heavy sessions (little external data ingested) — savings are negligible. There's simply not much bulk to move.
- It varies a lot — often a pleasant surprise, and once in a while a touch underwhelming.
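For a concrete data point, the demo figures quoted near the top of this README (317,236 tokens before, 182,287 after) can be turned into a percentage. The helper name `reductionPercent` is made up for this illustration.

```typescript
// Percentage of context tokens removed by a flatten run.
function reductionPercent(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) / before) * 100;
}

const saved = 317236 - 182287; // 134949 tokens
const percent = reductionPercent(317236, 182287);
console.log(`${saved} tokens saved (${percent.toFixed(1)}%)`);
// prints "134949 tokens saved (42.5%)"
```

That demo session lands a little under the ~50% upper end quoted for read-heavy sessions.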
When to reach for it. A common trigger point is around 200k tokens. For critical sessions where you want the model at its sharpest and most context-aware, flattening around 250k–300k is where the most dramatic reductions tend to show up.
Time your flattening sensibly: just as you wouldn't compact midway through a large reading task, avoid flattening in the middle of one. That said, nothing is ever lost, so flattening everything and then cherry-picking the few blocks you still need is a perfectly legitimate strategy.
Quick start
Requires Node.js ≥ 18 and Claude Code.
One command — installs from npm and registers it user-wide:
claude mcp add flatten -s user -- npx -y flatten-mcp@latest
Or register it manually (in ~/.claude.json, or your project's .mcp.json):
{
"mcpServers": {
"flatten": {
"command": "npx",
"args": ["-y", "flatten-mcp@latest"]
}
}
}
Recommended — install the /flatten slash command:
curl -fsSL https://raw.githubusercontent.com/shayaShav/flatten-mcp/main/commands/flatten.md -o ~/.claude/commands/flatten.md
From source (for development)
git clone https://github.com/shayaShav/flatten-mcp.git
cd flatten-mcp
npm install # builds automatically via the "prepare" script
cp commands/flatten.md ~/.claude/commands/ # optional: installs the /flatten command
Register the local build instead:
{
"mcpServers": {
"flatten": {
"command": "node",
"args": ["/absolute/path/to/flatten-mcp/dist/index.js"]
}
}
}
Configuration
By default the server operates on the project the CLI runs in (its current working directory). Pass project_dir explicitly on any call to target a different project.
| Env var | Required | Purpose |
|---|---|---|
| ANTHROPIC_API_KEY | no | If set, token savings are counted exactly via Anthropic's free count_tokens endpoint instead of estimated locally. |
| FLATTEN_COUNT_MODEL | no | Model id used for the exact token count (default: claude-haiku-4-5-20251001). |
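The two counting modes in the table can be sketched roughly as follows. The request shape follows Anthropic's public token-counting endpoint; everything else is an assumption for illustration: the function names are hypothetical, and the 4-characters-per-token heuristic is a generic rule of thumb, not necessarily how flatten-mcp estimates locally.

```typescript
// Assumption: a rough local heuristic of about 4 characters per token.
// flatten-mcp's own local estimator may work differently.
function estimateTokensLocally(text: string): number {
  return Math.ceil(text.length / 4);
}

interface CountTokensRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a request for Anthropic's token-counting endpoint.
function buildCountTokensRequest(apiKey: string, model: string, text: string): CountTokensRequest {
  return {
    url: "https://api.anthropic.com/v1/messages/count_tokens",
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: text }] }),
  };
}

// Picks a strategy as the table describes: exact when a key is set, estimated otherwise.
function planTokenCount(text: string, env: Record<string, string | undefined>) {
  const apiKey = env["ANTHROPIC_API_KEY"];
  if (!apiKey) return { mode: "estimate" as const, tokens: estimateTokensLocally(text) };
  const model = env["FLATTEN_COUNT_MODEL"] ?? "claude-haiku-4-5-20251001";
  return { mode: "exact" as const, request: buildCountTokensRequest(apiKey, model, text) };
}

console.log(planTokenCount("abcdefgh", {}));
```

In the exact mode, the built request would be sent with `fetch`, and the count read from the `input_tokens` field of the JSON response.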
Usage
CAUTION
Always exit the session you want to flatten with Ctrl-C, then flatten it from a different window. Rewriting a live session's file out from under Claude Code corrupts its in-memory state and bricks the session.
- Exit the session you want to flatten with Ctrl-C. This is mandatory: a 10-second live-write guard refuses to touch a recently-modified session unless you force it, but exiting is the safe path.
- In a new Claude Code window, type /flatten latest or /flatten <session-id>, or ask: "Flatten the latest session." or "Flatten session <session-id>." /flatten latest (or bare /flatten) flattens the larger of the two most recent sessions: the smaller, seconds-old one is almost always the window doing the flattening itself, and the session worth flattening is the big one. It never forces past the live-write guard.
- Resume your original session and send a prompt. When Claude starts outputting text, you'll see the token count drop.
To preview without touching anything, ask for a dry run first. To undo, ask to unflatten the session — every original block is restored to its exact original value.
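The "larger of the two most recent sessions" rule and the 10-second live-write guard described above could look roughly like this. It is a sketch only: `SessionInfo`, `pickLatest`, `isLive`, and `resolveLatest` are hypothetical names, and the real tool reads these values from the session files on disk rather than from an in-memory array.

```typescript
interface SessionInfo {
  id: string;
  sizeBytes: number;
  modifiedAtMs: number;
}

const LIVE_WRITE_GUARD_MS = 10000; // the 10-second guard

// Of the two most recently modified sessions, pick the larger one.
function pickLatest(sessions: SessionInfo[]): SessionInfo | undefined {
  const newestFirst = [...sessions].sort((a, b) => b.modifiedAtMs - a.modifiedAtMs);
  const [first, second] = newestFirst;
  if (!first) return undefined;
  if (!second) return first;
  return second.sizeBytes > first.sizeBytes ? second : first;
}

// A session written to within the guard window is treated as live.
function isLive(session: SessionInfo, nowMs: number): boolean {
  return nowMs - session.modifiedAtMs < LIVE_WRITE_GUARD_MS;
}

type Resolution =
  | { ok: true; session: SessionInfo }
  | { ok: false; reason: string };

// "latest" never forces past the guard, so a live pick is refused.
function resolveLatest(sessions: SessionInfo[], nowMs: number): Resolution {
  const picked = pickLatest(sessions);
  if (!picked) return { ok: false, reason: "no sessions found" };
  if (isLive(picked, nowMs)) return { ok: false, reason: "session was modified too recently" };
  return { ok: true, session: picked };
}

const now = 1000000000;
const sessions: SessionInfo[] = [
  { id: "big", sizeBytes: 3000000, modifiedAtMs: now - 300000 },
  { id: "helper", sizeBytes: 4000, modifiedAtMs: now - 2000 },
  { id: "old", sizeBytes: 9000000, modifiedAtMs: now - 86400000 },
];
console.log(resolveLatest(sessions, now));
```

Here the tiny, seconds-old `helper` session stands in for the window doing the flattening, so `big` is chosen even though the much older `old` session is larger.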
TIP
Flattening needs no model intelligence — park a second window on a fast, inexpensive model (/model haiku) as a dedicated flattening station and just type /flatten latest.
Tools
| Tool | What it does |
|---|---|
| flatten_session | Move bulky tool results into a sidecar, leaving [FLATTENED …] markers. Crash-safe and reversible. Supports dry_run, min_size, force, and include_tool_use_result. |
| retrieve_flattened | Fetch one original block back by its id. Returns the original text, or re-renders a flattened screenshot as a real image. |
| unflatten_session | Reverse a flatten completely: re-inline every block from the sidecar, restoring each flattened result to its exact original value. |
| prune_flatten_artifacts | Reclaim disk by deleting leftover .bak / .tmp files (and, opt-in, sidecars). Defaults to a safe dry run. |
| list_sessions | List a project's sessions with branch, message count, size, and first prompt. |
| search_sessions | Keyword / branch / date search across past sessions. Scans prose, tool I/O, and flatten sidecars so nothing goes dark after flattening. |
When a session is flattened, the model sees compact markers like this in place of the original output:
[FLATTENED id=toolu_01AbC… tool=Read file_path=/src/server.ts | text 48213B/612L | session=2f9c… | retrieve_flattened(id,session) for raw content]
Everything the model needs to fetch the original — the id and the session — is right there in the marker.
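To illustrate how the id and session can be read back out of a marker, here is a small parser written against the single example above. The regular expression is an assumption inferred from that one sample; the authoritative marker grammar lives in docs/ARCHITECTURE.md, and `parseMarker` is a hypothetical helper, not part of the server.

```typescript
interface MarkerRef {
  id: string;
  tool: string;
  session: string;
}

// Assumed layout: "[FLATTENED id=<id> tool=<tool> ... | session=<session> | ...]"
const MARKER_RE = /^\[FLATTENED id=(\S+) tool=(\S+).*? \| session=(\S+) \|/;

function parseMarker(marker: string): MarkerRef | undefined {
  const match = MARKER_RE.exec(marker);
  if (!match) return undefined;
  const [, id, tool, session] = match;
  if (id === undefined || tool === undefined || session === undefined) return undefined;
  return { id, tool, session };
}

// The sample marker from this README, with its elided id and session kept as-is.
const sample =
  "[FLATTENED id=toolu_01AbC… tool=Read file_path=/src/server.ts | text 48213B/612L | session=2f9c… | retrieve_flattened(id,session) for raw content]";
const ref = parseMarker(sample);
console.log(ref);
```

The two extracted values are exactly the arguments the marker tells the model to pass to retrieve_flattened.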
How it works
- Sidecar, not deletion. Each extracted block is written verbatim to <session>.flat.jsonl next to the session. The original session file is backed up once to <session>.jsonl.bak before the first rewrite.
- Crash-safe. Originals are persisted to the sidecar before they're removed from the session, and the session is rewritten via an atomic temp-file-and-rename, so an interrupted run can never leave a half-written, irreplaceable session file.
- Idempotent. Re-running flatten skips already-flattened blocks and never double-writes a sidecar entry.
- Lossless & reversible. Text and base64 images are stored exactly as they appeared, so unflatten_session restores each flattened block to its exact original value (byte-identical for Claude Code's canonical JSON). Your prompts and untouched lines were never altered to begin with.
- Disk vs. context tokens. Claude Code stores each tool result twice on disk (once in the API message, once in a toolUseResult mirror), but only one copy is ever sent to the model. flatten reports both diskBytesSaved (affects --resume parse speed) and contextTokensSaved out of contextTokensTotal (the number that actually matters for the context window and compaction). They differ a lot, and the tool is explicit about which is which.
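The ordering behind the sidecar, crash-safety, and idempotence points above can be sketched with Node's `fs` module. This is a minimal illustration, not the project's implementation: the `Line` shape, the `flattenFile` helper, the shortened marker text, and the character-based `minSize` are all assumptions, and a production version would also need to flush writes to disk.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical line shape; the real schema is in docs/ARCHITECTURE.md.
interface Line {
  id: string;
  content: string;
}

function readLines(file: string): Line[] {
  return fs.readFileSync(file, "utf8").split("\n").filter(Boolean).map((l) => JSON.parse(l) as Line);
}

function flattenFile(sessionPath: string, minSize: number): number {
  const sidecarPath = sessionPath.replace(/\.jsonl$/, ".flat.jsonl");
  const backupPath = `${sessionPath}.bak`;
  const tmpPath = `${sessionPath}.tmp`;

  // Idempotence: ids already in the sidecar are never written twice.
  const stored = new Set<string>(fs.existsSync(sidecarPath) ? readLines(sidecarPath).map((l) => l.id) : []);

  // Back up the untouched session once, before the first rewrite.
  if (!fs.existsSync(backupPath)) fs.copyFileSync(sessionPath, backupPath);

  let moved = 0;
  const rewritten = readLines(sessionPath).map((line): Line => {
    if (line.content.startsWith("[FLATTENED ") || line.content.length < minSize) return line;
    // Step 1: persist the original to the sidecar before dropping it.
    if (!stored.has(line.id)) {
      fs.appendFileSync(sidecarPath, JSON.stringify(line) + "\n");
      stored.add(line.id);
    }
    moved++;
    return { ...line, content: `[FLATTENED id=${line.id}]` };
  });

  // Step 2: write a temp file, then rename it over the session in one step.
  fs.writeFileSync(tmpPath, rewritten.map((l) => JSON.stringify(l)).join("\n") + "\n");
  fs.renameSync(tmpPath, sessionPath);
  return moved;
}

// Demo on a throwaway session in the system temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "flatten-demo-"));
const sessionPath = path.join(dir, "demo.jsonl");
const demo: Line[] = [
  { id: "a", content: "fix the crash" },
  { id: "b", content: "x".repeat(5000) },
  { id: "c", content: "the OOM is at line 88,402" },
];
fs.writeFileSync(sessionPath, demo.map((l) => JSON.stringify(l)).join("\n") + "\n");
const firstRun = flattenFile(sessionPath, 100);
const secondRun = flattenFile(sessionPath, 100);
console.log(firstRun, secondRun); // prints "1 0"
```

Because the sidecar append happens before the rename, an interruption between the two steps leaves the original session intact, and a rerun finds the id already stored and simply finishes the rewrite.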
See docs/ARCHITECTURE.md for the session JSONL format, the sidecar schema, and the marker protocol.
Compatibility & roadmap
- Claude Code only, for now. flatten-mcp reads Claude Code's session store at ~/.claude/projects/<encoded-project-dir>/*.jsonl. It has been tested against Claude Code exclusively; the paths and the JSONL schema are specific to it and will not work for other agents or LLM CLIs as-is.
- Planned: a pluggable session backend. Porting to other agents means abstracting the storage location and the on-disk message format behind a small adapter. Contributions welcome.
Contributing
Issues and PRs are welcome. To develop locally:
npm install
npm run dev # tsc --watch
npm run build # one-off compile to dist/
License
MIT © Shaya Shaviv