When auto-compaction triggers mid-task, the summarization process irreversibly loses user-provided data that Claude is actively working with. The compacted summary retains a reference to the data (e.g., "user provided DOM markup") but discards the data itself — even though the full transcript is still on disk at ~/.claude/projects/{project-path}/.
Concrete example: I paste ~8,000 characters of DOM markup and ask Claude to refactor it. The session continues, auto-compaction fires, and the summary now says "User provided 8,200 characters of DOM markup for analysis." When I ask a specific question about the markup, Claude either hallucinates details, hedges, or asks me to re-paste what I provided minutes ago. The transcript on disk still has the original markup — Claude just has no way to know that, or where to find it.
This isn't an edge case. Any task involving substantial user-provided input (DOM markup, config files, log output, schema definitions) is at risk the moment the context window fills up. At least 8 issues, most of them still open, describe different symptoms of this same root cause:
#1534 — Memory Loss After Auto-compact (June 2025, closed — problem persists)
#3021 — Forgets to Refresh Memory After Compaction
#10960 — Repository Path Changes Forgotten After Compaction
#13919 — Skills context completely lost after auto-compaction (compiled its own list of 5+ related issues)
#14968 — Context compaction loses critical state
#19888 — Conversation compaction loses entire history
#21105 — Context truncation causes loss of conversation history
#23620 — Agent team lost when lead's context gets compacted
The underlying problem: compaction is a one-way lossy transformation with an available lossless source that isn't wired up. The summary and the transcript coexist on disk, but the summary contains no pointer back to the source material it compressed.
Phase 1: Within-session indexed transcript references
Tag compressed entries in the compacted summary with transcript line-range annotations so Claude can surgically recover only the specific content it needs, on demand.
Current behavior (lossy, no recovery):
[compacted] User provided 8,200 characters of DOM markup for analysis.
Proposed behavior (recoverable):
[compacted] User provided 8,200 characters of DOM markup for analysis. [transcript:lines 847-1023]
When Claude detects it needs the original data — because a question requires specifics the summary doesn't contain — it reads only lines 847–1023 from the transcript. Surgical recovery, minimal token cost, no wasted context on data that isn't needed.
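To make the recovery step concrete, here is a minimal TypeScript sketch. It assumes the transcript is a plain line-oriented file on disk; the annotation regex and the names TranscriptRef, parseTranscriptRef, and recoverSlice are illustrative, not Claude Code internals.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical parsed form of an annotation like "[transcript:lines 847-1023]".
interface TranscriptRef {
  startLine: number; // 1-indexed, inclusive
  endLine: number;   // inclusive
}

// Pull the line-range annotation out of a compacted summary entry, if present.
function parseTranscriptRef(entry: string): TranscriptRef | null {
  const m = entry.match(/\[transcript:lines (\d+)-(\d+)\]/);
  return m ? { startLine: Number(m[1]), endLine: Number(m[2]) } : null;
}

// Recover only the referenced slice of the on-disk transcript.
// For lines 847-1023 this returns 177 lines, not the whole session.
async function recoverSlice(transcriptPath: string, ref: TranscriptRef): Promise<string> {
  const text = await readFile(transcriptPath, "utf8");
  // Split-and-slice for clarity; a production version could stream to the offset.
  return text.split("\n").slice(ref.startLine - 1, ref.endLine).join("\n");
}
```

The exact encoding matters less than the property it buys: the pointer costs a few tokens to store, and the read cost is proportional to the chunk, not the session.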
Why this works: the lossless source already exists on disk, so the summary only needs a pointer into it; recovery becomes a targeted file read instead of a new storage system.
Implementation scope: A modification to the compaction summarizer — when compressing a block of content, emit the summary line plus a transcript line-range annotation. Then expose a recovery mechanism (likely a file read with offset) that Claude can invoke when it identifies a gap between what the summary references and what it can actually see.
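A sketch of the emission side, under the same assumptions; CompressedBlock and emitCompactedEntry are hypothetical names, and the real summarizer would supply the summary text and line bounds itself:

```typescript
// Hypothetical shape of one transcript span the summarizer has compressed.
interface CompressedBlock {
  summary: string;   // model-generated one-line summary of the span
  startLine: number; // where the original content begins in the transcript
  endLine: number;   // where it ends (inclusive)
}

// Emit the compacted entry with a back-reference to its source lines,
// rather than the summary alone.
function emitCompactedEntry(block: CompressedBlock): string {
  return `[compacted] ${block.summary} [transcript:lines ${block.startLine}-${block.endLine}]`;
}

// emitCompactedEntry({
//   summary: "User provided 8,200 characters of DOM markup for analysis.",
//   startLine: 847,
//   endLine: 1023,
// })
// => "[compacted] User provided 8,200 characters of DOM markup for analysis. [transcript:lines 847-1023]"
```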
Phase 2 (future extension): Cross-session indexed lookup
The same mechanism generalizes across sessions. Transcripts from past sessions persist in ~/.claude/projects/. A cross-session reference would look like:
[compacted] User established auth pattern using JWT with refresh tokens.
[session:2025-02-15/transcript:lines 312-487]
Suggested UX: Claude checks the current session transcript first (fast, local, cheap). If no match, Claude prompts the user: "I don't have that detail in the current session. Want me to search previous session transcripts?" User opts in → Claude searches across session transcripts using indexed references. This respects the user's token budget while providing a clean escalation path.
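A sketch of how cross-session resolution could work, with the caveat that the session key in the annotation, and the assumption that transcript filenames embed it, are both inventions here for illustration; actual transcript naming under ~/.claude/projects/ may differ:

```typescript
import { readFile, readdir } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical parsed form of "[session:2025-02-15/transcript:lines 312-487]".
interface SessionRef {
  sessionKey: string; // assumed to identify one past session's transcript
  startLine: number;
  endLine: number;
}

function parseSessionRef(entry: string): SessionRef | null {
  const m = entry.match(/\[session:([^\/\]]+)\/transcript:lines (\d+)-(\d+)\]/);
  return m ? { sessionKey: m[1], startLine: Number(m[2]), endLine: Number(m[3]) } : null;
}

// Resolve a cross-session reference against past transcripts on disk.
async function resolveAcrossSessions(projectDir: string, ref: SessionRef): Promise<string | null> {
  const files = await readdir(projectDir);
  const file = files.find((f) => f.includes(ref.sessionKey));
  if (!file) return null; // nothing found; fall back to asking the user
  const text = await readFile(join(projectDir, file), "utf8");
  return text.split("\n").slice(ref.startLine - 1, ref.endLine).join("\n");
}
```

The opt-in prompt sits in front of this call: it runs only after the user agrees to search prior sessions.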
Workarounds I've tried:
MEMORY.md behavioral nudge — adding a rule in ~/.claude/CLAUDE.md telling Claude to check the transcript when data feels incomplete. This works occasionally, but it depends on Claude being self-aware enough to recognize what it has lost, which it often isn't. It's a behavioral hack, not an architectural fix.
Manual /compact with preservation instructions — e.g., /compact preserve all user-provided DOM markup. This is manual, not automated, and still produces lossy summaries.
Existing proposals that address this differently:
#17428 — File-backed summaries that write compacted content to new files for later retrieval. The present proposal achieves the same goal with less machinery: the transcript already exists on disk, so rather than duplicating data into new files, annotate the summary with line-range pointers to the existing transcript. Same recoverability, zero additional storage.
#15923 — Pre-compaction hook to archive transcripts before compaction. Useful for logging but doesn't solve the recovery problem — Claude still can't find what it lost.
#21190 — Include last N interactions verbatim in the compact summary. Helps with recent context but doesn't scale to mid-session data and increases summary size.
#6549 — Manual continuation prompts before compaction. Effective but entirely manual and user-dependent.
Community workarounds: The MCP memory ecosystem (embedding stores, knowledge graphs, SQLite-backed recall) has dozens of projects working around this gap externally. These are all compensating for one missing capability: the compacted summary doesn't know where its source material lives. The indexed transcript reference approach solves it natively, at the platform layer, with zero additional infrastructure.
High - Significant impact on productivity
Performance and speed
With indexed transcript references:
At step 4, Claude sees [transcript:lines 847-1023] in the compacted summary, reads just those 177 lines from the transcript on disk, recovers the exact markup, and answers accurately. No re-paste, no guessing, no wasted tokens. The session continues as if compaction never happened.
Technical considerations:
The transcript already persists at ~/.claude/projects/{project-path}/, so no new storage mechanism is needed. This proposal adds metadata to the compaction summary, not infrastructure.
Implementation scope is a modification to the compaction summarizer: emit summary lines with [transcript:lines X-Y] annotations, plus a recovery mechanism (file read with line offset) Claude can invoke on demand.
Token cost is zero until recovery is triggered, and surgical when it is — only the specific compressed chunk gets loaded, not the full transcript.
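A self-contained illustration of that cost profile (names hypothetical, logic overlapping with the Phase 1 sketch above): nothing extra is read unless a gap is detected, and the read is bounded by the annotated range.

```typescript
import { readFile } from "node:fs/promises";

// Zero cost until triggered: return the summary entry untouched unless the
// caller has detected a gap, then load only the annotated line range.
async function maybeRecover(entry: string, transcriptPath: string, gapDetected: boolean): Promise<string> {
  const m = entry.match(/\[transcript:lines (\d+)-(\d+)\]/);
  if (!m || !gapDetected) return entry; // summary suffices: no extra tokens spent
  const [start, end] = [Number(m[1]), Number(m[2])];
  const text = await readFile(transcriptPath, "utf8");
  return text.split("\n").slice(start - 1, end).join("\n"); // just the compressed chunk
}
```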
Related issues (demand evidence):
At least 8 bug reports trace back to this same root cause (compaction summaries with no back-reference to their source):
#1534, #3021, #10960, #13919, #14968, #19888, #21105, #23620
Related feature requests (differentiation): #17428, #15923, #21190, #6549, each discussed above.
Phase 2 extension (not the core ask, but worth noting): The same indexed mechanism generalizes across session transcripts for cross-session recovery. Default to current session lookup; prompt the user before searching prior sessions. Same architecture, wider scope.