MCP Plugin · Open Source · MIT
Built with Claude, for Claude.
Automatic prompt caching for Claude Code. Detects stable content — system prompts, tool definitions, file reads — and marks them so you pay 90% less on every repeated turn.
Claude Code
# Two commands. That's it.
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
/plugin install prompt-caching@ercan-ermis
# No npm. No config file. No restart.
# Savings start on the next turn.
90% average token cost reduction
90% token cost reduction
4 session modes
0 config required
MIT open source
🐛
Detects stack traces in your messages. Caches the buggy file + error context once. Every follow-up only pays for the new question.
stack trace detected
♻️
Detects refactor keywords + file lists. Caches the before-pattern, style guides, and type definitions. Only per-file instructions re-sent.
refactor keywords + files
📂
Tracks read counts per file. On the second read, injects a cache breakpoint. All future reads cost 0.1× instead of 1×.
always on — all modes
🧊
After N turns, freezes all messages before turn (N−3) as a cached prefix. Only the last 3 turns sent fresh. Savings compound.
turn count threshold
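The freeze mode above can be sketched in a few lines. This is an illustrative reconstruction, not the plugin's actual code: the function name, threshold default, and message handling are assumptions, but the `cache_control` shape is Anthropic's documented one.

```python
# Illustrative sketch of freeze mode: after a threshold number of turns,
# everything before the last `keep_fresh` turns is marked as a cached prefix.
# Names and structure are assumptions, not the plugin's real implementation.

def place_freeze_breakpoint(messages, threshold=10, keep_fresh=3):
    """Mark the last message of the frozen prefix as a cache breakpoint."""
    if len(messages) < threshold:
        return messages  # not enough turns yet; send everything fresh
    frozen = [dict(m) for m in messages]
    # The last message of the prefix carries the cache_control marker,
    # so the whole conversation up to and including it gets cached.
    boundary = len(frozen) - keep_fresh - 1
    content = frozen[boundary]["content"]
    if isinstance(content, str):  # normalize to block form so we can tag it
        content = [{"type": "text", "text": content}]
    content = [dict(b) for b in content]
    content[-1] = {**content[-1], "cache_control": {"type": "ephemeral"}}
    frozen[boundary]["content"] = content
    return frozen
```

Only the breakpoint message changes shape; the trailing `keep_fresh` turns are passed through untouched and billed at the normal input rate.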
| Session type | Turns | Without caching | With caching | Savings |
|---|---|---|---|---|
| Bug fix (single file) | 20 | 184,000 tokens | 28,400 tokens | 85% |
| Refactor (5 files) | 15 | 310,000 tokens | 61,200 tokens | 80% |
| General coding | 40 | 890,000 tokens | 71,200 tokens | 92% |
| Repeated file reads (5 files × 5 reads) | — | 50,000 tokens | 5,100 tokens | 90% |
Cache creation costs 1.25× normal. Cache reads cost 0.1×. Every turn after the first is pure savings.
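The multipliers above make the break-even arithmetic easy to check. A quick sketch (figures illustrative; real sessions mix cached and fresh tokens):

```python
# Cost model for a cached prompt span, using Anthropic's published multipliers:
# cache writes cost 1.25x the base input price, cache reads cost 0.1x.

CACHE_WRITE = 1.25  # first send: create the cache entry
CACHE_READ = 0.10   # every later send: read from cache

def cached_cost(tokens, sends):
    """Token-cost of sending the same span `sends` times with caching on."""
    return tokens * (CACHE_WRITE + CACHE_READ * (sends - 1))

def uncached_cost(tokens, sends):
    return tokens * sends

tokens, sends = 10_000, 5
with_cache = cached_cost(tokens, sends)   # 10,000 * (1.25 + 0.4) = 16,500
without = uncached_cost(tokens, sends)    # 50,000
savings = 1 - with_cache / without        # 67%
```

Note the one caveat the model makes explicit: a span sent exactly once costs 1.25× instead of 1×, so caching only pays off from the second send onward.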
Claude Code recommended
⏳ Pending approval in the official Claude Code plugin marketplace. Install directly from GitHub in the meantime:
Run these inside Claude Code — no npm, no config file, no restart needed:
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
/plugin install prompt-caching@ercan-ermis
Claude Code's plugin system handles everything automatically. The get_cache_stats tool is available immediately after install.
Cursor · Windsurf · ChatGPT · Perplexity · Zed · Continue.dev · any MCP client
npm install -g prompt-caching-mcp
Cursor → .cursor/mcp.json · Windsurf → MCP settings · Any MCP client → stdio
{
"mcpServers": {
"prompt-caching-mcp": {
"command": "prompt-caching-mcp"
}
}
}
Fair question. Claude Code does handle prompt caching automatically for its own API calls: system prompts, tool definitions, and conversation history are cached out of the box. You don't need this plugin for that.
This plugin is for a different layer: when you build your own apps or agents with the Anthropic SDK. Raw SDK calls don't get automatic caching unless you place cache_control breakpoints yourself. This plugin does that automatically, plus gives you visibility into what's being cached, hit rates, and real savings — which Claude Code doesn't expose.
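For concreteness, here is what placing a breakpoint yourself looks like: a minimal sketch of the request body the plugin builds for you. The model name and prompt text are placeholders; the `cache_control` shape is Anthropic's documented one.

```python
# Sketch of a manually cached Anthropic SDK request -- the thing this plugin
# automates. The system block is marked with cache_control, so it is written
# to the cache on the first call and read at 0.1x on later calls.
# Model name and prompt text are placeholders.

request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a code-review assistant. <long style guide here>",
            # Breakpoint: everything up to and including this block is cached.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Review this diff: ..."}],
}

# With the real SDK this would be sent as:
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**request)
```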
Short version: Claude Code calling the API → already cached ✅ · Your app calling the API → this plugin handles it ✅
Largely, yes: Anthropic's automatic caching (opting in with "cache_control": {"type": "ephemeral"} on a content block) handles breakpoint placement automatically now. This plugin predates that feature and originally filled that gap.
What it still adds: observability. get_cache_stats tracks hit rate and cumulative savings across turns. analyze_cacheability lets you dry-run a prompt to see exactly what would be cached and where breakpoints land — useful when debugging why you're getting cache misses. If you just want set-and-forget, Anthropic's auto-caching is enough.
If you're seeing high cache reads in Claude Code, that's the built-in caching working perfectly — this plugin won't improve those numbers further. It can't intercept Claude Code's own API calls.
It would help you if you're also running your own scripts or agents via the Anthropic SDK alongside Claude Code. Those calls are separate and don't benefit from Claude Code's automatic caching.
Claude Opus 4.6/4.5/4.1/4, Sonnet 4.6/4.5/4/3.7, Haiku 4.5, Haiku 3.5, and Haiku 3. Minimum cacheable prompt size varies by model (1024–4096 tokens). See the official pricing table for full details.
5 minutes by default (ephemeral), extended for free on every cache hit. A 1-hour TTL is also available at 2× the base input token price — useful for long-running agents or infrequent sessions. See Anthropic's prompt caching docs for the full breakdown.
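Opting into the longer TTL is a one-field change on the breakpoint. A sketch, with placeholder text; the "ttl" field is Anthropic's documented option, though at the time of writing it may still require a beta header, so check the current docs:

```python
# Sketch: extending a breakpoint's lifetime from the default 5 minutes to
# 1 hour, billed at 2x the base input price. Text is a placeholder; verify
# against Anthropic's current docs whether a beta header is still required.

block = {
    "type": "text",
    "text": "<large, rarely-changing context>",
    "cache_control": {"type": "ephemeral", "ttl": "1h"},  # default is "5m"
}
```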
Open source · MIT · Zero lock-in