Caching

Synapse has three independent caches that together cut LLM cost and tool latency without changing what your agents do. Most of it runs automatically — you only need to know where to look in the UI to see savings, and which knob to flip when you want stronger caching on a specific orchestration step.

Cache	Scope	Default	Where to view
Prompt cache	Provider-level (Anthropic, OpenAI, DeepSeek, Gemini, Bedrock)	On	Settings → Usage
Response cache	Per orchestration step (exact + semantic)	Off (opt-in per step)	Settings → Usage
Tool cache	Deterministic tools (code_search, pdf_parser, …)	On for eligible tools	Settings → Usage

All three persist to disk under DATA_DIR/cache/. Entries have a 1-hour TTL by default.

Prompt cache

The prompt cache piggybacks on each LLM provider's native caching mechanism. Synapse injects cache_control markers (Anthropic, Bedrock) or relies on automatic prefix caching (OpenAI, DeepSeek, Gemini) so that the long, stable parts of your system prompt — agent instructions, tool definitions, RAG context — are reused across turns instead of being re-tokenised every call.

What gets cached: the stable prefix of the system prompt plus the tools block. Synapse splits the system prompt at an internal volatile separator so turn-changing values (current time, turn budget, recent RAG matches) sit after the cache point — they update freely without invalidating the cached prefix.

Minimum prompt length: about 4000 characters (~1000 tokens). Below that, providers ignore the cache marker, so Synapse skips it to avoid paying the write surcharge for ineligible writes.

Cost shape:

Provider	Read cost	Write cost
Anthropic	~0.1× base input	~1.25× base input (first call only)
OpenAI	~0.5× base input	No extra charge — automatic prefix caching
DeepSeek	~0.1× base input	No extra charge
Gemini	~0.25× base input	Requires a 5-min minimum TTL
Bedrock (Claude)	~0.1× base input	~1.25× base input

In practice this lands at 50–80% cost reduction on long, repeated conversations, after a one-time write surcharge on the first turn.

Viewing cache savings

Go to Settings → Usage. The cache panel surfaces:

Total Estimated Savings — dollar amount Synapse would have paid without the cache.
Response Cache Hit Rate — fraction of LLM calls served from the response cache (see below).
Total Cache Read Tokens / Total Cache Write Tokens — raw read/write totals.
By Model — per-model breakdown so you can see which models are actually using the cache.
By Run — per-orchestration-run breakdown.

Disabling the prompt cache

There's no UI toggle — the cache is on by default and almost always worth it. If you need to disable it (e.g. for cost auditing or A/B comparison), edit settings.json directly:

Advanced: direct settings.json edit

{ "prompt_cache_enabled": false }

Response cache

The response cache short-circuits an LLM call entirely when an identical (or near-identical) request has been made before. There are two layers:

Exact match — SHA256 of (model, system_prompt, messages, tools). O(1) lookup. If the hash matches, the cached completion is returned without contacting the provider.

Semantic match — embeds the last user message and compares it against prior cached entries for the same step. A high similarity threshold (0.95) keeps hits limited to near-identical prompts. Requires ChromaDB to be available (it is, by default).

Important: opt-in per step

The response cache is off by default and only available on certain orchestration step types:

Step type	Eligible?	Why
LLM	✅	Pure prompt-in / response-out, safe to cache
Evaluator	✅	Routing decision is deterministic for the same state
Extract JSON	✅	Pure parsing of a previous step's output
Agent	❌	Skipping the agent would also skip its tool-call side effects, which would silently desync your shared state

To enable response caching on an eligible step, open the orchestration editor, click the step, and toggle the Cache responses option in the Step Config panel.

Viewing response cache activity

Settings → Usage shows total_response_cache_hits and response_cache_hit_rate. Per-model and per-run breakdowns include cache hits so you can tell which steps are benefiting.

Tool cache

Synapse memoizes the result of tool calls that are pure functions of their arguments — running the same tool with the same args twice should not pay twice.

Cacheable tools (always on):

Tool	Scope	Reason
`code_search`	global	Same query against the same index returns the same chunks
`pdf_parser`	global	Same PDF parses identically
`xlsx_parser`	global	Same file → same rows
`time`	global	`parse_time("tomorrow 5pm")` is deterministic
`code_indexer`	global	Indexing a directory twice is idempotent
`collect_data`	global	Form schema is static
`personal_details`	session	Per-user lookup — keyed by session id

bash, sql_query, web_scraper, browser_*, and sandbox_execute are deliberately not cached: their output reflects live external state, and a cached result would mask reality.

Invalidating the tool cache

The tool cache is invalidated automatically on TTL expiry. If you re-index a code repo, Synapse clears the relevant code_search cache entries so subsequent searches see the new chunks. There is no manual UI button — TTL plus the auto-invalidation hooks are sufficient in practice.

On-disk layout

All three caches live under DATA_DIR/cache/, organised by namespace:

DATA_DIR/cache/
├── responses_exact/         # Exact-match LLM responses
├── responses_semantic_*/    # Semantic cache per step (ChromaDB collection)
└── tool_results/            # Memoized tool results

You can clear a cache entirely by deleting its namespace directory and restarting the backend. Settings → Usage also shows per-namespace disk usage under disk_stats.

When caching does NOT help

Streaming tool-heavy chats where the user message changes every turn — exact match won't hit, and the prompt cache only helps on the stable system prefix.
Very short system prompts (under ~1000 tokens) — below the provider minimum, so prompt cache is skipped.
Agent steps in orchestration — these intentionally bypass the response cache.

If your hit rate is unexpectedly low, check Settings → Usage → By Model: if cache read tokens are zero for a model you're using heavily, your system prompt is likely too short, or the model's provider doesn't support cache markers in the way Synapse expects (file a bug if so).

Prompt cache​

Viewing cache savings​

Disabling the prompt cache​

Response cache​

Important: opt-in per step​

Viewing response cache activity​

Tool cache​

Invalidating the tool cache​

On-disk layout​

When caching does NOT help​