Docs / How-To Guides

Configure Caching

P402's semantic cache recognises questions that mean the same thing — even when worded differently — and returns the stored answer in under 50ms at zero cost.

How Semantic Caching Works

Unlike a key-value cache, which requires byte-for-byte identical input, P402 embeds every prompt with text-embedding-004 and compares it against stored embeddings using cosine similarity. If the best similarity score meets or exceeds the threshold (default 0.92), the cached response is returned immediately.

1. Request arrives → P402 embeds the prompt
2. Cosine similarity search across stored embeddings
3a. HIT (similarity ≥ 0.92) → return cached response, $0 cost, < 50ms
3b. MISS → route to LLM → store response + embedding

Example: semantic match

Stored: "What is the x402 payment protocol?"
Query: "Can you explain what x402 is?"
→ CACHE HIT (similarity: 0.947)
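The decision behind a hit like this is just a nearest-neighbour search plus a threshold check. The sketch below mimics it with toy 3-dimensional vectors standing in for real text-embedding-004 output; the function names and cache layout are illustrative, not P402's implementation.

```python
import math

THRESHOLD = 0.92  # P402's default similarity threshold

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def lookup(query_embedding, cache):
    """Return (cached_response, best_score); response is None on a miss."""
    best_score, best_entry = 0.0, None
    for entry in cache:
        score = cosine_similarity(query_embedding, entry["embedding"])
        if score > best_score:
            best_score, best_entry = score, entry
    if best_score >= THRESHOLD:
        return best_entry["response"], best_score  # HIT: serve from cache
    return None, best_score                        # MISS: route to the LLM

# Toy vectors: the stored prompt and a semantically close rewording.
cache = [{"embedding": [0.9, 0.1, 0.4], "response": "x402 is a payment protocol..."}]
hit, score = lookup([0.88, 0.12, 0.41], cache)
```

A reworded question lands near the original in embedding space, so its score clears the threshold even though the strings differ.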

Enable Caching

Set cache: true in the p402 block. That's it — no configuration required.

bash
curl -s -X POST https://p402.io/api/v2/chat/completions \
  -H "Authorization: Bearer $P402_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is the x402 protocol?"}],
    "p402": {
      "mode": "cost",
      "cache": true
    }
  }' | jq .p402_metadata
// First request — cache miss
{
  "provider": "deepseek",
  "cost_usd": 0.0003,
  "cached": false,
  "latency_ms": 1240
}

// Second request (identical or semantically similar)
{
  "provider": "cache",
  "cost_usd": 0.0000,
  "cached": true,
  "latency_ms": 12
}

Disable Caching

Set cache: false to always hit the live LLM. Use this for time-sensitive data, personalised responses, or when freshness is required.

bash
-d '{
  "messages": [{"role": "user", "content": "What is the current ETH price?"}],
  "p402": {
    "mode": "speed",
    "cache": false    // ← always fresh
  }
}'
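In application code this choice is easy to centralise in a small request builder. Everything below (the intent categories and the helper name) is hypothetical application-side code, not part of the P402 API:

```python
# Cache by default; opt out for freshness-sensitive intents.
FRESH_INTENTS = {"price", "inventory", "news"}  # illustrative categories

def p402_options(intent: str, mode: str = "cost") -> dict:
    """Build the p402 block for a request, toggling cache per intent."""
    return {"mode": mode, "cache": intent not in FRESH_INTENTS}

assert p402_options("faq") == {"mode": "cost", "cache": True}
assert p402_options("price")["cache"] is False
```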

Monitor Cache Performance

bash
# Cache stats for your account
curl -s https://p402.io/api/v2/cache/stats \
  -H "Authorization: Bearer $P402_API_KEY" | jq .

Cache stats response

{
  "total_requests": 10420,
  "cache_hits": 3891,
  "hit_rate": 0.374,           // 37.4% of requests served from cache
  "cost_saved_usd": 1.24,      // Money saved vs. always hitting the LLM
  "avg_cache_latency_ms": 14,
  "avg_llm_latency_ms": 1380,
  "entries": 2204,             // Unique prompts stored
  "size_mb": 8.2
}
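The headline numbers can be sanity-checked from the raw counters. The field values below are copied from the response above; the arithmetic is ours, not an API feature:

```python
# Raw counters from the stats response.
stats = {
    "total_requests": 10420,
    "cache_hits": 3891,
    "avg_cache_latency_ms": 14,
    "avg_llm_latency_ms": 1380,
}

hit_rate = stats["cache_hits"] / stats["total_requests"]

# Average latency a caller actually experiences, blending hits and misses.
blended_latency = (hit_rate * stats["avg_cache_latency_ms"]
                   + (1 - hit_rate) * stats["avg_llm_latency_ms"])

print(f"hit rate: {hit_rate:.1%}")                   # 37.3%
print(f"blended latency: {blended_latency:.0f} ms")  # 870 ms
```

Even a ~37% hit rate cuts the average latency seen by callers by more than a third, since every hit returns in ~14ms instead of ~1.4s.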

bash
# Clear your cache (all entries)
curl -s -X POST https://p402.io/api/v2/cache/clear \
  -H "Authorization: Bearer $P402_API_KEY" | jq .

Best Practices

Always enable caching on read-heavy workloads

FAQ bots, documentation assistants, and knowledge-base agents often see 40–80% hit rates. Every hit costs nothing.

Disable caching for real-time or personalised data

Stock prices, live inventory, per-user personalised content — set cache: false to guarantee fresh LLM output.

Cache persists across sessions

The cache is tenant-scoped, not session-scoped. If user A asks a question, user B's identical question is also a cache hit — free for both.

System prompts affect cache keys

Changing the system prompt produces a different embedding and a cache miss. Keep system prompts stable for maximum hit rate.
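One way to picture why (this sketch of the embedding input is an assumption for illustration, not documented P402 internals): if the text that gets embedded concatenates system and user content, any system-prompt edit shifts the resulting vector, so an otherwise identical question no longer matches.

```python
def embedded_text(messages):
    """Illustrative: build the string that gets embedded from all messages.
    (An assumption about keying; P402's real scheme is internal.)"""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

same_question = {"role": "user", "content": "What is x402?"}
v1 = embedded_text([{"role": "system", "content": "Answer tersely."}, same_question])
v2 = embedded_text([{"role": "system", "content": "Answer verbosely."}, same_question])

assert v1 != v2  # different system prompt → different embedding input → cache miss
```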