Configure Caching
P402's semantic cache recognises questions that mean the same thing — even when worded differently — and returns the stored answer in under 50ms at zero cost.
How Semantic Caching Works
Unlike a key-value cache that requires byte-for-byte identical input, P402 embeds every prompt using text-embedding-004 and compares it against stored embeddings using cosine similarity. If the similarity score exceeds the threshold (default 0.92), the cached response is returned immediately.
Example: semantic match
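To illustrate, here is a minimal Python sketch of the lookup logic described above. The three-dimensional vectors, the stored response, and the `lookup` helper are made up for illustration (real text-embedding-004 embeddings are much higher-dimensional, typically 768-d); only the threshold comparison mirrors the documented behaviour.

```python
import math

SIMILARITY_THRESHOLD = 0.92  # P402's default

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy cache: prompt -> (embedding, stored response). Vectors are invented.
cache = {
    "What is the x402 protocol?": ([0.91, 0.40, 0.11], "x402 is ..."),
}

def lookup(prompt_embedding):
    """Return a cached response if any stored prompt is similar enough."""
    for _, (emb, response) in cache.items():
        if cosine_similarity(prompt_embedding, emb) >= SIMILARITY_THRESHOLD:
            return response  # cache hit: served in milliseconds, zero cost
    return None  # cache miss: fall through to the live LLM

# A differently worded question embeds close to the stored prompt -> hit...
print(lookup([0.89, 0.43, 0.13]) is not None)
# ...while an unrelated question embeds far away -> miss.
print(lookup([0.05, 0.10, 0.99]) is not None)
```

Because matching is on meaning rather than bytes, "Explain the x402 protocol" and "What is the x402 protocol?" land on the same cache entry even though a key-value cache would treat them as distinct.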
Enable Caching
Set cache: true in the p402 block; no other configuration is required.
curl -s -X POST https://p402.io/api/v2/chat/completions \
-H "Authorization: Bearer $P402_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "What is the x402 protocol?"}],
"p402": {
"mode": "cost",
"cache": true
}
}' | jq .p402_metadata

// First request — cache miss
{
"provider": "deepseek",
"cost_usd": 0.0003,
"cached": false,
"latency_ms": 1240
}
// Second request (identical or semantically similar)
{
"provider": "cache",
"cost_usd": 0.0000,
"cached": true,
"latency_ms": 12
}

Disable Caching
Set cache: false to always hit the live LLM. Use this for time-sensitive data, personalised responses, or when freshness is required.
-d '{
"messages": [{"role": "user", "content": "What is the current ETH price?"}],
"p402": {
"mode": "speed",
"cache": false // ← always fresh
}
}'

Monitor Cache Performance
# Cache stats for your account
curl -s https://p402.io/api/v2/cache/stats \
-H "Authorization: Bearer $P402_API_KEY" | jq .

Cache stats response
{
"total_requests": 10420,
"cache_hits": 3891,
"hit_rate": 0.374, // 37.4% of requests served from cache
"cost_saved_usd": 1.24, // Money saved vs. always hitting the LLM
"avg_cache_latency_ms": 14,
"avg_llm_latency_ms": 1380,
"entries": 2204, // Unique prompts stored
"size_mb": 8.2
}

# Clear your cache (all entries)
curl -s -X POST https://p402.io/api/v2/cache/clear \
-H "Authorization: Bearer $P402_API_KEY" | jq .

Best Practices
- Cache-friendly workloads: FAQ bots, documentation assistants, and knowledge-base agents often see 40–80% hit rates. Every hit costs nothing.
- Time-sensitive data: stock prices, live inventory, and per-user personalised content should set cache: false to guarantee fresh LLM output.
- Scope: the cache is tenant-scoped, not session-scoped. If user A asks a question, user B's identical question is also a cache hit — free for both.
- System prompts: changing the system prompt produces a different embedding and a cache miss. Keep system prompts stable for maximum hit rate.
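The hit rate from Monitor Cache Performance translates directly into a blended per-request profile: each request is either a near-instant free hit or a full LLM call. A back-of-the-envelope calculation (the $0.0003 per-call cost here is illustrative, taken from the deepseek response earlier, not a guaranteed price):

```python
# Figures from the /cache/stats example above.
hit_rate = 0.374
avg_cache_latency_ms = 14
avg_llm_latency_ms = 1380
avg_llm_cost_usd = 0.0003  # illustrative per-call cost

# Expected latency: weighted average of the hit and miss paths.
expected_latency_ms = (hit_rate * avg_cache_latency_ms
                       + (1 - hit_rate) * avg_llm_latency_ms)

# Expected cost: only misses pay; cache hits are free.
expected_cost_usd = (1 - hit_rate) * avg_llm_cost_usd

print(round(expected_latency_ms))      # ~869 ms blended, vs 1380 ms uncached
print(round(expected_cost_usd, 6))     # ~$0.000188 per request, vs $0.0003
```

A 37% hit rate already cuts average latency and cost by more than a third; the 40–80% hit rates typical of FAQ-style workloads compound that further.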