HOW P402 WORKS.
P402 is built from five independent layers that work together: a routing engine, a semantic cache, a payment protocol, an intelligence layer, and on-chain settlement.
Request Lifecycle
Every API call passes through up to five stages. A cache hit returns in under 50 ms; a cache miss takes roughly 1–3 s.
1. The API key is verified against its stored SHA-256 hash. Six billing guard layers are then evaluated in order (rate limit → daily cap → concurrency → anomaly → per-request cap → session budget reservation).
2. The prompt is embedded with text-embedding-004 and compared against stored embeddings by cosine similarity. Hit → the cached response is returned immediately. Miss → the request continues to routing.
3. A scoring function selects the optimal provider based on mode (cost / quality / speed / balanced), live health status, historical latency, and ELO benchmark score.
4. The request is forwarded to the selected provider's API and the response is streamed back. P402 normalises the response into OpenAI-compatible format.
5. The cost is recorded, the cache entry is stored, and the session budget is debited. The intelligence layer logs the request for async analysis, and p402_metadata is appended to the response.
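The first stage above can be sketched in a few lines. This is a minimal illustration rather than P402's actual implementation: the guard names follow the order given in the text, while `verify_api_key`, `first_failing_guard`, and the per-guard check callables are hypothetical names introduced here.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional

# Guard order as listed in the text. The check callables are hypothetical.
GUARD_ORDER = [
    "rate_limit",
    "daily_cap",
    "concurrency",
    "anomaly",
    "per_request_cap",
    "session_budget_reservation",
]


def verify_api_key(presented_key: str, stored_sha256_hex: str) -> bool:
    """Hash the presented key and compare it with the stored hex digest."""
    digest = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest, stored_sha256_hex)


def first_failing_guard(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the guards in order; return the first that rejects, or None."""
    for name in GUARD_ORDER:
        if not checks[name]():
            return name  # short-circuit: later guards are not evaluated
    return None
```

Evaluating the guards in a fixed order means a request rejected by the rate limiter never reaches the session budget reservation, so no budget is reserved for a request that will not run.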
Routing Engine
The routing engine scores every available provider on each request and selects the winner. Provider health is continuously monitored via background probes.
Cost mode selects the cheapest provider capable of handling the request, often DeepSeek or Groq Llama for simple tasks.
Quality mode selects the provider with the highest benchmark score, typically GPT-4o or Claude Opus for complex tasks.
Speed mode selects the provider with the lowest measured p95 latency, typically Groq (300+ tok/s) for real-time UX.
Balanced mode weights cost, quality, and speed equally. It is a good starting point for general-purpose agents before you know your constraint.
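One way to picture the four modes is as weight vectors over normalised cost, quality, and speed scores. The sketch below is an assumption-laden illustration: the `Provider` fields, the `WEIGHTS` table, and the normalisation are hypothetical, and the only detail taken from the text is that balanced mode uses equal weights and that unhealthy providers are not selected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Provider:
    name: str
    cost_per_1k: float      # illustrative price per 1k tokens
    benchmark: float        # quality score normalised to 0..1
    p95_latency_ms: float   # measured p95 latency
    healthy: bool           # result of the latest health probe


# Hypothetical (cost, quality, speed) weights per mode.
WEIGHTS = {
    "cost": (1.0, 0.0, 0.0),
    "quality": (0.0, 1.0, 0.0),
    "speed": (0.0, 0.0, 1.0),
    "balanced": (1 / 3, 1 / 3, 1 / 3),
}


def select_provider(providers: List[Provider], mode: str) -> Provider:
    """Score every healthy provider and return the highest-scoring one."""
    candidates = [p for p in providers if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy providers available")
    max_cost = max(p.cost_per_1k for p in candidates) or 1.0
    max_latency = max(p.p95_latency_ms for p in candidates) or 1.0
    w_cost, w_quality, w_speed = WEIGHTS[mode]

    def score(p: Provider) -> float:
        cost_score = 1.0 - p.cost_per_1k / max_cost          # cheaper is better
        speed_score = 1.0 - p.p95_latency_ms / max_latency   # faster is better
        return w_cost * cost_score + w_quality * p.benchmark + w_speed * speed_score

    return max(candidates, key=score)
```

With this shape, adding a new mode is a matter of adding one row to the weight table rather than writing a new selection branch.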
13 providers are registered at launch, including OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Cohere, Together, Fireworks, Perplexity, AI21, and OpenRouter (as a meta-provider covering 200+ additional models).
Semantic Cache
Most production AI systems answer the same questions repeatedly. The semantic cache stores responses indexed by embedding, not by exact string match.
Why 0.92?
At 0.92 similarity, questions like "What is x402?" and "Can you explain the x402 protocol?" are matched. At 0.85, unrelated questions start to collide. At 0.98, only near-identical phrasings match. 0.92 is the empirically optimal threshold for developer documentation and FAQ-style queries.
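The threshold check can be illustrated with a small in-memory lookup. This is a sketch only: it assumes embeddings are already available as plain lists of floats, and the `lookup` function and its linear scan are hypothetical stand-ins for whatever vector search P402 actually uses.

```python
import math
from typing import List, Optional, Tuple

SIMILARITY_THRESHOLD = 0.92  # value discussed in the text


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(
    query_vec: List[float],
    entries: List[Tuple[List[float], str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the cached response of the most similar entry, or None on a miss."""
    best_score, best_response = 0.0, None
    for vec, response in entries:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    # Only a match at or above the threshold counts as a cache hit.
    return best_response if best_score >= threshold else None
```

Lowering `threshold` increases the hit rate at the risk of returning an answer to a different question; raising it does the opposite.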
Payment Layer (x402)
P402 implements the x402 payment protocol — a machine-native payment standard where HTTP 402 "Payment Required" becomes a first-class response with a signed authorization that settles on-chain.
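From a client's point of view, the flow is a request, a 402 response carrying payment requirements, and a retry with a signed authorization attached. The sketch below shows only that control flow under stated assumptions: `send` and `authorize` are hypothetical callables supplied by the caller, the header name `X-PAYMENT` is an assumption that should be checked against the x402 specification, and no real signing or on-chain settlement is performed.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]  # (HTTP status code, parsed JSON body)


def call_with_payment(
    send: Callable[[Dict[str, str]], Response],
    authorize: Callable[[dict], str],
    payment_header: str = "X-PAYMENT",  # assumed header name
) -> Response:
    """Send a request; on HTTP 402, sign the requirements and retry once."""
    status, body = send({})
    if status != 402:
        return status, body  # no payment needed (or an unrelated error)
    # The 402 body is assumed to describe what payment the server accepts.
    signed_authorization = authorize(body)
    return send({payment_header: signed_authorization})
```

Keeping transport and signing behind callables makes the retry logic testable without a network connection or a wallet.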
Intelligence Layer
Two Gemini models run continuously in the background to protect against cost anomalies and to optimise routing decisions over time.
Sentinel monitors spend velocity every 60 seconds. If spend spikes 10× above the 7-day baseline, it pauses the tenant's traffic and sends an alert. It is designed to catch prompt-injection billing attacks before they cause damage.
The second model analyses routing decisions asynchronously. It identifies patterns (for example, that 90% of quality-mode requests could be served by a cheaper model with equivalent accuracy for that tenant's workload) and surfaces recommendations in Dashboard → Intelligence.
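The spend-velocity check described above reduces to comparing the current rate with a multiple of a baseline. The following is a minimal sketch, assuming the baseline is the mean of recent per-interval spend samples and that "10× above" means strictly more than ten times that mean; `is_spend_spike` and its handling of an empty or zero baseline are hypothetical choices made for illustration.

```python
from statistics import mean
from typing import Sequence

SPIKE_MULTIPLIER = 10.0  # multiplier given in the text


def is_spend_spike(
    current_rate: float,
    baseline_rates: Sequence[float],
    multiplier: float = SPIKE_MULTIPLIER,
) -> bool:
    """Return True when current spend exceeds multiplier x the baseline mean."""
    if not baseline_rates:
        return False  # assumption: no history means nothing to compare against
    baseline = mean(baseline_rates)
    if baseline <= 0:
        return False  # assumption: a zero baseline is treated as no baseline
    return current_rate > baseline * multiplier
```

A caller would run this once per monitoring interval and pause traffic and alert when it returns True.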
Agent Identity (ERC-8004)
P402 supports ERC-8004 Trustless Agent Identity — an on-chain registry where agents have cryptographic DIDs, on-chain reputation scores, and verifiable spending histories.