
HOW P402 WORKS.

Five independent layers that work together: a routing engine, a semantic cache, a payment protocol, an intelligence layer, and on-chain settlement.

Request Lifecycle

Every API call passes through up to five stages. End-to-end latency is under 50ms on a cache hit (which returns at stage 2) or ~1–3s on a cache miss, where the provider call dominates.

1. Auth & Billing Guard (< 2ms)

API key verified against its stored SHA-256 hash. Six billing guard layers evaluated in order (rate limit → daily cap → concurrency → anomaly → per-request cap → session budget reservation).
2. Semantic Cache Lookup (< 10ms)

Prompt embedded with text-embedding-004. Cosine similarity search against the tenant's stored embeddings. Hit (similarity at or above the threshold) → return cached response immediately. Miss → continue to routing.
3. Routing Engine (< 5ms)

Scoring function selects the optimal provider based on mode (cost / quality / speed / balanced), live health status, historical latency, and ELO benchmark score.
4. LLM Call (1–3s, provider-dependent)

Request forwarded to the selected provider's API. Response streamed back. P402 normalises the response into OpenAI-compatible format.
5. Post-processing (< 5ms)

Cost recorded. Cache entry stored. Session budget debited. Request logged for asynchronous analysis by the intelligence layer. p402_metadata appended to response.
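
The auth step and the ordered guard chain from stage 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not P402's code: `RequestContext`, `run_guards`, and the guard names are hypothetical. Every guard is a placeholder except an invented $0.50 per-request cap, and the constant-time comparison via `hmac.compare_digest` is our choice; the docs do not specify one.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestContext:
    """Hypothetical per-request data; field names are illustrative only."""
    tenant_id: str
    estimated_cost_usd: float


def hash_api_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest that is stored instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def verify_api_key(raw_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    return hmac.compare_digest(hash_api_key(raw_key), stored_hash)


# A guard returns None to let the request through, or a reason string to reject it.
Guard = Callable[[RequestContext], Optional[str]]


def run_guards(ctx: RequestContext, guards: list[tuple[str, Guard]]) -> Optional[str]:
    """Evaluate guards in order; the first rejection short-circuits the rest."""
    for name, guard in guards:
        reason = guard(ctx)
        if reason is not None:
            return f"{name}: {reason}"
    return None


# Placeholder guards in the documented order. Only the per-request cap does
# anything here, and its $0.50 limit is an invented example value.
GUARDS: list[tuple[str, Guard]] = [
    ("rate_limit", lambda ctx: None),
    ("daily_cap", lambda ctx: None),
    ("concurrency", lambda ctx: None),
    ("anomaly", lambda ctx: None),
    ("per_request_cap",
     lambda ctx: "exceeds per-request cap" if ctx.estimated_cost_usd > 0.50 else None),
    ("session_budget", lambda ctx: None),
]
```

Order matters because each guard runs only if every earlier one passed. A cheap rate-limit check can therefore reject a request before any session budget is reserved.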

Routing Engine

The routing engine scores every available provider on each request and selects the winner. Provider health is continuously monitored via background probes.

cost
100% cost score ($/token)

Selects the cheapest provider capable of handling the request. Often DeepSeek or Groq Llama for simple tasks.

quality
100% quality score (ELO)

Selects the provider with the highest benchmark score. Typically GPT-4o or Claude Opus for complex tasks.

speed
100% speed score (p95 latency)

Selects the provider with the lowest measured p95 latency. Typically Groq (300+ tok/s) for real-time UX.

balanced
⅓ cost + ⅓ quality + ⅓ speed (equal weights)

A good starting point for general-purpose agents before you know which constraint matters most.
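
The exact scoring formula is not documented here. The sketch below assumes each factor is min-max normalised across healthy providers, so 1.0 is always best, and then combined with the mode weights above. The `Provider` fields, the normalisation scheme, and all numbers are hypothetical. Capability filtering (which models can handle a given request) is left out.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    cost_per_mtok: float   # USD per million tokens (lower is better)
    elo: float             # benchmark score (higher is better)
    p95_latency_ms: float  # measured p95 latency (lower is better)
    healthy: bool = True   # result of the latest background health probe


# (cost, quality, speed) weights per mode; "balanced" splits them equally.
MODE_WEIGHTS = {
    "cost": (1.0, 0.0, 0.0),
    "quality": (0.0, 1.0, 0.0),
    "speed": (0.0, 0.0, 1.0),
    "balanced": (1 / 3, 1 / 3, 1 / 3),
}


def normalise(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale values to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def select_provider(providers: list[Provider], mode: str) -> Provider:
    """Score healthy providers with the mode's weights and return the best one."""
    candidates = [p for p in providers if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy providers available")
    w_cost, w_quality, w_speed = MODE_WEIGHTS[mode]
    cost = normalise([p.cost_per_mtok for p in candidates], higher_is_better=False)
    quality = normalise([p.elo for p in candidates], higher_is_better=True)
    speed = normalise([p.p95_latency_ms for p in candidates], higher_is_better=False)
    scores = [
        w_cost * c + w_quality * q + w_speed * s
        for c, q, s in zip(cost, quality, speed)
    ]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```

With a 100% weight on one factor, the score collapses to "cheapest", "highest ELO", or "fastest" among healthy providers. In balanced mode, a provider that is second-best on every axis can beat one that is best on a single axis.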

The providers registered at launch are OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Cohere, Together, Fireworks, Perplexity, AI21, and OpenRouter (a meta-provider covering 200+ additional models).

Semantic Cache

Most production AI systems answer the same questions repeatedly. The semantic cache stores responses indexed by embedding, not by exact string match.

Embedding model: Google text-embedding-004 (768 dimensions)
Similarity metric: Cosine similarity
Default threshold: 0.92 (configurable per request)
Storage: Redis (embeddings + response bodies)
Scope: Tenant-scoped (not per-session or per-user)
Cache hit latency: < 50ms (typically 10–20ms)
Cache hit cost: $0.00

Why 0.92?

At 0.92 similarity, questions like "What is x402?" and "Can you explain the x402 protocol?" are matched. At 0.85, unrelated questions start to collide. At 0.98, only near-identical phrasings match. 0.92 is the empirically chosen default for developer documentation and FAQ-style queries; because the threshold is configurable per request, other workloads can raise or lower it.
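
The in-memory sketch below makes the threshold concrete. A tenant-scoped lookup scores the query against every stored embedding for that tenant and returns the best match only if it clears the threshold. This is a simplified stand-in, not P402's implementation. The real cache lives in Redis and would not scan linearly, the class and method names are hypothetical, and the toy 3-dimensional vectors stand in for 768-dimensional text-embedding-004 output.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SemanticCache:
    """In-memory stand-in for the Redis-backed, tenant-scoped cache."""
    default_threshold: float = 0.92
    # tenant_id -> list of (embedding, response body)
    entries: dict[str, list[tuple[list[float], str]]] = field(default_factory=dict)

    def store(self, tenant_id: str, embedding: list[float], response: str) -> None:
        self.entries.setdefault(tenant_id, []).append((embedding, response))

    def lookup(self, tenant_id: str, embedding: list[float],
               threshold: Optional[float] = None) -> Optional[str]:
        """Return the closest cached response for this tenant, or None on a miss."""
        cutoff = self.default_threshold if threshold is None else threshold
        best_score, best_response = -1.0, None
        for stored_embedding, response in self.entries.get(tenant_id, []):
            score = cosine_similarity(embedding, stored_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= cutoff else None
```

Scoping entries by tenant means one tenant's cached answers are never returned to another. The optional `threshold` argument mirrors the per-request override.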

Payment Layer (x402)

P402 implements the x402 payment protocol, a machine-native payment standard in which HTTP 402 "Payment Required" becomes a first-class response: the client answers it with a signed authorization that settles on-chain.

1. Client signs an EIP-3009 TransferWithAuthorization (off-chain, gasless)
2. Signed payload submitted to P402 Facilitator
3. Facilitator verifies signature, amount, expiry, and nonce
4. Facilitator executes transferWithAuthorization on USDC contract (pays gas)
5. Settlement confirmed on Base in ~2 seconds

Network: Base Mainnet (Chain ID: 8453, CAIP-2: eip155:8453)
Token: USDC (0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913)
Treasury: 0xFa772434DCe6ED78831EbC9eeAcbDF42E2A031a6
Settlement contract: 0xd03c7ab9a84d86dbc171367168317d6ebe408601 (P402Settlement.sol)
Gas: Paid by P402 Facilitator (never the user or agent)
Replay protection: EIP-3009 nonce tracked in PostgreSQL + Redis
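
Step 3 can be sketched as a set of pre-settlement checks. This is a hypothetical illustration, and the class and method names are invented. Signature recovery (EIP-712 hashing plus ecrecover) is omitted because it needs keccak and secp256k1 rather than the standard library. An in-memory set stands in for the PostgreSQL + Redis nonce store, and the payee address is a constructor parameter because the payee is not documented here. The validity-window and per-authorizer nonce semantics follow EIP-3009.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferAuthorization:
    """Message fields of an EIP-3009 TransferWithAuthorization."""
    from_address: str  # "from" in the EIP; renamed because it is a Python keyword
    to: str
    value: int         # USDC base units (6 decimals): 1_000_000 == 1 USDC
    valid_after: int   # unix seconds; valid only strictly after this time
    valid_before: int  # unix seconds; invalid from this time onward
    nonce: str         # random 32-byte hex string chosen by the client


class Facilitator:
    """Pre-settlement checks only; signature recovery is deliberately omitted."""

    def __init__(self, payee: str) -> None:
        self.payee = payee.lower()
        # (authorizer, nonce) pairs already settled; stands in for PostgreSQL + Redis.
        self.used_nonces: set[tuple[str, str]] = set()

    def verify(self, auth: TransferAuthorization, required_value: int,
               now: Optional[int] = None) -> Optional[str]:
        """Return None if the authorization passes, else a rejection reason."""
        now = int(time.time()) if now is None else now
        # A real facilitator would first recover the EIP-712 signer and check
        # that it equals auth.from_address; that step is not shown here.
        if auth.to.lower() != self.payee:
            return "wrong recipient"
        if auth.value < required_value:
            return "insufficient amount"
        if not (auth.valid_after < now < auth.valid_before):
            return "outside validity window"
        if (auth.from_address.lower(), auth.nonce.lower()) in self.used_nonces:
            return "nonce already used"
        return None

    def record_settlement(self, auth: TransferAuthorization) -> None:
        """Mark the nonce as spent once transferWithAuthorization has succeeded."""
        self.used_nonces.add((auth.from_address.lower(), auth.nonce.lower()))
```

Nonces are keyed per authorizer, as in EIP-3009, so two different wallets may use the same nonce value. The USDC contract also rejects a reused authorization on-chain. The off-chain store is therefore a fast first check, not the only protection.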

Intelligence Layer

Two Gemini models run in the background: one guards against cost anomalies in real time, and the other optimises routing decisions over time.

Sentinel (Gemini Flash): Real-time Anomaly Detection

Monitors spend velocity every 60 seconds. If spend rises above 10× the 7-day baseline, Sentinel pauses the tenant's traffic and sends an alert. Designed to catch prompt-injection billing attacks before they cause damage.

Trigger: Automatic (no configuration required)
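
Sentinel's trigger rule can be expressed as plain arithmetic. This sketch rests on several assumptions: the baseline is the mean spend per 60-second window over the trailing 7 days, a tenant with no history or a zero baseline is never flagged, and the function names are hypothetical. Sentinel itself is described as running on Gemini Flash. The sketch shows only the threshold rule, not a model call.

```python
def is_spend_anomaly(current_window_usd: float,
                     baseline_windows_usd: list[float],
                     multiplier: float = 10.0) -> bool:
    """True if spend in the latest 60-second window exceeds multiplier x the mean
    per-window spend over the trailing 7 days (7 * 24 * 60 = 10,080 windows)."""
    if not baseline_windows_usd:
        return False  # assumption: no history yet, so nothing to compare against
    baseline = sum(baseline_windows_usd) / len(baseline_windows_usd)
    if baseline <= 0.0:
        return False  # assumption: a zero baseline is treated as "no baseline"
    return current_window_usd > multiplier * baseline


def sentinel_check(tenant_id: str, current_window_usd: float,
                   baseline_windows_usd: list[float],
                   paused: set[str], alerts: list[str]) -> None:
    """Pause the tenant and queue an alert when the spike rule fires."""
    if is_spend_anomaly(current_window_usd, baseline_windows_usd):
        paused.add(tenant_id)
        alerts.append(f"spend spike for {tenant_id}: ${current_window_usd:.2f} in 60s")
```
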
Economist (Gemini Pro): Protocol Optimisation

Analyses routing decisions asynchronously. Identifies patterns in each tenant's traffic (for example, that 90% of its quality-mode requests could be served by a cheaper model with equivalent accuracy for that workload) and surfaces recommendations in Dashboard → Intelligence.

Trigger: Async (results in Dashboard)

Agent Identity (ERC-8004)

P402 supports ERC-8004 Trustless Agent Identity — an on-chain registry where agents have cryptographic DIDs, on-chain reputation scores, and verifiable spending histories.

Identity Registry: 0x8004A169FB4a3325136EB29fA0ceB6D2e539a432 (Base)
Reputation Registry: 0x8004BAa17C55a88189AE136b182e5fdA19dE9b63 (Base)
DID format: did:p402:agent:{agentId}
Reputation: Score from 0–100, updated after each task
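
The sketch below builds and validates the documented DID format and reputation range. The helper names are hypothetical, and treating `:` as forbidden inside `agentId` is an assumption. Nothing here talks to the on-chain Identity or Reputation registries.

```python
from dataclasses import dataclass
from typing import Optional

DID_PREFIX = "did:p402:agent:"


def make_agent_did(agent_id: str) -> str:
    """Build did:p402:agent:{agentId}; rejects empty IDs and IDs containing ':'."""
    if not agent_id or ":" in agent_id:
        raise ValueError(f"invalid agent id: {agent_id!r}")
    return DID_PREFIX + agent_id


def parse_agent_did(did: str) -> Optional[str]:
    """Return the agentId from a P402 agent DID, or None if the string is not one."""
    if not did.startswith(DID_PREFIX):
        return None
    agent_id = did[len(DID_PREFIX):]
    if not agent_id or ":" in agent_id:
        return None
    return agent_id


@dataclass
class AgentRecord:
    did: str
    reputation: float  # documented range: 0-100

    def __post_init__(self) -> None:
        if parse_agent_did(self.did) is None:
            raise ValueError(f"not a P402 agent DID: {self.did!r}")
        if not 0 <= self.reputation <= 100:
            raise ValueError("reputation must be between 0 and 100")
```
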