>_ DOCS / UNDERSTANDING P402

HOW P402
WORKS.

Five independent layers that work together: a routing engine, a semantic cache, a payment protocol, an intelligence layer, and on-chain settlement.

Request Lifecycle

Every API call passes through five stages in under 50ms (cache hit) or ~1–3s (cache miss).

Auth & Billing Guard

API key verified against SHA-256 hash. Six billing guard layers evaluated in order (rate limit → daily cap → concurrency → anomaly → per-request cap → session budget reservation).

< 2ms

Semantic Cache Lookup

Prompt embedded with text-embedding-004. Cosine similarity search against stored embeddings. Hit → return cached response immediately. Miss → continue to routing.

< 10ms

Routing Engine

Scoring function selects the optimal provider based on mode (cost / quality / speed / balanced), live health status, historical latency, and ELO benchmark score.

< 5ms

LLM Call

Request forwarded to selected provider's API. Response streamed back. P402 normalises the response into OpenAI-compatible format.

1–3s (provider)

Post-processing

Cost recorded. Cache entry stored. Session budget debited. Intelligence layer logs for async analysis. p402_metadata appended to response.

< 5ms

Routing Engine

The routing engine scores every available provider on each request and selects the winner. Provider health is continuously monitored via background probes.

cost

100% cost score ($/token)

Selects the cheapest provider capable of handling the request. Often DeepSeek or Groq Llama for simple tasks.

quality

100% quality score (ELO)

Selects the provider with the highest benchmark score. Typically GPT-4o or Claude Opus for complex tasks.

speed

100% speed score (p95 latency)

Selects the provider with the lowest measured p95 latency. Typically Groq (300+ tok/s) for real-time UX.

balanced

33% cost + 33% quality + 33% speed

Equal weight. Good starting point for general-purpose agents before you know your constraint.

13 providers are registered at launch: OpenAI, Anthropic, Google, Groq, DeepSeek, Mistral, Cohere, Together, Fireworks, Perplexity, AI21, and OpenRouter (as a meta-provider covering 200+ additional models).

Semantic Cache

Most production AI systems answer the same questions repeatedly. The semantic cache stores responses indexed by embedding, not by exact string match.

Embedding modelGoogle text-embedding-004 (768 dimensions)

Similarity metricCosine similarity

Default threshold0.92 (configurable per request)

StorageRedis — embeddings + response bodies

ScopeTenant-scoped (not per-session or per-user)

Cache hit latency< 50ms (typically 10–20ms)

Cache hit cost$0.00

Why 0.92?

At 0.92 similarity, questions like "What is x402?" and "Can you explain the x402 protocol?" are matched. At 0.85, unrelated questions start to collide. At 0.98, only near-identical phrasings match. 0.92 is the empirically optimal threshold for developer documentation and FAQ-style queries.

Payment Layer (x402)

P402 implements the x402 payment protocol — a machine-native payment standard where HTTP 402 "Payment Required" becomes a first-class response with a signed authorization that settles on-chain.

1.Client signs an EIP-3009 TransferWithAuthorization (off-chain, gasless)

2.Signed payload submitted to P402 Facilitator

3.Facilitator verifies signature, amount, expiry, and nonce

4.Facilitator executes transferWithAuthorization on USDC contract (pays gas)

5.Settlement confirmed on Base in ~2 seconds

NetworkBase Mainnet (Chain ID: 8453, CAIP-2: eip155:8453)

TokenUSDC — 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913

Treasury0xFa772434DCe6ED78831EbC9eeAcbDF42E2A031a6

Settlement contract0xd03c7ab9a84d86dbc171367168317d6ebe408601 (P402Settlement.sol)

GasPaid by P402 Facilitator (never the user or agent)

Replay protectionEIP-3009 nonce tracked in PostgreSQL + Redis

Intelligence Layer

Two Gemini models run continuously in the background to protect against cost anomalies and to optimise routing decisions over time.

Sentinel (Gemini Flash)

Real-time Anomaly Detection

Monitors spend velocity every 60 seconds. If spend spikes 10× above the 7-day baseline, Sentinel pauses the tenant's traffic and sends an alert. Designed to catch prompt-injection billing attacks before they cause damage.

Trigger

Automatic — no configuration required

Economist (Gemini Pro)

Protocol Optimisation

Analyses routing decisions asynchronously. Identifies patterns — e.g. 90% of quality-mode requests could be served by a cheaper model with equivalent accuracy for that tenant's workload — and surfaces recommendations in Dashboard → Intelligence.

Trigger

Async — results in Dashboard

Agent Identity (ERC-8004)

P402 supports ERC-8004 Trustless Agent Identity — an on-chain registry where agents have cryptographic DIDs, on-chain reputation scores, and verifiable spending histories.

Identity Registry0x8004A169FB4a3325136EB29fA0ceB6D2e539a432 (Base)

Reputation Registry0x8004BAa17C55a88189AE136b182e5fdA19dE9b63 (Base)

DID formatdid:p402:agent:{agentId}

ReputationScore from 0–100, updated after each task

Read the ERC-8004 documentation →

Go deeper

HOW P402WORKS.

Request Lifecycle

Routing Engine

Semantic Cache

Payment Layer (x402)

Intelligence Layer

Agent Identity (ERC-8004)

HOW P402
WORKS.