BUILD A BUDGET AGENT.
An AI agent that tracks its own spending, enforces hard budget caps, and automatically uses cached responses to stretch every dollar.
What you'll build
- ✓ A Python agent with a $5 hard spending cap
- ✓ Automatic cost-optimised routing on every request
- ✓ Semantic cache that makes repeated queries free
- ✓ Budget exhaustion handling with graceful shutdown
- ✓ Real-time spend tracking via session stats
Prerequisites
1. A P402 API key (create one for free)
2. Python 3.9+ with the openai and requests packages installed (pip install openai requests)
Create a Session
A session is a budget-capped container. Every LLM call made with a session's ID is charged against its budget. When the budget is exhausted, the session rejects further requests — no surprise bills.
import os

import requests

P402_API_KEY = os.environ["P402_API_KEY"]


def create_session(budget_usd: float) -> str:
    """Create a budget-capped session and return its ID."""
    resp = requests.post(
        "https://p402.io/api/v2/sessions",
        headers={"Authorization": f"Bearer {P402_API_KEY}"},
        json={"budget_usd": budget_usd},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    print(f"Session {data['id']}: budget ${data['budget_usd']:.2f}")
    return data["id"]


SESSION_ID = create_session(5.00)  # Hard cap: $5

Wire Up the Agent
P402 is OpenAI-compatible. Replace the base URL and pass your session ID in the extra_body. No other SDK changes needed.
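To make the extra_body mechanism concrete: the OpenAI Python SDK describes extra_body as a way to add extra JSON properties to a request, so the p402 object should arrive as a top-level key next to model and messages. The sketch below builds that payload by hand. build_payload is a hypothetical helper (not part of either SDK), the session ID is a placeholder, and the only p402 fields shown are the ones this tutorial uses.

```python
import json


def build_payload(question: str, session_id: str) -> dict:
    """Hypothetical helper: approximate the JSON body the SDK would send."""
    standard = {
        "model": "auto",
        "messages": [{"role": "user", "content": question}],
    }
    extra_body = {
        "p402": {"session_id": session_id, "mode": "cost", "cache": True},
    }
    # Assumption: extra_body keys are merged into the top level of the body.
    return {**standard, **extra_body}


payload = build_payload("Explain EIP-3009.", "sess_example")  # placeholder ID
print(json.dumps(payload, indent=2))
```

Seen this way, the TypeScript variant later in this tutorial does the same thing by passing p402 directly as a request field.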
from openai import OpenAI

client = OpenAI(
    api_key=P402_API_KEY,
    base_url="https://p402.io/api/v2",
)


def ask(question: str, session_id: str) -> str:
    """Send a question and return the answer text."""
    response = client.chat.completions.create(
        model="auto",  # P402 picks the cheapest model that answers well
        messages=[{"role": "user", "content": question}],
        extra_body={
            "p402": {
                "session_id": session_id,
                "mode": "cost",  # Optimise for lowest cost
                "cache": True,  # Return cached answer if identical query seen before
            }
        },
    )
    # P402 metadata is attached to every response; fall back to {} if it is missing or None
    meta = getattr(response, "p402_metadata", None) or {}
    provider = meta.get("provider", "unknown")
    cost = meta.get("cost_usd", 0)
    cached = meta.get("cached", False)
    label = "CACHED (free)" if cached else f"${cost:.4f} via {provider}"
    print(f" [{label}]")
    return response.choices[0].message.content or ""

Track Spend in Real Time
Poll the session stats endpoint before each request. If you're within 10% of the cap, warn the user. At 100%, exit cleanly.
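The two thresholds described here (warn inside the last 10%, stop at 100%) can be kept separate from the HTTP call as a pure function, which makes them easy to unit-test. This is a minimal sketch: budget_status and warn_fraction are hypothetical names, and the inputs are assumed to be the spent and total figures that the stats code in this section reads.

```python
def budget_status(spent_usd: float, budget_usd: float, warn_fraction: float = 0.10) -> str:
    """Classify a session as 'ok', 'warn', or 'exhausted' (hypothetical helper)."""
    remaining = budget_usd - spent_usd
    if budget_usd <= 0 or remaining <= 0:
        return "exhausted"
    # Warn once less than warn_fraction of the total budget is left.
    if remaining < budget_usd * warn_fraction:
        return "warn"
    return "ok"


print(budget_status(1.25, 5.00))
print(budget_status(4.60, 5.00))
print(budget_status(5.00, 5.00))
```

With a $5 cap the warning threshold works out to $0.50 remaining, which matches the fixed value used in the exhaustion handler later in this tutorial.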
def get_session_stats(session_id: str) -> dict:
    resp = requests.get(
        f"https://p402.io/api/v2/sessions/{session_id}/stats",
        headers={"Authorization": f"Bearer {P402_API_KEY}"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()


def budget_remaining(session_id: str) -> float:
    stats = get_session_stats(session_id)
    spent = stats.get("budget_spent_usd", 0)
    budget = stats.get("budget_usd", 0)
    return budget - spent

Handle Budget Exhaustion
When the session is exhausted, P402 returns HTTP 402 with error code SESSION_BUDGET_EXCEEDED. Catch it and gracefully shut the agent down or provision a new session.
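The "provision a new session" option can be sketched as a small retry wrapper. Everything named here is hypothetical: BudgetExhausted stands in for the HTTP 402 error, ask_fn would wrap the ask function from the previous section and translate a 402 into BudgetExhausted, and new_session_fn would call create_session. The demo uses fakes so it runs without network access.

```python
from typing import Callable, Optional, Tuple


class BudgetExhausted(Exception):
    """Hypothetical stand-in for the HTTP 402 SESSION_BUDGET_EXCEEDED error."""


def ask_with_rollover(
    question: str,
    session_id: str,
    ask_fn: Callable[[str, str], str],
    new_session_fn: Callable[[], str],
    max_rollovers: int = 1,
) -> Tuple[Optional[str], str]:
    """Try to answer; on exhaustion, open a new session up to max_rollovers times.

    Returns (answer or None, session ID in use afterwards).
    """
    for attempt in range(max_rollovers + 1):
        try:
            return ask_fn(question, session_id), session_id
        except BudgetExhausted:
            if attempt == max_rollovers:
                break  # rollover allowance used up: give up cleanly
            session_id = new_session_fn()
    return None, session_id


def fake_ask(question: str, session_id: str) -> str:
    """Fake backend: the old session is exhausted, any other session answers."""
    if session_id == "sess_old":
        raise BudgetExhausted()
    return f"answer from {session_id}"


answer, active = ask_with_rollover("hi", "sess_old", fake_ask, lambda: "sess_new")
print(answer, active)
```

The max_rollovers limit matters: every new session carries a fresh budget, so unlimited rollover would quietly turn a hard cap back into an open-ended bill.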
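from typing import Optional

import openai


def safe_ask(question: str, session_id: str) -> Optional[str]:
    """Ask only while budget remains; return None once the session is exhausted."""
    remaining = budget_remaining(session_id)
    if remaining <= 0:
        print("Budget exhausted. Shutting down.")
        return None
    if remaining < 0.50:  # 10% of the $5 cap
        print(f"Warning: only ${remaining:.2f} remaining.")
    try:
        return ask(question, session_id)
    except openai.APIStatusError as e:
        # BadRequestError only matches HTTP 400; a 402 surfaces as the
        # generic APIStatusError, which exposes status_code.
        if e.status_code == 402 and "SESSION_BUDGET_EXCEEDED" in str(e):
            print("Session budget exhausted mid-run.")
            return None
        raise

Run the Agent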
Put it together. The agent processes a queue of questions, tracks spend, and stops when the budget is gone.
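The end-of-run figures can also be reduced to two derived numbers: cache hit rate and average cost per paid request. This is a sketch under assumptions: summarize is a hypothetical helper, the field names are the ones the script below reads from the stats response, and request_count is assumed to include cache hits (as the sample output in this tutorial suggests).

```python
def summarize(stats: dict) -> dict:
    """Hypothetical helper: derive hit rate and average paid cost from session stats."""
    requests_total = stats.get("request_count", 0)
    cache_hits = stats.get("cache_hits", 0)
    spent = stats.get("budget_spent_usd", 0.0)
    paid = requests_total - cache_hits  # assumption: request_count includes cache hits
    return {
        "cache_hit_rate": cache_hits / requests_total if requests_total else 0.0,
        "avg_paid_cost_usd": spent / paid if paid > 0 else 0.0,
    }


# Figures taken from the sample run shown in this tutorial.
summary = summarize({"budget_spent_usd": 0.0012, "request_count": 5, "cache_hits": 1})
print(f"Cache hit rate: {summary['cache_hit_rate']:.0%}")
print(f"Avg cost per paid request: ${summary['avg_paid_cost_usd']:.4f}")
```

For the sample run this gives a 20% hit rate and roughly $0.0003 per paid request, since $0.0012 is spread over the four requests that were not served from cache.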
QUESTIONS = [
    "What is the x402 payment protocol?",
    "Explain EIP-3009 transferWithAuthorization.",
    "What is the difference between cost and quality routing?",
    "What is the x402 payment protocol?",  # ← identical, so it will be served from cache
    "How does semantic caching work?",
]


def main():
    session_id = create_session(5.00)
    print(f"\nStarting agent with $5.00 budget\n{'─'*45}")
    for i, question in enumerate(QUESTIONS, 1):
        print(f"\nQ{i}: {question[:60]}...")
        answer = safe_ask(question, session_id)
        if answer is None:
            break
        print(f"A: {answer[:120]}...")
    stats = get_session_stats(session_id)
    print(f"\n{'─'*45}")
    print(f"Total spent: ${stats['budget_spent_usd']:.4f}")
    print(f"Requests: {stats['request_count']}")
    print(f"Cache hits: {stats.get('cache_hits', 0)}")


if __name__ == "__main__":
    main()

Expected output
Starting agent with $5.00 budget
─────────────────────────────────────────────

Q1: What is the x402 payment protocol?...
 [$0.0003 via deepseek]
A: x402 is a machine-native payment standard...

Q2: Explain EIP-3009 transferWithAuthorization....
 [$0.0004 via deepseek]
A: EIP-3009 defines a way for token holders to...

Q3: What is the difference between cost and quality routing?...
 [$0.0002 via deepseek]

Q4: What is the x402 payment protocol?...
 [CACHED (free)] ← identical query, zero cost

Q5: How does semantic caching work?...
 [$0.0003 via deepseek]

─────────────────────────────────────────────
Total spent: $0.0012
Requests: 5
Cache hits: 1

TypeScript Variant
Same pattern, zero extra dependencies beyond the official OpenAI SDK.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.P402_API_KEY,
  baseURL: 'https://p402.io/api/v2',
});

// Create session
const session = await fetch('https://p402.io/api/v2/sessions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.P402_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ budget_usd: 5 }),
}).then((r) => r.json());

// Ask with budget cap
const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Explain EIP-3009.' }],
  // @ts-expect-error P402 extension field
  p402: { session_id: session.id, mode: 'cost', cache: true },
});

const meta = (response as Record<string, unknown>).p402_metadata as {
  cost_usd: number;
  cached: boolean;
  provider: string;
} | undefined;

console.log(`Cost: $${meta?.cost_usd ?? 0} via ${meta?.provider}`);
console.log(response.choices[0]?.message.content);

What's next
You have a working budget agent. Here's how to go deeper: