One page for humans and agents.
Start with a session. Pick compute, or let Cerver recommend it. Run work. Read the transcript, metrics, and cost. Agents can follow the same steps without inventing integration glue.
Copy this first.
This is the smallest useful Cerver flow: create a session, run something, read the cost/latency view, then close it. Use this shape before you reach for provider-specific APIs.
Need a CERVER_API_TOKEN?
Once you're in, the dashboard generates a token you can copy and export as CERVER_API_TOKEN.
# 1. Create a session.
curl -X POST https://gateway.cerver.ai/v2/sessions \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "session_name": "hello-cerver",
    "compute": { "provider": "vercel" },
    "harness": "openai"
  }'

# 2. Run something in it.
curl -X POST https://gateway.cerver.ai/v2/sessions/SESSION_ID/run \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "code": "echo hello from cerver" }'

# 3. Read the cost/latency view.
curl https://gateway.cerver.ai/v2/sessions/SESSION_ID/metrics \
  -H "Authorization: Bearer $CERVER_API_TOKEN"

# 4. Close the session.
curl -X DELETE https://gateway.cerver.ai/v2/sessions/SESSION_ID \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
Replace SESSION_ID with the id returned by the first call. Use compute: null for transcript-only sessions.
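For callers who would rather script the same four calls, here is a minimal Python sketch using only the standard library. It assumes the create call returns a JSON object with a session_id field (as in the sample response further down this page) and that every reply is JSON. The helper names build_request, send, and quickstart are illustrative and not part of any Cerver SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://gateway.cerver.ai"  # host taken from the curl examples


def build_request(method, path, body=None, token=None):
    """Build (but do not send) an authenticated request to the gateway."""
    token = token or os.environ.get("CERVER_API_TOKEN", "")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send a request and decode the reply (assumption: the body is JSON)."""
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def quickstart(send=send):
    """Create a session, run one command, read metrics, then close."""
    created = send(build_request("POST", "/v2/sessions", {
        "session_name": "hello-cerver",
        "compute": {"provider": "vercel"},
        "harness": "openai",
    }))
    session_id = created["session_id"]  # assumed field name, see lead-in
    try:
        send(build_request("POST", f"/v2/sessions/{session_id}/run",
                           {"code": "echo hello from cerver"}))
        return send(build_request("GET", f"/v2/sessions/{session_id}/metrics"))
    finally:
        # Close the session even if the run or metrics call failed.
        send(build_request("DELETE", f"/v2/sessions/{session_id}"))
```

Passing a different send callable makes the flow easy to dry-run without touching the network.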
If you are an agent, follow these rules.
Cerver docs are meant to be executable context. Do not infer a new API shape when one below already fits.
Default agent behavior
- Use GET /v2/sessions to find prior work before creating a duplicate session.
- Create with POST /v2/sessions. Pick compute, harness, and session_name explicitly.
- Use POST /v2/sessions/:id/run for code and shell work.
- Use POST /v2/sessions/:id/run-llm for model turns.
- Use POST /v2/sessions/:id/compute when you need to swap compute without losing the transcript.
- Use GET /v2/sessions/:id/metrics before reporting cost, latency, or savings.
# Find recent sessions
curl "https://gateway.cerver.ai/v2/sessions?limit=20" \
  -H "Authorization: Bearer $CERVER_API_TOKEN"

# Peek one session
curl "https://gateway.cerver.ai/v2/sessions/SESSION_ID?tail=50" \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
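The "find prior work before creating a duplicate" rule can be sketched as a small lookup-then-create helper. This is an illustration under assumptions: the shape of the list response is not documented on this page, so the sketch takes an already-parsed list of session summaries and assumes each one carries session_name and session_id. The two callables stand in for whatever HTTP client you use.

```python
from typing import Callable, Optional


def find_session(summaries: list, session_name: str) -> Optional[dict]:
    """Return the first listed session whose name matches, else None."""
    for summary in summaries:
        if summary.get("session_name") == session_name:
            return summary
    return None


def reuse_or_create(list_sessions: Callable[[], list],
                    create_session: Callable[[dict], dict],
                    session_name: str,
                    compute: Optional[dict],
                    harness: str) -> dict:
    """Look for prior work first; create only when nothing matches."""
    existing = find_session(list_sessions(), session_name)
    if existing is not None:
        return existing
    # compute, harness, and session_name are always passed explicitly.
    return create_session({
        "session_name": session_name,
        "compute": compute,
        "harness": harness,
    })
```

Matching on session_name is only one possible notion of "duplicate"; an agent could equally match on task metadata if it stores any.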
Same session. Switch the model, compute, and tools underneath.
A Cerver session is not only a transcript. It is the control layer for a run. Your app keeps one session id while Cerver can change the intelligence layer, the compute runtime, and the tools attached to the work.
# Same intent, different model choices.
curl -X POST https://gateway.cerver.ai/v2/sessions/SESSION_ID/run-llm \
-H "Authorization: Bearer $CERVER_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "model": "claude-sonnet-4-5", "input": "Review this migration" }'
curl -X POST https://gateway.cerver.ai/v2/sessions/SESSION_ID/run-llm \
-H "Authorization: Bearer $CERVER_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "model": "gpt-5", "input": "Review this migration" }'
# Same session, different compute underneath.
curl -X POST https://gateway.cerver.ai/v2/sessions/SESSION_ID/compute \
-H "Authorization: Bearer $CERVER_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "compute": { "provider": "e2b" } }'
The exact model names and tool schemas depend on the harnesses attached to your account. The important contract is stable: keep the session, switch the layer underneath, then read metrics for quality, latency, and cost.
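As a rough sketch of that contract in code, the helpers below send the same prompt to several models on one session and then swap compute, reusing the request bodies from the curl examples. The post callable (path and body in, parsed reply out) is a placeholder for your own HTTP client, and the function names are illustrative only.

```python
from typing import Callable, Dict, Iterable


def run_llm_body(model: str, prompt: str) -> dict:
    """Body for POST /v2/sessions/:id/run-llm, as in the curl examples."""
    return {"model": model, "input": prompt}


def compare_models(post: Callable[[str, dict], dict],
                   session_id: str,
                   models: Iterable[str],
                   prompt: str) -> Dict[str, dict]:
    """Send one prompt to each model on the same session, keyed by model."""
    path = f"/v2/sessions/{session_id}/run-llm"
    return {model: post(path, run_llm_body(model, prompt)) for model in models}


def swap_compute(post: Callable[[str, dict], dict],
                 session_id: str,
                 provider: str) -> dict:
    """Move the session to another provider; the session id stays the same."""
    return post(f"/v2/sessions/{session_id}/compute",
                {"compute": {"provider": provider}})
```

After a comparison like this, reading the session's metrics endpoint is what lets you attribute latency and cost to each choice.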
The cost story: route each turn, not each customer.
AI spend gets expensive when every request takes the premium path. Cerver lets you keep one session while changing model, harness, or compute under it. That means easy turns can be cheap, hard turns can still be excellent, and the whole run remains auditable.
# Ask the gateway to score compute before creating a session.
curl -X POST https://gateway.cerver.ai/gateway/recommend \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Classify 500 support tickets",
    "workload": "general",
    "requirements": { "runtime": "node", "timeout_minutes": 5 },
    "policy": { "mode": "cheapest" }
  }'

# Later, move the same session to stronger compute if the task gets hard.
curl -X POST https://gateway.cerver.ai/v2/sessions/SESSION_ID/compute \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "compute": { "provider": "e2b" } }'
Pricing on savings is a good enterprise story: baseline the customer's current premium-everywhere spend, route turns through Cerver, then charge against verified savings or savings-backed tiers.
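To make the savings arithmetic concrete, here is a small worked sketch. Every number in it is hypothetical: the per-turn costs, the hard/easy flag, and the fee rate are illustration values, not Cerver pricing. It compares a premium-everywhere baseline against per-turn routing and derives a fee as a share of the difference.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Turn:
    hard: bool           # hypothetical difficulty flag set by your router
    premium_cost: float  # USD if this turn takes the premium path
    cheap_cost: float    # USD if this turn takes the cheap path


def savings_report(turns: Iterable[Turn], fee_rate: float) -> dict:
    """Compare premium-everywhere spend with per-turn routed spend."""
    turns = list(turns)
    baseline = sum(t.premium_cost for t in turns)
    # Hard turns still take the premium path; easy turns take the cheap one.
    routed = sum(t.premium_cost if t.hard else t.cheap_cost for t in turns)
    savings = baseline - routed
    return {
        "baseline_usd": round(baseline, 4),
        "routed_usd": round(routed, 4),
        "savings_usd": round(savings, 4),
        "fee_usd": round(savings * fee_rate, 4),
    }


# Eight easy turns and two hard ones, with made-up prices.
example = [Turn(False, 0.05, 0.01)] * 8 + [Turn(True, 0.05, 0.01)] * 2
report = savings_report(example, fee_rate=0.25)
```

With these made-up inputs the baseline is $0.50, the routed spend is $0.18, and a 25% fee on the $0.32 saved comes to $0.08. In practice the baseline and routed figures would come from the session metrics rather than from constants.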
The simple model.
- If you are building an app, start with the session layer.
- If you are adding a backend, implement the compute layer.
- A session binds app work to one chosen computer.
- requirements and policy tell Cerver how to choose that computer.
Sessions need a compute. Attach one first.
What "compute" means here
A Cerver session is a place where work happens, and the work has to run somewhere. That somewhere is a compute: a sandbox, a relay-connected laptop, or a cloud runtime. Sessions don't run if no compute is attached.
A new account has no compute, so the first POST /v2/sessions returns 409 with a recommendation. Picking a path is the second thing you do, after creating an account.
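A client can treat that 409 as a distinct, recoverable case rather than a generic failure. The sketch below assumes a send callable that returns a (status, payload) pair. The structure of the recommendation inside the 409 body is not specified on this page, so the payload is passed through untouched. ComputeRequired is a hypothetical exception name.

```python
from typing import Callable, Tuple


class ComputeRequired(Exception):
    """Raised when session creation is refused because no compute is attached."""

    def __init__(self, payload: dict):
        super().__init__("no compute attached to this account")
        self.payload = payload  # raw 409 body, including the recommendation


def create_session(send: Callable[[str, str, dict], Tuple[int, dict]],
                   body: dict) -> dict:
    """POST /v2/sessions and separate the 409 case from other errors."""
    status, payload = send("POST", "/v2/sessions", body)
    if status == 409:
        raise ComputeRequired(payload)
    if status >= 400:
        raise RuntimeError(f"session create failed with HTTP {status}")
    return payload
```

A caller can catch ComputeRequired, show the recommendation to the user, and retry once a relay or cloud provider is registered.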
Two paths — pick by where you'd rather pay
Local relay turns a machine you already own (a laptop, a Mac mini, an always-on server) into a private compute. You pay nothing extra; you spend the cycles you already have. Best for prototyping, internal tools, anything where the laptop staying online is fine.
BYO cloud hands Cerver a key to your Vercel or E2B account. Sessions provision sandboxes there; you pay the cloud bill (Cerver doesn't mark it up). Best for production, multi-team, or anything that has to scale past one machine.
You can mix
Most accounts end up with both: a local relay for fast iteration, a cloud provider for shipped traffic. Each session picks at create time via the compute field — same session API, different runner underneath.
Local relay · one command
curl -fsSL https://cerver.ai/install.sh | bash
Installs uv if missing, runs the relay, opens a browser to log in, registers the host as a private compute. Self-updates from GitHub.
BYO cloud · register provider
POST /v2/account/providers
{
  "provider": "vercel",
  "credentials": { "vercel_token": "..." }
}
Attach to a session
POST /v2/sessions
{ "compute": { "provider": "vercel" } } // fresh provision
{ "compute": { "compute_id": "comp_…" } } // existing
{ "compute": null } // transcript-only
Verify
GET /v2/computes
// → list of attached computes (≥ 1)
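The three compute shapes accepted at session creation can be produced by one small helper, sketched here. The function names are illustrative. The only behavior it adds beyond the shapes above is rejecting a call that passes both a provider and a compute_id, which is this sketch's own guard and not a documented server rule.

```python
import json
from typing import Optional


def compute_field(provider: Optional[str] = None,
                  compute_id: Optional[str] = None) -> Optional[dict]:
    """Return the compute value: fresh provision, existing compute, or None."""
    if provider and compute_id:
        raise ValueError("pass provider or compute_id, not both")
    if provider:
        return {"provider": provider}      # fresh provision
    if compute_id:
        return {"compute_id": compute_id}  # existing compute
    return None                            # transcript-only session


def session_body(provider: Optional[str] = None,
                 compute_id: Optional[str] = None) -> str:
    """Serialize a POST /v2/sessions body with the chosen compute shape."""
    return json.dumps({"compute": compute_field(provider, compute_id)})
```

Python's None serializes to JSON null, so session_body() with no arguments yields the transcript-only body.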
Your vault holds the keys. Cerver only fetches them.
Cerver does not store provider keys (OpenAI, Anthropic, xAI, E2B, Vercel, Cloudflare, your own tools). Those live in your secrets backend. For secrets, the cerver-mcp package ships one MCP tool, secret_fetch(name), that resolves the name through whichever backend you configure and returns the value to the agent on demand.
One interface, two backends
The agent calls one tool — secret_fetch(name) — and gets back { name, value, source }. Where the value came from is a config choice, not an agent choice: the same agent code runs in dev and in prod.
The CERVER_SECRETS_BACKEND env var picks which backend answers the call. The default, env, reads from the host process environment (best for local work). Set it to infisical in production to read from an Infisical project.
What cerver does and doesn't do
cerver-mcp's job is small on purpose. It looks up the name in the active backend, returns it, and surfaces an error if the lookup fails. That's it.
It does not route names to providers, sign requests, cache values, or write its own audit log. Anything you'd want for governance — scope enforcement, access logs, rotation — comes from the backend itself. Infisical gives you all three; env gives you none.
Names are the contract
Use any name you like — OPENAI_API_KEY, BUFFER_API_KEY, STRIPE_SECRET_KEY. Whatever you store under that name in env or Infisical is what the agent gets. Cerver doesn't impose a registry; matching is exact.
Local · env backend (default)
export OPENAI_API_KEY=sk-...
export BUFFER_API_KEY=...
uvx cerver-mcp
# agent: secret_fetch("OPENAI_API_KEY")
# → { name: "OPENAI_API_KEY", value: "sk-...", source: "env" }
# CERVER_SECRETS_BACKEND defaults to "env" when unset.
Production · Infisical backend
{
  "mcpServers": {
    "cerver": {
      "command": "uvx",
      "args": ["cerver-mcp"],
      "env": {
        "CERVER_API_TOKEN": "ck_...",
        "CERVER_SECRETS_BACKEND": "infisical",
        "INFISICAL_TOKEN": "st...",
        "INFISICAL_PROJECT_ID": "...",
        "INFISICAL_ENVIRONMENT": "prod"
      }
    }
  }
}
// INFISICAL_ENVIRONMENT defaults to "prod" when omitted.
// Token + project_id are required; cerver-mcp errors on startup
// without them when backend=infisical.
secret_fetch return contract
secret_fetch(name) -> { name, value, source }
// source is "env" or "infisical"
// raises ValueError if the name is missing in the active backend
// raises ValueError on unknown backend or missing infisical config
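The contract above is small enough to model in a few lines, which can be handy for unit-testing agent code without running cerver-mcp. The following is an illustrative model of the described behavior, not the package's source: resolve_secret is a hypothetical name, and the Infisical lookup is injected as a plain callable because no real Infisical client is used here.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_secret(name: str,
                   environ: Optional[Mapping[str, str]] = None,
                   infisical_lookup: Optional[Callable[[str, str], Optional[str]]] = None) -> dict:
    """Model of the secret_fetch contract: returns {name, value, source}."""
    environ = os.environ if environ is None else environ
    backend = environ.get("CERVER_SECRETS_BACKEND", "env")  # "env" when unset

    if backend == "env":
        value = environ.get(name)
    elif backend == "infisical":
        if not environ.get("INFISICAL_TOKEN") or not environ.get("INFISICAL_PROJECT_ID"):
            raise ValueError("infisical backend needs INFISICAL_TOKEN and INFISICAL_PROJECT_ID")
        if infisical_lookup is None:
            raise ValueError("no Infisical lookup configured")
        # The environment defaults to "prod" when omitted.
        value = infisical_lookup(name, environ.get("INFISICAL_ENVIRONMENT", "prod"))
    else:
        raise ValueError(f"unknown secrets backend: {backend!r}")

    if value is None:
        # Matching is exact, so a missing name is an error, not a fallback.
        raise ValueError(f"secret {name!r} not found in backend {backend!r}")
    return {"name": name, "value": value, "source": backend}
```

Because the backend is read from configuration, the calling code is identical in dev and prod; only the environment mapping changes.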
Verify the wiring
# cerver-mcp speaks MCP over stdio. Easiest sanity check is to
# import the tool function directly and call it:
uvx --from cerver-mcp python -c "
import asyncio, os
os.environ['OPENAI_API_KEY'] = 'sk-test'
from cerver_mcp.server import secret_fetch
print(asyncio.run(secret_fetch('OPENAI_API_KEY')))
"
# → {'name': 'OPENAI_API_KEY', 'value': 'sk-test', 'source': 'env'}
Next: create an Infisical service token (scope it to one project + environment in the Infisical UI — that's where access control lives), drop the token + project id into the config, and run the verify snippet. Rotation, scope, and audit trails live in Infisical; cerver-mcp just resolves names. Future backends (1Password, AWS Secrets Manager, GCP Secret Manager) would plug in behind the same secret_fetch tool — agent code wouldn't change.
The app-facing session API.
These are the canonical endpoints for products that want one stable doorway into execution. The lower-level compute API still exists, but the session API is the intended integration path for apps.
GET /v2/sessions/:id returns a summary. Add ?tail=N, ?since=N, or ?full=1 when you need transcript entries.
POST /v2/sessions/:id/compute takes { compute: { provider } | { compute_id } | null }. Transcript persists across the swap.
Create a session. Attach compute. Run.
The simplest possible flow: three calls with small JSON bodies. Use this shape for new integrations.
POST /v2/sessions
{ "compute": { "provider": "vercel" } }
POST /v2/sessions/:id/run
{ "code": "console.log('hi')" }
DELETE /v2/sessions/:id
Need a routing policy or a long-form recommendation flow? Replace the compute field with the legacy policy + requirements + workload shape — Cerver will score providers and pick. Both shapes are accepted on the same endpoint.
Long-form (policy-based) request
{
  "task": "Boot a preview environment for a Next.js repo",
  "workload": "preview",
  "repo": {
    "name": "branch-monkey",
    "framework": "nextjs",
    "languages": ["typescript"],
    "signals": ["needs-preview", "short-lived"]
  },
  "requirements": {
    "runtime": "node",
    "package_install": true,
    "public_preview": true,
    "persistence_level": "medium",
    "timeout_minutes": 20
  },
  "policy": {
    "mode": "balanced",
    "allowed_providers": ["vercel", "e2b"],
    "max_startup_ms": 2000
  },
  "session_name": "preview-session"
}
Long-form response
{
  "session_id": "sess_123",
  "session_name": "preview-session",
  "status": "ready",
  "provider": "vercel",
  "compute_id": "cmp_123",
  "sandbox_id": "sbx_local_123",
  "metrics": {
    "provision_time_ms": 812,
    "time_to_first_exec_ms": null,
    "last_exec_latency_ms": null,
    "average_exec_latency_ms": null,
    "average_stream_open_latency_ms": null,
    "total_exec_count": 0,
    "total_stream_count": 0,
    "interaction_count": 0,
    "session_length_ms": 0,
    "cost_estimate_usd": 0.01,
    "uptime_percent": 99.3,
    "predicted_startup_ms": 820,
    "engagement_score": 0,
    "engagement_label": "warming"
  },
  "routing": {
    "recommended_provider": "vercel",
    "confidence": "high",
    "primary_reason": "Best fit for preview workloads",
    "secondary_reasons": ["Startup within target", "Public preview supported"],
    "fallback_order": ["e2b"],
    "canary_run": false
  }
}
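A client that wants to act on the routing block can read it with a couple of small helpers. This sketch relies only on the field names visible in the sample response above. The helper names are illustrative, and treating fallback_order as a retry order on the client side is an assumption about how you might use it.

```python
from typing import List


def provider_order(response: dict) -> List[str]:
    """Recommended provider first, then fallbacks, without duplicates."""
    routing = response.get("routing", {})
    candidates = [routing.get("recommended_provider"),
                  *routing.get("fallback_order", [])]
    ordered: List[str] = []
    for name in candidates:
        if name and name not in ordered:
            ordered.append(name)
    return ordered


def placed_as_recommended(response: dict) -> bool:
    """True when the session landed on the provider the router recommended."""
    recommended = response.get("routing", {}).get("recommended_provider")
    return recommended is not None and response.get("provider") == recommended


sample = {
    "provider": "vercel",
    "routing": {"recommended_provider": "vercel", "fallback_order": ["e2b"]},
}
order = provider_order(sample)  # ["vercel", "e2b"]
```

Logging placed_as_recommended alongside the primary_reason gives a simple audit trail for routing decisions.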
Sessions are also transcripts.
Every Cerver session keeps the full turn-by-turn conversation on its transcript[] field — user messages, assistant replies, tool calls, tool results. That makes the same primitive a shared memory layer: any agent on the account can read what any other agent on the account did, just by listing sessions and reading transcripts. No vector DB, no separate retrieval service.
Read from plain HTTP
curl "https://gateway.cerver.ai/v2/sessions?limit=20" \
-H "Authorization: Bearer $CERVER_API_TOKEN"
curl "https://gateway.cerver.ai/v2/sessions/SESSION_ID?tail=50" \
-H "Authorization: Bearer $CERVER_API_TOKEN"
# returns a summary plus the last 50 transcript entries.
# use ?full=1 only for an intentional full transcript download.
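The transcript query options can be wrapped in a tiny URL builder so that a full download is always an explicit choice. This is a sketch with assumptions: session_url is a hypothetical helper, and letting full take precedence over tail and since is this sketch's own rule; how the gateway combines the parameters is not stated here.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://gateway.cerver.ai"  # host taken from the curl examples


def session_url(session_id: str,
                tail: Optional[int] = None,
                since: Optional[int] = None,
                full: bool = False) -> str:
    """Build the peek URL for one session with optional transcript params."""
    params = {}
    if full:
        params["full"] = 1  # intentional full transcript download
    else:
        if tail is not None:
            params["tail"] = tail
        if since is not None:
            params["since"] = since
    url = f"{BASE_URL}/v2/sessions/{quote(session_id, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, session_url("sess_123", tail=50) reproduces the second curl URL above.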
Read from an MCP-aware agent
Drop the API key into your agent's MCP config once. The cerver-mcp package surfaces three tools the agent can call directly: cerver_session_list, cerver_session_peek, and cerver_session_export.
{
  "mcpServers": {
    "cerver": {
      "command": "uvx",
      "args": ["cerver-mcp"],
      "env": { "CERVER_API_TOKEN": "ck_..." }
    }
  }
}
Same data the dashboard at cerver.ai/dashboard/sessions shows you — humans and agents see identical content, scoped to whichever account owns the API token.
Run code, stream output, then read visibility back.
Session execution responses stay provider-aware internally, but Cerver adds its own session-level metadata around them. Streaming responses include extra Cerver headers so your app can observe the gateway path directly.
- X-Cerver-Session-Id identifies the logical session.
- X-Cerver-Provider tells you which backend actually executed the run.
- X-Cerver-Stream-Latency-Ms measures how long it took to open the stream.
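One way to consume the three headers listed above is to fold them into a small record on the client. The sketch below is illustrative: GatewayTrace and read_trace are hypothetical names, header lookup is made case-insensitive because HTTP header names are, and a missing or non-numeric latency is reported as None on the assumption that the header carries a plain number of milliseconds.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayTrace:
    session_id: Optional[str]
    provider: Optional[str]
    stream_latency_ms: Optional[float]


def read_trace(headers: Mapping[str, str]) -> GatewayTrace:
    """Extract the X-Cerver-* headers from a streaming response."""
    lowered = {key.lower(): value for key, value in headers.items()}
    raw_latency = lowered.get("x-cerver-stream-latency-ms")
    try:
        latency = float(raw_latency) if raw_latency is not None else None
    except ValueError:
        latency = None  # header present but not a number
    return GatewayTrace(
        session_id=lowered.get("x-cerver-session-id"),
        provider=lowered.get("x-cerver-provider"),
        stream_latency_ms=latency,
    )
```

Recording the provider from the header, rather than from the request you sent, is what tells you which backend actually ran the work after a fallback.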
| Metric | Meaning |
|---|---|
| provision_time_ms | How long the initial compute provisioning took. |
| time_to_first_exec_ms | How long until the first execution happened after session creation. |
| last_exec_latency_ms | Latency of the latest non-stream execution. |
| average_stream_open_latency_ms | Average latency to begin stream delivery. |
| cost_estimate_usd | Cerver’s estimated session spend so far. |
| engagement_label | One of idle, warming, engaged, or deep. |
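Several of these metrics are null until the first execution happens (see the sample response earlier on this page), so any report built from them should handle missing values. A minimal formatting sketch, with summarize_metrics as a hypothetical helper name:

```python
def summarize_metrics(metrics: dict) -> str:
    """One-line summary that tolerates metrics that are still null."""

    def ms(key: str) -> str:
        value = metrics.get(key)
        return "n/a" if value is None else f"{value} ms"

    cost = metrics.get("cost_estimate_usd")
    cost_text = "n/a" if cost is None else f"${cost:.2f}"
    label = metrics.get("engagement_label") or "unknown"
    return (f"provision {ms('provision_time_ms')}, "
            f"first exec {ms('time_to_first_exec_ms')}, "
            f"cost {cost_text}, engagement {label}")


fresh = {
    "provision_time_ms": 812,
    "time_to_first_exec_ms": None,
    "cost_estimate_usd": 0.01,
    "engagement_label": "warming",
}
line = summarize_metrics(fresh)
```

For the freshly created session above, the summary reads "provision 812 ms, first exec n/a, cost $0.01, engagement warming".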
Ask Cerver for a comparison before a real run.
Stress tests are the comparison layer. They let Cerver score compute providers for a representative workload and return a structured report your app or agent can use before placing real traffic.
curl -X POST https://your-cerver.example.com/gateway/stress-tests \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Compare preview launch backends",
    "kind": "preview_launch",
    "workload": "preview",
    "requirements": {
      "runtime": "node",
      "public_preview": true,
      "package_install": true,
      "timeout_minutes": 20
    },
    "providers": ["vercel", "e2b"],
    "sample_size": 5
  }'
Today these reports are still simulated from provider profiles and routing logic. The next step is live canary execution.
The lower-level compute API still exists.
If you want to work directly with raw compute instead of a logical session, the lower-level API remains available. The URLs still say /sandbox for compatibility, but this is the compute layer.
Current provider picture inside Cerver.
This is the honest state of the current codebase. Cerver can advise on more providers than it can execute today.
Local compute adapter wired into Cerver. Once P69_BASE_URL points at a running local server, Cerver can treat your machine like another execution backend.
Live adapter verified through create, run, and stop. Best current execution path.
Execution path exists, but the current implementation is still container-backed and not the cleanest local path.
Live compute adapter verified through create, run, and stop. Uses bring-your-own E2B credentials.
Modeled in the advisor and catalog. Not yet wired as a live execution adapter.
If a service wants to appear in Cerver, it implements the compute interface.
A provider becomes runnable by implementing the shared compute contract. It becomes selectable by being registered in the provider registry and catalog. Apps do not use this directly; the session layer does.
export interface CerverInterface {
  readonly providerName: "cloudflare" | "vercel" | "e2b" | "p69";
  createSandbox(request, env): Promise<SandboxRecord>;
  runSandbox(record, request, env): Promise<Response>;
  runSandboxStream(record, request, env): Promise<Response>;
  installPackage(record, request, env): Promise<Response>;
  writeFile(record, request, env): Promise<Response>;
  readFile(record, path, encoding, env): Promise<Response>;
  getState(record, env): Promise<Response>;
  setState(record, state, env): Promise<Response>;
  deleteSandbox(record, env): Promise<Response>;
}
- Implement the compute contract to make the provider runnable.
- Register it in the provider registry to make it live inside Cerver.
- Add a provider profile to the gateway catalog so the router can score it.
- Then the session layer can recommend, pin, or fall back to it without being rewritten.
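The runnable-versus-selectable split in the steps above can be illustrated with a toy model. The sketch below is a Python analogue of a subset of the contract (create, run, delete only), not the real TypeScript interface. EchoProvider and ProviderRegistry are hypothetical names, and the echo provider returns the submitted code instead of executing anything.

```python
from typing import Dict, Iterable, Protocol


class ComputeProvider(Protocol):
    """Python analogue of a subset of the compute contract."""
    provider_name: str

    def create_sandbox(self, request: dict) -> dict: ...
    def run_sandbox(self, record: dict, request: dict) -> dict: ...
    def delete_sandbox(self, record: dict) -> None: ...


class EchoProvider:
    """Hypothetical in-memory provider: echoes code back, runs nothing."""
    provider_name = "echo"

    def __init__(self) -> None:
        self._live: set = set()
        self._created = 0

    def create_sandbox(self, request: dict) -> dict:
        self._created += 1
        record = {"provider": self.provider_name,
                  "sandbox_id": f"sbx_{self._created}"}
        self._live.add(record["sandbox_id"])
        return record

    def run_sandbox(self, record: dict, request: dict) -> dict:
        if record["sandbox_id"] not in self._live:
            raise RuntimeError("sandbox is not running")
        return {"output": request["code"]}

    def delete_sandbox(self, record: dict) -> None:
        self._live.discard(record["sandbox_id"])


class ProviderRegistry:
    """Registering a provider is what makes it selectable by name."""

    def __init__(self) -> None:
        self._providers: Dict[str, ComputeProvider] = {}

    def register(self, provider: ComputeProvider) -> None:
        self._providers[provider.provider_name] = provider

    def first_runnable(self, order: Iterable[str]) -> ComputeProvider:
        """Walk a preference order and return the first registered provider."""
        for name in order:
            if name in self._providers:
                return self._providers[name]
        raise LookupError("no registered provider in the requested order")
```

A session layer built on top of first_runnable can pin, recommend, or fall back by changing only the order it passes in, which mirrors the last step above.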