gateway protocol v0.3.0

Cerver Agent Protocol

This page describes the current execution gateway surface in api/cerver-api. The field names and envelopes below match the live contract.


Session Protocol

Three calls are enough.

Open a session, stream the run, then read the metrics. The gateway owns provider choice and execution visibility.

POST /gateway/sessions
POST /gateway/sessions/:id/run/stream
GET  /gateway/sessions/:id/metrics

One session surface The agent opens one logical session, even if Cerver changes providers underneath.

One stream surface The output comes back through one stream, not through four vendor SDKs.
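The three calls can be sketched as plain request builders. This is a minimal illustration, not a client shipped by Cerver: the base URL and bearer-token header are assumed from the curl examples elsewhere on this page, and the body of the stream call (`run_body`) is a hypothetical placeholder because its shape is not documented here.

```python
import json
import urllib.request

# Assumption: same host and auth scheme as the curl examples on this page.
BASE_URL = "https://gateway.cerver.ai"


def _request(method, path, token, body=None):
    """Build (but do not send) an authenticated gateway request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def open_session_request(token, session_body):
    """Call 1: open one logical session."""
    return _request("POST", "/gateway/sessions", token, session_body)


def stream_run_request(token, session_id, run_body):
    """Call 2: execute and stream output (run_body shape is hypothetical)."""
    return _request("POST", f"/gateway/sessions/{session_id}/run/stream", token, run_body)


def metrics_request(token, session_id):
    """Call 3: read provider and latency metrics for the session."""
    return _request("GET", f"/gateway/sessions/{session_id}/metrics", token)
```

Each builder returns a `urllib.request.Request` that can be passed to `urllib.request.urlopen` once a real token and session ID are available.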

Create a session

{
  "task": "boot a preview and run a smoke check",
  "workload": "preview",
  "repo": {
    "name": "branch-monkey",
    "framework": "nextjs",
    "languages": ["typescript"]
  },
  "requirements": {
    "runtime": "node",
    "package_install": true,
    "public_preview": true,
    "persistence_level": "medium",
    "timeout_minutes": 15
  },
  "policy": {
    "mode": "balanced",
    "allowed_providers": ["vercel", "cloudflare", "e2b"]
  }
}

Session response

{
  "session_id": "ses_123",
  "session_name": "branch-monkey",
  "status": "ready",
  "provider": "vercel",
  "sandbox_id": "sbx_123",
  "metrics": {
    "provision_time_ms": 820,
    "predicted_startup_ms": 820,
    "cost_estimate_usd": 0.123
  },
  "routing": {
    "recommended_provider": "vercel",
    "confidence": "high",
    "fallback_order": ["cloudflare", "e2b"],
    "canary_run": false
  }
}

Important: POST /gateway/sessions calls the real routing engine with requireExecutable: true. If no executable provider qualifies, the API returns 409 with the recommendation report.
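A caller will usually want to treat the 409 case separately from other failures, since its body carries the recommendation report. The sketch below shows one way to do that; the exception name and the choice to treat every other status below 400 as success are assumptions, not part of the documented contract.

```python
import json


class NoExecutableProvider(Exception):
    """Hypothetical error type for a 409; carries the recommendation report."""

    def __init__(self, report):
        super().__init__("no executable provider qualified")
        self.report = report


def parse_session_response(status_code, body_text):
    """Return the session record, or raise if the gateway refused the request."""
    payload = json.loads(body_text) if body_text else {}
    if status_code == 409:
        # Documented case: no executable provider survived routing.
        raise NoExecutableProvider(payload)
    if status_code >= 400:
        raise RuntimeError(f"unexpected gateway status {status_code}")
    return payload
```

The caller can catch `NoExecutableProvider`, inspect `exc.report`, and either relax the policy or attach a compute before retrying.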

Compute Setup

Sessions need a compute. Attach one first.

A new account has no compute attached, so POST /v2/sessions returns 409 with a recommendation report. Pick one path:

Option A — Local relay

One command on the machine you want to expose as compute (laptop, mac mini, server):

curl -fsSL https://kompany.dev/install-cerver.sh | bash

The script installs uv if it is missing, runs the relay, opens a browser to log you in, and registers the host as a private compute on your account. The relay self-updates by polling GitHub, so it stays current if you leave it running on an always-on machine.

Option B — BYO cloud (Vercel, e2b)

Enable a provider with your own credentials in the dashboard at cerver.ai/dashboard/providers, or via API:

curl -X POST https://gateway.cerver.ai/v2/account/providers \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"provider": "vercel", "credentials": {"vercel_token": "..."}}'

Verify

curl https://gateway.cerver.ai/v2/computes \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
# should list at least one compute

Tip: the dashboard page at cerver.ai/dashboard/computes is the human surface for the same compute data. It shows provider, status, capabilities, and heartbeat.

Secrets

Bring your own secrets store.

Cerver intentionally does not store user app secrets; those belong in a tool built for them. The cerver-mcp package ships a secret_fetch(name) tool that gives the agent one uniform interface regardless of backend.

Default: env backend

export BUFFER_API_KEY=...
uvx cerver-mcp
# agent: secret_fetch("BUFFER_API_KEY") → reads from process env

Production: Infisical backend

{
  "mcpServers": {
    "cerver": {
      "command": "uvx",
      "args": ["cerver-mcp"],
      "env": {
        "CERVER_API_TOKEN": "ck_...",
        "CERVER_SECRETS_BACKEND": "infisical",
        "INFISICAL_TOKEN": "st...",
        "INFISICAL_PROJECT_ID": "...",
        "INFISICAL_ENVIRONMENT": "prod"
      }
    }
  }
}

Secrets are audited, rotatable, and never stored on the relay disk. Future backends (1Password, AWS, GCP) plug in the same way.
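The "one interface, many backends" idea can be illustrated with a small dispatcher. This is a hypothetical sketch and not the cerver-mcp implementation: the registry, `register_backend`, and the error behavior are invented for illustration. Only the `CERVER_SECRETS_BACKEND` variable and the env default come from this page.

```python
import os

# Hypothetical registry: backend name -> function that resolves a secret name.
BACKENDS = {}


def register_backend(name, fetch):
    """Register a callable that takes a secret name and returns its value."""
    BACKENDS[name] = fetch


def _env_backend(name):
    """Read the secret from the process environment."""
    try:
        return os.environ[name]
    except KeyError:
        raise LookupError(f"secret {name!r} is not set in the environment") from None


register_backend("env", _env_backend)


def secret_fetch(name):
    """Resolve a secret through whichever backend is selected, defaulting to env."""
    backend = os.environ.get("CERVER_SECRETS_BACKEND", "env")
    if backend not in BACKENDS:
        raise ValueError(f"unknown secrets backend: {backend}")
    return BACKENDS[backend](name)
```

The agent-facing call stays `secret_fetch("BUFFER_API_KEY")` whichever backend is active, which is the property the section describes.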

Cross-Agent Memory

Sessions are also transcripts.

Every Cerver session keeps the full turn-by-turn transcript on its transcript[] field. That makes the same primitive a memory layer any agent on the account can read: your cron from yesterday, the code reviewer from last week, or an agent in a sibling app on the same key.

No separate retrieval service to wire up. No vector DB. Just GET /v2/sessions.

From plain HTTP

curl "https://gateway.cerver.ai/v2/sessions?limit=20" \
  -H "Authorization: Bearer $CERVER_API_TOKEN"

curl https://gateway.cerver.ai/v2/sessions/SESSION_ID \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
# → returns the full session record including transcript[]

From an MCP-aware agent

Drop the API key into the agent's MCP config once. The cerver-mcp package surfaces three tools.

{
  "mcpServers": {
    "cerver": {
      "command": "uvx",
      "args": ["cerver-mcp"],
      "env": { "CERVER_API_TOKEN": "ck_..." }
    }
  }
}

cerver_session_list Discover sessions on the account. Filter by status (running / idle / ended) and limit.

cerver_session_peek Read the most recent N turns of one session. Cheap context for "what happened last".

cerver_session_export Pull the full transcript as plain text. Use for thorough recall of a specific past run.
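Over plain HTTP, a peek-style read can be approximated by slicing transcript[] on an already fetched session record. The helper below is a sketch under that assumption. It treats transcript entries as opaque values because their shape is not specified on this page, and it is not the code behind cerver_session_peek.

```python
def peek_turns(session_record, n=5):
    """Return the most recent n transcript entries from a session record."""
    transcript = session_record.get("transcript") or []
    if n <= 0:
        # Guard: transcript[-0:] would return the whole list, not nothing.
        return []
    return transcript[-n:]
```

For example, passing the JSON body returned by GET /v2/sessions/SESSION_ID with `n=3` yields the last three turns, or fewer if the session is shorter.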

First-time vs N-th-time

For a first-time agent on a fresh account, cerver_session_list returns an empty list, so the agent starts a new session and does the work. For an agent on an account with prior runs, the list returns sibling agents' history: peek into the relevant sessions, inherit context, then act. The session you create afterwards becomes part of the same memory pool the next agent reads.

Tip: the dashboard at cerver.ai/dashboard/sessions uses the same endpoints — humans and agents see identical data.

Request Envelope

What the agent sends

Agents should describe the work, requirements, and policy. Vendor choice remains a gateway concern.

Task What the agent is trying to do: coding, preview, browser work, long job, or validation.

Requirements Runtime, package install, public preview, browser need, timeout, desktop need, or persistence.

Policy Balanced, fastest, cheapest, resilient, or pinned, with optional provider allowlists and ceilings.

Context Repo hints, stack signals, and why the run matters.

{
  "task": "boot a preview and run a smoke check",
  "workload": "preview",
  "repo": {
    "name": "branch-monkey",
    "framework": "nextjs",
    "languages": ["typescript"],
    "signals": ["public preview expected", "package install likely"]
  },
  "requirements": {
    "runtime": "node",
    "package_install": true,
    "public_preview": true,
    "persistence_level": "medium",
    "timeout_minutes": 15
  },
  "policy": {
    "mode": "balanced",
    "allowed_providers": ["vercel", "cloudflare", "e2b"],
    "max_cost_usd": 0.2,
    "max_startup_ms": 2500
  },
  "metadata": {
    "importance": "high"
  }
}

Use the real field names: the current gateway expects public_preview and timeout_minutes, not ad-hoc names like public_port or max_duration_minutes.
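A small pre-flight check can catch the ad-hoc names before a request is sent. This sketch only knows the two wrong names called out above, and the mapping to their replacements is inferred from the order in which they are listed. It does not validate the rest of the envelope.

```python
# Ad-hoc names warned against above, mapped to the field the gateway expects.
AD_HOC_FIELDS = {
    "public_port": "public_preview",
    "max_duration_minutes": "timeout_minutes",
}


def find_ad_hoc_fields(requirements):
    """Return {wrong_name: expected_name} for any ad-hoc keys in requirements."""
    return {
        key: AD_HOC_FIELDS[key]
        for key in requirements
        if key in AD_HOC_FIELDS
    }
```

An empty result means none of the known ad-hoc names are present; it is not proof that the envelope is valid.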

Provider Status

What Cerver can route today

This section reflects the current implementation, not the long-term roadmap.

Cloudflare Integrated, ready, and always executable in the current gateway.

Vercel Sandbox Integrated and executable when Vercel credentials are configured. Otherwise it stays visible as planned.

E2B Modeled in the advisor and scoring engine, but not wired for live execution yet.

Daytona Modeled in the advisor and scoring engine, but not wired for live execution yet.

Practical meaning: recommendation endpoints can score all four providers, but session creation only succeeds when at least one executable provider survives filtering.

Routing Rules

How Cerver decides

Filter providers that cannot satisfy the hard requirements.
Score the remaining providers for cost, speed, reliability, and fit.
Run a canary only when confidence is low or the run matters enough to compare.
Return one winner, one fallback path, and the reasons behind the choice.

Hard filters

Cerver filters a provider out before scoring if any of these are true:

The provider is blocked by allowed_providers.
The request needs live execution and the provider is not executable in Cerver yet.
The runtime, browser, desktop, preview, timeout, or cost ceilings cannot be met.
The provider startup estimate exceeds max_startup_ms.
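The hard filters above can be sketched as a predicate applied before any scoring. The `Provider` record and its fields are hypothetical, and only a subset of the checks is shown (allowlist, executability, runtime, cost ceiling, startup ceiling); the browser, desktop, preview, and timeout checks are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical provider descriptor used only for this illustration."""
    name: str
    executable: bool
    startup_ms: int
    cost_usd: float
    runtimes: set = field(default_factory=set)


def passes_hard_filters(provider, requirements, policy, require_executable=True):
    """Return True if the provider survives the subset of hard filters shown here."""
    allowed = policy.get("allowed_providers")
    if allowed is not None and provider.name not in allowed:
        return False  # blocked by allowed_providers
    if require_executable and not provider.executable:
        return False  # live execution requested but provider is not wired yet
    runtime = requirements.get("runtime")
    if runtime is not None and runtime not in provider.runtimes:
        return False  # runtime cannot be met
    max_cost = policy.get("max_cost_usd")
    if max_cost is not None and provider.cost_usd > max_cost:
        return False  # cost ceiling exceeded
    max_startup = policy.get("max_startup_ms")
    if max_startup is not None and provider.startup_ms > max_startup:
        return False  # startup estimate exceeds max_startup_ms
    return True


def hard_filter(providers, requirements, policy, require_executable=True):
    """Keep only the providers that pass, preserving input order."""
    return [
        p for p in providers
        if passes_hard_filters(p, requirements, policy, require_executable)
    ]
```

Passing `require_executable=False` models a recommendation-only request, where providers that are not yet wired for execution can still be scored.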

Routing modes

balanced Capability fit leads, with latency, cost, reliability, persistence, health, and readiness all contributing.

fastest Startup time dominates, but capability fit and readiness still matter.

cheapest Cost dominates, but capability fit and readiness still matter.

resilient Reliability and persistence carry more weight and Cerver always asks for a canary when two executable providers exist.

pinned Acts like balanced scoring, but the pinned provider gets a large score boost if it still passes the hard filters.

Confidence and canary rules

Confidence is high when the top executable score is strong and the gap to second place is large.
Confidence drops to medium or low as the score or the gap shrinks.
Cerver sets canary_run when the top two executable providers are close, or when the workload is long_running.
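One possible reading of these rules is sketched below. The numeric thresholds (`strong`, `wide_gap`, `close_gap`) are invented for illustration, as is the decision to skip the canary when only one executable provider exists; the gateway's real cut-offs are not documented here.

```python
def confidence(top_score, second_score, strong=0.8, wide_gap=0.1):
    """Label confidence from the top score and its gap to second place."""
    gap = top_score - second_score
    if top_score >= strong and gap >= wide_gap:
        return "high"
    if top_score >= strong or gap >= wide_gap:
        return "medium"
    return "low"


def needs_canary(top_score, second_score, workload, mode="balanced", close_gap=0.05):
    """Decide whether to set canary_run for the top two executable providers."""
    if second_score is None:
        # Assumption: with one executable provider there is nothing to compare.
        return False
    if mode == "resilient":
        # Resilient mode always asks for a canary when two providers exist.
        return True
    close = (top_score - second_score) < close_gap
    return close or workload == "long_running"
```

With these placeholder thresholds, a top score of 0.862 against a runner-up of 0.70 reads as high confidence with no canary for a preview workload.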

Session Record

Session lifecycle and metrics

A gateway session is the canonical record for routing, transcript, events, and metrics, even though execution happens inside a provider sandbox.

provisioning Reserved for sandbox boot or setup.

ready Sandbox exists and the session can accept work.

running Used conceptually while execution is in progress.

idle The last input or execution completed and the session is waiting.

failed Terminal failure state for session management.

terminated The session has been closed or its sandbox is gone.

Metrics tracked today

provision_time_ms How long session creation took including sandbox provisioning.

time_to_first_exec_ms Elapsed time from session creation to the first code execution event.

average_exec_latency_ms Average duration across non-stream execution events.

average_stream_open_latency_ms Average measured latency for stream-open requests.

session_length_ms Derived from the current time and createdAt.

engagement_score Computed from session length, interaction count, and execution count.
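Two of these metrics are simple enough to sketch from their descriptions. The event shape used here (`kind` and `duration_ms` keys) is hypothetical, and engagement_score is left out because its formula is not given on this page.

```python
from datetime import datetime, timedelta, timezone


def session_length_ms(created_at, now=None):
    """Milliseconds between session creation and the current time."""
    now = now or datetime.now(timezone.utc)
    return (now - created_at) // timedelta(milliseconds=1)


def average_exec_latency_ms(events):
    """Mean duration of non-stream execution events, or None if there are none."""
    durations = [e["duration_ms"] for e in events if e["kind"] == "exec"]
    if not durations:
        return None
    return sum(durations) / len(durations)
```

Because session_length_ms depends on the current time, two reads of the same session return different values even when nothing else has changed.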

Streaming note: POST /gateway/sessions/:id/run/stream adds X-Cerver-Session-Id, X-Cerver-Provider, and X-Cerver-Stream-Latency-Ms response headers.
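A client can read those three headers from any response object that exposes headers as a mapping. The sketch below assumes the latency header carries a plain number in milliseconds, which is implied by its name but not stated.

```python
def stream_info(headers):
    """Extract the Cerver stream headers from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    latency = lowered.get("x-cerver-stream-latency-ms")
    return {
        "session_id": lowered.get("x-cerver-session-id"),
        "provider": lowered.get("x-cerver-provider"),
        # Assumption: the header value is numeric text.
        "stream_latency_ms": float(latency) if latency is not None else None,
    }
```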

Response Envelopes

What comes back to the agent

The response envelope should be executable by an agent and readable by a human without translation.

{
  "request_id": "req_123",
  "task": "boot a preview and run a smoke check",
  "workload": "preview",
  "providers": [
    {
      "name": "vercel",
      "status": "viable",
      "score": 0.862,
      "estimated_cost_usd": 0.123
    }
  ],
  "decision": {
    "recommended_provider": "vercel",
    "confidence": "high",
    "primary_reason": "Vercel Sandbox is the best fit for short preview-style execution right now.",
    "secondary_reasons": [
      "Supports preview-style execution cleanly",
      "Available for live routing through Cerver today"
    ],
    "fallback_order": ["cloudflare"],
    "canary_run": false
  },
  "human_summary": {
    "headline": "Route this task through Vercel Sandbox.",
    "next_action": "Start a session on Vercel Sandbox and keep Cloudflare available."
  }
}
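An agent that wants a single ordered list of providers to try can derive it from the decision block. This sketch relies only on the recommended_provider and fallback_order fields shown in the envelope above; the helper name is hypothetical.

```python
def provider_order(recommendation):
    """Return the recommended provider followed by its fallbacks, without repeats."""
    decision = recommendation["decision"]
    order = [decision["recommended_provider"]]
    for name in decision.get("fallback_order", []):
        if name not in order:
            order.append(name)
    return order
```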

Session metrics response

{
  "session_id": "ses_123",
  "provider": "vercel",
  "status": "idle",
  "routing": {
    "recommended_provider": "vercel",
    "confidence": "high"
  },
  "metrics": {
    "provision_time_ms": 820,
    "time_to_first_exec_ms": 1110,
    "last_exec_latency_ms": 920,
    "average_exec_latency_ms": 980,
    "average_stream_open_latency_ms": 350,
    "total_exec_count": 3,
    "total_stream_count": 1,
    "interaction_count": 4,
    "session_length_ms": 440000,
    "cost_estimate_usd": 0.071,
    "uptime_percent": 99.3,
    "predicted_startup_ms": 820,
    "engagement_score": 0.61,
    "engagement_label": "engaged"
  }
}

Endpoint Index

Gateway endpoints

POST /gateway/recommend Ask Cerver to pick a provider. Use when the agent wants a recommendation before opening a session.

POST /gateway/sessions Open one logical execution session. Use when the agent is ready to start work through the gateway.

POST /gateway/sessions/:id/input Store chat or task input. Use to append user, assistant, or system messages into the gateway transcript.

POST /gateway/sessions/:id/run Execute without streaming. Use when the agent wants a JSON result wrapper with duration and metrics.

POST /gateway/sessions/:id/run/stream Execute and stream output. Use when the agent wants one stream regardless of which provider wins.

GET /gateway/sessions/:id/metrics Read provider and latency data. Use to explain what happened and decide whether policy should change.

GET /gateway/sessions/:id Read the session record. Use to inspect routing, transcript, events, and fresh metrics together.

DELETE /gateway/sessions/:id Terminate the logical session. Use to clean up both the gateway record and the backing sandbox.

POST /gateway/stress-tests Compare providers under the same workload. Use to benchmark cold starts, streaming, long jobs, and recovery behavior.

Stress Test Protocol

When to benchmark instead of guessing

When a repo is new and Cerver has little history for it.
When the run is expensive or important enough to justify a canary.
When providers have changed in speed, health, or cost.
When the team wants a repeatable comparison report across vendors.

Kinds supported now

The current gateway understands cold_start, stream_response, package_install, preview_launch, and long_session.

Stress-test request

{
  "kind": "cold_start",
  "workload": "preview",
  "providers": ["vercel", "cloudflare", "e2b"],
  "sample_size": 12,
  "requirements": {
    "runtime": "node",
    "public_preview": true,
    "timeout_minutes": 15
  },
  "policy": {
    "mode": "balanced"
  }
}
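A request like the one above can be assembled with a small builder that rejects unknown kinds early. The list of kinds comes from this page; the function name and the local validation are assumptions, and the gateway may apply different or additional checks.

```python
# Kinds the current gateway is documented to understand.
STRESS_TEST_KINDS = (
    "cold_start",
    "stream_response",
    "package_install",
    "preview_launch",
    "long_session",
)


def stress_test_request(kind, workload, providers, sample_size,
                        requirements=None, mode="balanced"):
    """Build the JSON body for POST /gateway/stress-tests."""
    if kind not in STRESS_TEST_KINDS:
        raise ValueError(f"unsupported stress-test kind: {kind}")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    body = {
        "kind": kind,
        "workload": workload,
        "providers": list(providers),
        "sample_size": sample_size,
        "policy": {"mode": mode},
    }
    if requirements:
        body["requirements"] = dict(requirements)
    return body
```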

Stress-test result

{
  "testId": "test_123",
  "kind": "cold_start",
  "workload": "preview",
  "sampleSize": 12,
  "winner": "vercel",
  "summary": "Vercel Sandbox is the strongest cold start candidate.",
  "report": {
    "headline": "Vercel Sandbox is the strongest cold start candidate.",
    "guidance": "Across 12 simulated runs, Vercel Sandbox shows the best mix of startup time, cost, and availability.",
    "next_policy": "Route preview runs to vercel first, then fall back to cloudflare, e2b."
  }
}

Current implementation: stress tests are simulated from provider baselines and recommendation scores today. They are not yet live multi-provider canaries.