Cerver for FinOps — query every session, match the resource, cut the waste.

01 — The trap

You sized for the worst case. You pay for it on everything.

Most AI setups hard-code the biggest model and the biggest machine, then run every task through it — the one-line rename and the overnight migration, billed the same. Worse, the spend lands as a single monthly total: you can't see which user, app, or agent ran it up, so the only lever left is a blunt cap.

02 — Query the sessions

See what every user, app, and task actually spent.

Every piece of work is a session, and every session is queryable — cost, tokens, model, and compute bound together. Group by user, by app, by agent, by day. The runaway that's been looping all night stops hiding in the aggregate.

By user & team

Rank spend by person and team. See who's driving the bill — and what they actually got for it.

By app & task

Attribute every dollar to the app, agent, or task that earned it — not a monthly lump nobody can decompose.

The runaways, live

Query running sessions over a threshold and kill the one that's looping — before it bills, not after.

03 — Match the resource

Right-size every run — cheap-good-enough by default.

Because the session is model- and compute-agnostic, you route each task to the resource it actually needs: a cheap model and a small box for routine work, the frontier model only for the tail that earns it. You stop paying flagship rates for everything by reflex.

GET  /v2/sessions?group=user&since=30d   → cost per user, ranked      who spent what
GET  /v2/sessions?status=running&over=5  → the runaways, live         gate before it bills
POST /v2/sessions/:id/run-llm model:auto → right-size per task         cheap unless it needs more

Today10,000 sessions · all on the frontier model

frontier × 10,000$4,000 / mo

Right-sizedsame 10,000 sessions · matched to the task

25% frontier — the work that earns it 35% mid-tier 40% open-source

the mix$1,540 / mo · −62%

Each square is 50 sessions. Illustrative per-session averages: frontier $0.40, mid $0.12, open-source $0.03 — your real mix comes from your real sessions, queryable by model. Set the mix as a routing policy →

04 — Own the record

Own your transcripts. Dispute the bill.

A token bill is a black box: you get a number and a "trust us." When every session is a transcript you own, that changes. You hold the exact record of what was sent and what came back — token for token — so a charge that looks wrong becomes a charge you can actually dispute, with receipts instead of a shrug. Owning your transcripts isn't just good hygiene; it's the only leverage you have when the meter and the invoice disagree.

05 — Save where you can

Cut the 30–40% that's just waste.

Across the teams we benchmark, a third to nearly half of AI-coding spend is avoidable — oversized routing, looping agents, work that never needed the frontier model. Query it, match it, gate it. Permission to run lives in one place; the cost of every run lives right next to it.

What it would save → See the benchmark

06 — Install

One line. Then query everything.

$ curl -fsSL https://cerver.ai/install.sh | bash

Bring your own compute. cerver's price is a flat $2 / 1M tokens, visible per session — so the bill is something you forecast, not absorb.

Your AI spend, finally under control.