The thesis
The session is the unit.
Per-seat and per-month figures are averages of averages. They hide the only thing worth knowing about AI spend: which work was worth it. The session (one task, with its model, its compute, and its cost bound together) is the unit that makes the bill make sense.
Ask a team what they spend on AI coding and you'll get a monthly number. Ask what that number bought and the room goes quiet. The monthly total is real, but it's the wrong altitude — it's the sum of thousands of separate pieces of work, flattened into one figure you can't act on. You can cap it, but you can't steer it.
The fix isn't a better dashboard on top of the total. It's a smaller unit underneath it.
Watch one session
A session is one task moving through a model and a machine. A prompt enters, a harness reasons over it, compute does the work, a result comes back — and every token along the way is counted. Here's one, live:
One task, one transcript, its cost attached. Illustrative: token flow and price are modeled at cerver's flat $2 / 1M tokens.
That's the whole idea: the cost isn't a line on a monthly invoice anymore — it's a property of the work itself. The $1,400 migration and the $40 overnight loop stop looking the same, because each one carries its own number.
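To make "cost as a property of the work" concrete, here is a minimal Python sketch. It assumes the flat $2 / 1M rate applies equally to input and output tokens. The `Session` class, its field names, the model labels, and the token counts are all hypothetical, chosen only so the two examples land on the $1,400 and $40 figures mentioned above.

```python
from dataclasses import dataclass

# Flat rate quoted in the text; applying it to input and output tokens
# alike is an assumption of this sketch.
PRICE_PER_MILLION_USD = 2.00


@dataclass
class Session:
    """One task with its model and token counts bound together (hypothetical shape)."""
    task: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def cost_usd(self) -> float:
        # Cost travels with the session instead of living on a monthly invoice.
        return self.total_tokens / 1_000_000 * PRICE_PER_MILLION_USD


# Illustrative token counts, not measurements.
migration = Session("schema migration", "frontier-model", 650_000_000, 50_000_000)
overnight = Session("overnight loop", "cheap-model", 18_000_000, 2_000_000)

for s in (migration, overnight):
    print(f"{s.task}: {s.total_tokens:,} tokens -> ${s.cost_usd:,.2f}")
```

Because the cost is computed from the session's own token counts, any report (per task, per model, per team) can be built by grouping sessions rather than by dividing up a monthly total after the fact.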
Now look at ten thousand of them
Here's why the unit matters. When you actually price work session by session, the cost isn't a single figure; it's a distribution with a long tail. Most sessions are nearly free; a thin minority cost real money. The monthly total is just every session on this curve added up, and the per-seat price is one blurry point on it:
Illustrative distribution across ~10,000 agentic sessions. The mean sits well right of the median — the tail is where the money actually hides.
This is the picture a monthly total erases. Mean and median are far apart, which means a handful of sessions dominate the bill. You don't fix that with a ceiling — a cap punishes the cheap sessions and the one migration alike. You fix it by being able to see the tail and route around it.
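The gap between mean and median can be checked on any list of per-session costs. The sketch below uses synthetic data: a log-normal distribution is an assumption standing in for "long tail", and the parameters, the seed, and the `tail_summary` helper are hypothetical, not a model of any real account.

```python
import random
import statistics

# Synthetic per-session costs in USD. A log-normal shape is an assumed
# stand-in for a long tail; the parameters are illustrative only.
rng = random.Random(0)
costs = [rng.lognormvariate(-3.0, 2.0) for _ in range(10_000)]


def tail_summary(costs, top_fraction=0.01):
    """Summarize how much of the bill the most expensive sessions carry."""
    ordered = sorted(costs, reverse=True)
    top_n = max(1, int(len(ordered) * top_fraction))
    total = sum(ordered)
    return {
        "total": total,                              # what a monthly invoice shows
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "top_share": sum(ordered[:top_n]) / total,   # share of spend in the tail
    }


summary = tail_summary(costs)
print(f"total ${summary['total']:,.2f}")
print(f"mean ${summary['mean']:.4f} vs median ${summary['median']:.4f}")
print(f"top 1% of sessions carry {summary['top_share']:.0%} of the spend")
```

The total is the only number a monthly invoice reports. The mean, the median, and the top-1% share are the ones that show whether a few sessions dominate it.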
Why the unit changes everything
Once the session is the unit, the moves you couldn't make before become obvious:
- Attribution. Cost per task, not per month — so you know which spend earned its keep and which was a runaway loop.
- Routing. The session is model-agnostic, so cheap, high-volume work goes to a cheap-good-enough model and the frontier model is reserved for the tail that needs it.
- Comparison. Run the same prompt across Claude and Codex on one session, keep the winner, and see what each cost to get there.
- Predictability. Priced per session at a flat $2 / 1M tokens, the bill stops being a monthly surprise and becomes a number you can forecast.
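The routing move can be sketched as a per-session decision. Everything specific here is an assumption for illustration: the model labels, the `difficulty` score (which in practice would come from some classifier or heuristic the text does not describe), and the escalation threshold.

```python
from collections import Counter

# Hypothetical model labels and threshold; none of these come from the source.
CHEAP_MODEL = "cheap-good-enough"
FRONTIER_MODEL = "frontier"
ESCALATION_THRESHOLD = 0.8


def route(difficulty: float) -> str:
    """Pick a model for one session from an assumed 0..1 difficulty score."""
    if difficulty >= ESCALATION_THRESHOLD:
        return FRONTIER_MODEL  # reserve the frontier model for the tail
    return CHEAP_MODEL         # everything else goes to the cheaper model


# Illustrative difficulty scores for eight sessions.
difficulties = [0.1, 0.2, 0.15, 0.9, 0.3, 0.05, 0.85, 0.4]
assignments = Counter(route(d) for d in difficulties)
print(dict(assignments))
```

The decision is made per session, not per seat or per month, which is only possible once each session is tracked as its own unit.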
The takeaway
Every governance problem in AI spend right now — Uber's cap, Copilot's token-bill shock, Microsoft throttling Claude Code — is the same problem wearing different clothes: the unit was too big to see. Drop to the session and the question flips from "how do we spend less?" to "which spend is working?" — the only question worth asking when the work is this valuable.
A note on the visuals: the session animation and the cost distribution are illustrative models, not measurements of a specific account. Token flow and pricing are shown at cerver's flat $2 / 1M tokens; the distribution is a representative long-tail across a modeled ~10k agentic sessions.
See your own sessions — and your own tail.
The calculator shows what your team spends today and what session-level routing would save. Or add your numbers to the benchmark and see where you land.