Cost & ROI

Uber capped AI coding at $1,500 a developer. The cap is the wrong fix.

When a company that wants more AI output puts a ceiling on it, that's not discipline — it's a confession. They couldn't see where the money went, so they capped the only number they had.

According to reporting on internal Uber numbers, engineers were spending somewhere between $500 and $2,000 each per month on AI coding tools — and leadership responded by imposing a hard cap of about $1,500 per developer. Microsoft, separately, was reported to have throttled internal Claude Code use because it got "a little too popular." Two of the most sophisticated engineering organizations on earth, and the lever they reached for was the same one a parent reaches for with a teenager's phone bill: a ceiling.

It's worth sitting with how strange that is. These companies want their engineers writing more software, faster. AI coding agents demonstrably do that. So why throttle the thing that's working?

Because they couldn't answer a simpler question first: which of those dollars were worth it?

A cap is what you do when you can't see

A flat per-developer ceiling is a blunt instrument. It treats every token as equally suspect. The engineer who burned $1,400 shipping a migration that saved the company three weeks gets the same ceiling as the one who left an agent looping on a dead end overnight. The cap can't tell them apart — and neither could the dashboard, which is the actual problem.

The cap isn't a cost-control strategy. It's an admission that per-session cost was invisible, so the only governable number left was the monthly total.

This is the structural flaw in how almost everyone buys AI coding today. You pay per seat, or per token, into a single vendor's black box. The bill arrives monthly, aggregated, unattributable. You can see that you spent $1,500. You cannot see which task, which model, which agent run earned it — or wasted it. So when the number gets scary, the only move available is to cap the number.

Why the bill is so unpredictable in the first place

Agentic coding is structurally token-hungry. Every turn re-sends a growing context window plus tool output (files read, diffs, test logs). A single developer running agents all day moves a startling number of tokens. Run the math for a modest 10-person team:

Per dev / day: ~10M tokens (daily-driver agentic use)
× 10 devs / month: ~2.2B tokens routed in a working month
Provider bill: ~$2.6k (cache-heavy mix, blended rate)
The same 2.2B tokens, billed three different ways, can cost anywhere from ~$2.6k to over $15k, depending purely on which model and billing path you happened to route through.

That roughly 6× spread is the whole story. The identical work, measured in identical tokens, costs wildly different amounts depending on a routing decision nobody is consciously making. When the variance is that wide and the visibility is that low, of course finance panics. A cap is the rational response to a number you can't decompose.
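The arithmetic behind those figures can be reproduced in a few lines. This is a rough sketch of the illustrative model, not a measurement: the 22 working days per month is an assumption inferred from the 2.2B total, and the per-million-token rates are simply what the two bills imply, not any vendor's published pricing.

```python
# Illustrative model only; all inputs come from the figures quoted above,
# except WORKING_DAYS, which is an assumption implied by the 2.2B total.
TOKENS_PER_DEV_PER_DAY = 10_000_000
DEVS = 10
WORKING_DAYS = 22  # assumption

monthly_tokens = TOKENS_PER_DEV_PER_DAY * DEVS * WORKING_DAYS


def implied_rate_per_million(bill_usd: float, tokens: int) -> float:
    """Blended dollars per million tokens implied by a monthly bill."""
    return bill_usd / (tokens / 1_000_000)


LOW_BILL_USD = 2_600    # cache-heavy mix, blended rate
HIGH_BILL_USD = 15_000  # lower bound of the "over $15k" case

low_rate = implied_rate_per_million(LOW_BILL_USD, monthly_tokens)
high_rate = implied_rate_per_million(HIGH_BILL_USD, monthly_tokens)
spread = HIGH_BILL_USD / LOW_BILL_USD

print(f"monthly tokens: {monthly_tokens:,}")
print(f"implied blended rate, low:  ${low_rate:.2f} per 1M tokens")
print(f"implied blended rate, high: ${high_rate:.2f} per 1M tokens")
print(f"spread: {spread:.1f}x")
```

Under these assumptions the same token volume implies a blended rate of a little over $1 per million tokens at the low end and close to $7 at the high end, which is where the roughly 6× gap comes from.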

The unit that's missing: the session

You don't govern spend with a ceiling. You govern it with attribution — and the right unit of attribution for AI work is the session: one task, one transcript, with its model, its compute, and its cost bound together.

That's the bet behind cerver. Every piece of agentic work is a session you can see end to end:

  • Cost per task, not per month. Each session carries its own token count and dollar cost. The $1,400 migration and the $40 overnight loop stop looking the same.
  • Route to cheaper-good-enough. Because the session is model-agnostic, you can send the cheap, high-volume work to a cheap model and reserve the frontier model for what needs it — instead of paying flagship rates for everything by default.
  • Compare instead of guess. Run the same prompt across Claude and Codex on one session, keep the winner, and see what each actually cost to get there.
  • Spot the waste. A bottleneck session — the one looping, the one burning tokens with nothing to show — is visible as a session, not buried in a monthly aggregate.
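
To make the idea of a session as the unit of attribution concrete, here is a minimal sketch of what a session record and two simple reports might look like. It is not cerver's actual API or data model: the `Session` class, its fields, the `shipped` flag, the model names, and the token counts are all hypothetical placeholders. Only the $1,400 migration and the $40 overnight loop come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical record: one task, one model, one cost."""
    task: str
    model: str
    tokens: int       # placeholder values, not measurements
    cost_usd: float
    shipped: bool     # assumption: did the session produce a usable result?


def cost_by_task(sessions: list[Session]) -> dict[str, float]:
    """Sum dollar cost per task instead of per month."""
    totals: dict[str, float] = {}
    for s in sessions:
        totals[s.task] = totals.get(s.task, 0.0) + s.cost_usd
    return totals


def wasted_sessions(sessions: list[Session], min_cost_usd: float = 10.0) -> list[Session]:
    """Sessions that cost real money but shipped nothing."""
    return [s for s in sessions if not s.shipped and s.cost_usd >= min_cost_usd]


sessions = [
    Session("db-migration", "frontier-model", 900_000_000, 1400.0, shipped=True),
    Session("overnight-loop", "frontier-model", 30_000_000, 40.0, shipped=False),
    Session("rename-helper", "cheap-model", 2_000_000, 0.5, shipped=True),
]

for task, cost in cost_by_task(sessions).items():
    print(f"{task}: ${cost:,.2f}")

for s in wasted_sessions(sessions):
    print(f"possible waste: {s.task} on {s.model} (${s.cost_usd:,.2f})")
```

A monthly aggregate would report only the $1,440.50 total here. The per-session view separates the migration that earned its cost from the loop that did not, which is the distinction a flat cap cannot make.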

With that, you never need Uber's cap. The cap exists to bound a number you can't explain. Make every dollar explainable at the session level and the question changes from "how do we spend less?" to "which spend is working?" — which is the question a company that wants more AI output should actually be asking.

The takeaway

Uber's $1,500 ceiling will be remembered as an early-days artifact — the moment the bills outran the tooling. The teams that win the next phase won't be the ones who spent the least. They'll be the ones who could see the most: which model, which task, which session earned its keep. Capping is what you do in the dark. Cerver is about turning the lights on.

A note on the numbers: the Uber ($500–$2,000/engineer, ~$1,500 cap) and Microsoft figures come from secondary reporting of internal data (originating with The Information) and should be read as reported, not confirmed. The token and cost math is an illustrative model for a 10-developer agentic team, not a measurement of any specific company.

See your own number before someone caps it.

The cost calculator shows a dev team — and the CFO signing the bill — exactly what they spend today and what session-level routing could save. Run it on your numbers.
