🧙🏼 Companies realise AI costs money

Howdy wizards,

Today's issue is brought to you by Pathway, who have staged a debate between leading researchers on model architectures in a boxing ring. With gloves and everything. Roll the tape.

Here’s the 1% of what’s brewing in AI you need to know.

The big thing

Companies are discovering that AI usage costs more than they bargained for.

Sometimes a lot more. For a couple of years most companies have been running the playbook of "use AI for everything, don’t worry about the costs".

This week, a number of high-profile articles put examples and numbers on it:

A $500M oops. One company burned through nearly $500M in a single month after forgetting to cap spending on employee AI licenses. Uber spent its entire 2026 Claude Code budget by April. Jellyfish found the heaviest Claude Code users spend 10x the tokens of mid-level ones — for roughly 2x the output.
Companies are now rationing. Uber, Microsoft, Meta, and Salesforce are capping who gets the priciest AI tools. These are mainly agentic coding tools like Claude Code, which companies had put on uncapped, usage-based (API) billing instead of subscriptions to avoid getting rate limited. A few power users running agents can run up enormous bills that way. Microsoft is reportedly clawing back Claude Code licenses outright.
The tokenmaxxing that backfired. Amazon quietly killed an internal AI-usage leaderboard after employees gamed it — running busywork prompts to farm points. Reward "AI activity" and you get activity, not results.
Meanwhile, Chinese models are getting much cheaper by comparison. While US frontier AI starts to feel like a luxury, DeepSeek made its 75% price cut permanent and Xiaomi cut API prices by up to 99%.
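
To put the Jellyfish figure above in per-output terms, here is a quick back-of-the-envelope sketch. It assumes that cost scales linearly with tokens and that "output" is measured the same way for both groups. The function name is made up for illustration.

```python
def relative_tokens_per_output(token_multiple: float, output_multiple: float) -> float:
    """Tokens spent per unit of output, relative to a baseline user (hypothetical helper)."""
    if output_multiple <= 0:
        raise ValueError("output_multiple must be positive")
    return token_multiple / output_multiple


# Figures from the Jellyfish finding: 10x the tokens for roughly 2x the output.
heavy_user = relative_tokens_per_output(token_multiple=10, output_multiple=2)
print(f"Heavy users spend about {heavy_user:.0f}x the tokens per unit of output")
```

Under those assumptions, the heaviest users spend roughly 5x as many tokens for each unit of output as mid-level ones.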

Why it matters: The phase of companies rewarding employees merely for using AI is ending.

The shift from using AI for one-off tasks to agents that can run for hours is driving bills to entirely new levels. As a result, companies are finally asking what they’re getting back from all those tokens.

If I’m being a bit cynical, here’s what I think are the main reasons we’ve gotten to this point:

The whole AI race and the capital it has brought have subsidised everyone’s usage for a long time, masking the real cost and making companies and end users cost-insensitive.
Many processes being automated with AI weren’t creating economic value in the first place. AI failing to drive value is a symptom of this much deeper organisational issue.
The surplus of time and resources saved with AI isn’t really used to create more. This is rooted in an incentive problem; employees don’t simply volunteer to take on more work to be nice.
AI workflows are often built in ways that waste tokens, but are complex enough to mask it. Those automations are often difficult to test, audit and optimise token usage for. Creating an automation with AI is easy. Creating an efficient one that uses exactly enough AI, the right kind, at the right time, isn’t.
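
One way to make that last point auditable is to meter tokens per stage and stop when a budget is hit. The sketch below is a minimal illustration, not any vendor's API: the `TokenLedger` class, the stage names, and the token counts are all hypothetical, and it assumes your workflow can report how many tokens each stage used.

```python
from collections import defaultdict


class TokenLedger:
    """Tracks tokens per workflow stage and enforces a hard cap (hypothetical helper)."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.by_stage: dict[str, int] = defaultdict(int)

    @property
    def spent(self) -> int:
        return sum(self.by_stage.values())

    def record(self, stage: str, tokens: int) -> None:
        # Refuse the spend before recording it, so the ledger never exceeds the cap.
        if self.spent + tokens > self.budget:
            raise RuntimeError(f"token budget exceeded at stage '{stage}'")
        self.by_stage[stage] += tokens

    def report(self) -> list[tuple[str, int]]:
        # Most expensive stages first: these are the ones worth optimising.
        return sorted(self.by_stage.items(), key=lambda kv: kv[1], reverse=True)


# Made-up numbers for a three-stage automation.
ledger = TokenLedger(budget=10_000)
ledger.record("plan", 1_200)
ledger.record("code", 6_500)
ledger.record("review", 800)
print(ledger.report())
```

Even something this small shows which stage eats the budget, which is the first thing to know before deciding where a cheaper model or a lower effort setting would do.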

For companies, these issues lead to higher costs for similar output.

My prediction is that we’ll hear less about reflexive AI usage and more about budgets and usage caps this year. I also think a lot more companies will start looking beyond their security teams’ red lines to Chinese models.

‘Tis the magic toggle inside Claude settings which can make your bills go from $20-200/mo to $20,000+/mo in a blink

PS if you’re interested in which use cases and AI approaches are actually driving value, niched down to whatever industry or business context you care about, I’ve created the web’s best overview at contextwindows.ai

FEATURING THE POST-TRANSFORMER DEBATE BY PATHWAY

Foundation model inventors debate the future of frontier AI architectures

Pathway, the AI neolab focused on continual learning and long-horizon reasoning, brought together four leading AI researchers to debate the Transformer's future:

Łukasz Kaiser, Transformer co-author and ChatGPT co-creator
Llion Jones, Transformer co-author and Sakana AI CTO
Adrian Kosowski, inventor of Dragon Hatchling (BDH) and Pathway CSO
Mathias Lechner, LNNs co-creator at MIT and Liquid AI CTO

The debate framed Transformer scaling as a double-edged sword. Scaling is the reason it dominates today, but co-author Llion Jones argued that same success keeps the field stuck on it, while the biggest long-term wins lie elsewhere.

Watch the full debate

NEWS NEWS NEWS ❦ NEWS NEWS NEWS

All the small things

Industry moves

Three of the biggest names in AI are heading to the public markets at once: SpaceX's June 12 IPO is set to be the largest ever, OpenAI is lining up a fall debut, and Anthropic an October one. After years of private valuations set by negotiation, the labs will have to file S-1s that disclose what they've kept private or shared only as anecdotes: real revenue, customer retention, and how much they spend to earn each dollar. This is a cost reckoning not just for companies implementing AI, but for the labs too.

New tools & product features

Robinhood opened a beta that lets AI agents trade stocks. You connect an agent to a dedicated brokerage account, and it analyzes your portfolio and places trades within limits you set. The guardrails are key: it’s a separate account that you can set hard caps on, because letting an agent move real money is a different risk class for most people than spinning up an app.
Anthropic launched Dynamic Workflows in Claude Code, which splits a task across hundreds of parallel subagents that run until their results converge. On one level I think this is kind of cool, but using features like this heavily also creates even more of a black box around what you’re building. I’m personally trying to build orchestrations by defining each stage carefully and jamming with a code agent on a clearly scoped task. I don’t understand all the code, but I want to understand how the system is put together. Blindly spinning up a ton of agents exacerbates the problem of black-box orchestrations that drain an unnecessary amount of tokens and are hard to optimise later.
Anthropic's Project Glasswing reported that its restricted Claude Mythos model found 10,000+ serious vulnerabilities in a month across about 50 partner organizations. They’re now saying a public Mythos-class model ships within weeks.

Models

Anthropic shipped Claude Opus 4.8. It leads the agentic-coding and reasoning benchmarks and can take new instructions mid-task without resetting its cache, which helps on long agent runs. Anthropic also says it's more willing to flag uncertainty and less likely to assert things it can't support. It’s priced the same as 4.7 and has a Fast Mode that runs 2.5x faster and 3x cheaper plus an effort dial to trade speed for depth. Is it just me, or are all these tuning parameters in Claude starting to feel like going from a point-and-shoot camera to a DSLR? You can definitely be more token efficient while working by finding the right trade-off at any given time. But I personally use max effort by default and try to focus my brain power on the core problem I’m solving. Trying to reason about the complexity of every one-off task I do with Claude Code is mentally taxing. When building an automation that will run hundreds or thousands of times, though, the tuning is essential.
Alibaba released Qwen3.7 Max, a 1M-token model that scores 80.4% on SWE-Bench and is compatible with the Anthropic API. It works directly in Claude Code. Soo... at a time when companies are panicking over token costs, China is cutting prices and shipping frontier-grade coding models compatible with the setup Western companies already use.

❦

If you enjoy these insights I also share the best ones on LinkedIn — connect with me here.

Thank you for reading.

You are a delight.

Dario

How was today's brew?

Leave me some feedback. Votes w/o context don't help me much. Thankee 🤝

Know which AI use cases are paying off — with Context Windows Pro

Most companies pick AI use cases by brainstorming internally. 90% of those initiatives fail.

I’ve created Context Windows just so you can pick the winners.

🟦 Subscribe to Pro to find high-performing use cases by industry, from 2,000+ companies. Visit contextwindows.ai or book a demo with me.

Disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes work with sponsors and may earn a commission if you buy something through a link in here. If you choose to click, subscribe, or buy through any of them, THANK YOU – it will make it possible for me to continue to do this.