Technology & the Future

Open Models vs Frontier Models: Who Actually Wins? | The $100K Token Budget Every Engineer Will Need

The Twenty Minute VC

Hosted by Harry Stebbings

1h 12m episode

7 min read

5 key ideas

July 4, 2026

In Brief

Top engineers already spend $100K+/year on tokens — and that figure is about to become a standard line item in every compensation package.

Key Ideas

Token Costs Rapidly Approaching 20% of Salaries

Token spend will reach ~20% of developer salary; 3.8% is the floor, not the ceiling.

China's AI Lead Built on US Models

China's open-model lead is distillation of US frontier models, not independent AI research.

Young AI-Fluent Engineers Vastly Outperform Veterans

Sierra's most effective employees are 22-year-olds with AI fluency, not veterans.

Product Decisions Now Bottleneck Software Development

The software bottleneck has moved: writing code → reviewing code → deciding what to build.

Building Products Replaces Algorithms in Interviews

The new engineering interview: give candidates $150, a coding agent, and a blank prompt.

Why does it matter? Because the $100K token budget is about to land in every engineer's offer letter

Clay Bavor, co-founder of Sierra — $16 billion valuation, 40% of the Fortune 50 as customers — just reframed how companies should think about headcount costs. Token spend isn't an IT line item. It's a compensation component, heading straight toward 20% of developer salary.

Top engineers "really leaning into Claude, Codex and so on" are already spending $100K+ per year on tokens — a meaningful fraction of an engineering salary
The Salesforce figure everyone cites — 3.8% of dev salary on tokens — is the floor, not the ceiling
China's open-model lead is strategic distillation of US frontier labs, not independent AI research
Sierra's most effective employees are 22-year-olds who've never had another job — and the company rebuilt its interview process around that reality

Token spend is heading to 20% of developer salary — and CFOs modeling 3.8% are already behind

"$100,000 a year on tokens" isn't a forecast. Bavor says he's observing it now among top engineers leaning hard into AI coding tools. At $500K total comp for a strong Valley engineer, that's already 20% of salary at the frontier.

His frame for CFOs: headcount in the future won't just carry salary and SBC. It'll carry a token budget too. "Here's your salary, here's your token budget, have at it."

When Harry Stebbings floated the Salesforce number — $300M/year on Anthropic, about 3.8% of developer salaries — Bavor was blunt: "I would not bet on 3.8%. I would bet on much closer to 20%." The stakes split cleanly on both sides: at 3.8%, many AI-adjacent companies are "grossly overvalued." At 20%, they're "undervalued." Sierra is currently absorbing high token costs as the price of learning fastest, but the budget formalization is coming for every company.
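The arithmetic behind these percentages is easy to check. The sketch below is only a back-of-the-envelope calculation using the figures quoted above ($100K on tokens against $500K total comp, and $300M per year at 3.8% of developer salaries). The function names are hypothetical, and the implied salary base is derived from those two quoted numbers, not stated in the episode.

```python
def token_share(token_spend: float, compensation: float) -> float:
    """Return annual token spend as a fraction of annual compensation."""
    if compensation <= 0:
        raise ValueError("compensation must be positive")
    return token_spend / compensation


def implied_salary_base(token_spend: float, share: float) -> float:
    """Back out the salary base implied by a spend figure and its share."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return token_spend / share


# Figures quoted in the summary above; everything else is derived.
frontier_share = token_share(100_000, 500_000)        # top engineer today
salesforce_base = implied_salary_base(300e6, 0.038)   # implied developer salaries
spend_at_20_pct = 0.20 * salesforce_base              # same base at Bavor's 20%

print(f"Frontier engineer share: {frontier_share:.0%}")
print(f"Implied developer salary base: ${salesforce_base / 1e9:.1f}B")
print(f"Spend at 20% of that base: ${spend_at_20_pct / 1e9:.2f}B")
```

Under these assumptions, moving from 3.8% to 20% multiplies the token bill by a little over five, which is why the two scenarios lead to such different valuations.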

Frontier model demand is effectively unbounded — open models just handle whatever frontier already solved

"We have not yet appreciated the unbounded demand for call it frontier levels of intelligence." That's Bavor's reply to the bear case: open models commoditize AI, frontier labs get squeezed, the whole stack deflates.

His counterargument is a staffing thought experiment: ask any software company whether it would upgrade all of its staff engineers to principal level, and "a hundred out of a hundred" would say yes. Meanwhile, the share of enterprise tasks that are fully automated today is still a rounding error, and that gap isn't closing just because open weights are cheaper.

The picture that emerges: an assembly line where yesterday's frontier becomes today's commodity. GPT-4-level capability now costs about 1/300th of its original price per equivalent token. Former frontier models migrate to volume workloads; current frontier models handle complexity that open weights can't yet touch. Both layers expand. The market isn't winner-take-all between open and frontier. It's tiered, and the top tier has no ceiling in sight.
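One way to picture the tiered market is as a router that sends each task to the cheapest model tier able to handle it. The sketch below is purely illustrative: the tier names, the 1-10 difficulty scale, the cutoffs, and the workload are all invented, and the 300:1 cost ratio is borrowed loosely from the figure above, since the source gives no actual ratio between tiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_difficulty: int    # hardest task (1-10) this tier handles; hypothetical scale
    cost_per_task: float   # illustrative relative cost, not a real price


# Hypothetical tiers, ordered cheapest first.
TIERS = [
    ModelTier("commodity (former frontier)", max_difficulty=6, cost_per_task=1.0),
    ModelTier("current frontier", max_difficulty=10, cost_per_task=300.0),
]


def route(difficulty: int) -> ModelTier:
    """Pick the cheapest tier whose capability covers the task."""
    for tier in TIERS:
        if difficulty <= tier.max_difficulty:
            return tier
    raise ValueError(f"no tier can handle difficulty {difficulty}")


tasks = [2, 3, 5, 6, 8, 9]  # made-up workload of task difficulties
assignments = [route(d) for d in tasks]
total_cost = sum(t.cost_per_task for t in assignments)
frontier_only_cost = len(tasks) * TIERS[-1].cost_per_task

for d, t in zip(tasks, assignments):
    print(f"difficulty {d}: {t.name}")
print(f"tiered cost: {total_cost}, frontier-only cost: {frontier_only_cost}")
```

In this toy workload the cheap tier absorbs the volume while the frontier tier still accounts for almost all of the spend, which is the sense in which both layers can grow at once.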

China's open-model lead is distillation of US frontier labs — not a research breakthrough

The open-weights models coming out of China aren't the product of parallel AI research. Bavor's read: "my impression is many of the models coming from China are derived from training runs done in the US."

Chinese labs are doing aggressive scaled distillation of US outputs — a strategy US labs are structurally blocked from running on themselves. "If the US-based labs are developing frontier models, are they going to compete with themselves and drive price pressure by releasing open-weights models of similar capability? If I was running that business, that's not something I would do."

The incentive asymmetry is clean. US frontier labs price their hosted inference on quality; open-sourcing high-capability models would undercut that business. "If you can't build frontier models yourself — okay, maybe the next best approach is to distill them and offer them up." Treat Chinese open-weight releases as strategic distillation plays, not evidence of independent research capability.

Sierra scrapped the whiteboard — candidates get $150, a coding agent, and a blank prompt

"$150. Choose your coding agent. Bring your own laptop. Build something."

That's Sierra's engineering interview now. The prompt is open-ended: think through an application you'd like to build, then build it. Afterward, walk them through the decisions. What gets evaluated: architecture, systems design, product thinking, values. Not which algorithms candidates can reproduce under pressure.

The old interview tested for what AI can already do. The new one tests for the judgment that directs AI. Bavor made it explicit: "I will be disappointed if in the next no more than two months not every one of our interviews has some strong AI native component to it." Engineering came first. The rest of the company is next.

Sierra's most effective employees are 22-year-olds who've never had another job

Twenty-two or twenty-three, no prior work experience, "completely AI-pilled." These are some of Sierra's most effective employees companywide — outperforming colleagues with years of domain expertise.

"I can't remember a time when a young person with no work experience, but with the right mindset and experience using some of these tools, has ever been so valued." The edge isn't a credential — it's four years of university time spent internalizing AI-native workflows before bad habits had a chance to calcify. Years of experience now carry a liability if those years were spent developing reflexes that AI has already replaced.

The software bottleneck has moved from writing code to deciding what's worth building at all

Three to twenty times more productive — that's the range Sierra's most AI-native engineers self-report in features shipped. The constraint used to be writing code. Now it's reviewing code. Bavor's next stop: deciding what is worth building, "editing what could exist to what should exist."

Each time the bottleneck moves up the stack, the skills that generate economic value shift with it. The engineers delivering the most leverage in 18 months won't be the fastest prompt writers. They'll be the ones with the sharpest product judgment and architectural taste — skills that take years to develop and can't be accelerated with a token budget.
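The moving bottleneck can be sketched as a pipeline whose throughput is capped by its slowest stage. The capacities below are hypothetical numbers chosen for illustration, not figures from the episode, and the 10x and 2x speedups are assumptions.

```python
def bottleneck(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and the throughput it imposes on the pipeline."""
    stage = min(stage_capacity, key=stage_capacity.get)
    return stage, stage_capacity[stage]


# Hypothetical capacities in features per week.
before = {"decide": 8.0, "write": 2.0, "review": 5.0}

# Assume AI coding tools multiply writing capacity by 10.
after_writing = {**before, "write": before["write"] * 10}

# Assume AI-assisted review later doubles review capacity as well.
after_review = {**after_writing, "review": after_writing["review"] * 2}

for label, stages in [
    ("before", before),
    ("faster writing", after_writing),
    ("faster review", after_review),
]:
    stage, rate = bottleneck(stages)
    print(f"{label}: bottleneck = {stage}, throughput = {rate}/week")
```

With these numbers, a 10x gain in writing speed yields only a 2.5x gain end to end, because review becomes the limit. Once review speeds up, the limit becomes the rate at which the team can decide what to build.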

Sierra built a company-wide MCP gateway — and it's becoming as fundamental as the data warehouse

Every system Sierra uses to run itself — Slack, docs, operating reviews, the codebase — now feeds into a single MCP server that any employee can query through their AI agent. Add it to your Claude or Codex instance and you get full, permission-scoped access to institutional knowledge on demand.
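The idea of a single, permission-scoped entry point can be sketched in a few lines. This is a toy in-process model only: it does not implement the MCP protocol or use any MCP SDK, and the class names, group names, and documents are all hypothetical. Nothing here describes how Sierra's gateway is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    allowed_groups: set[str]
    documents: list[str] = field(default_factory=list)


class Gateway:
    """Toy single entry point that fans a query out to permitted sources."""

    def __init__(self) -> None:
        self._sources: list[Source] = []

    def register(self, source: Source) -> None:
        self._sources.append(source)

    def query(self, user_groups: set[str], term: str) -> dict[str, list[str]]:
        """Search only the sources the caller's groups are allowed to read."""
        results: dict[str, list[str]] = {}
        for source in self._sources:
            if not (source.allowed_groups & user_groups):
                continue  # caller lacks access, so skip the source entirely
            hits = [d for d in source.documents if term.lower() in d.lower()]
            if hits:
                results[source.name] = hits
        return results


gateway = Gateway()
gateway.register(Source("slack", {"everyone"}, ["Launch moved to Friday"]))
gateway.register(Source("codebase", {"engineering"}, ["def launch(): ..."]))
gateway.register(Source("operating-reviews", {"leadership"}, ["Q3 launch review"]))

print(gateway.query({"everyone"}, "launch"))
print(gateway.query({"everyone", "engineering"}, "launch"))
```

The design choice worth noting is that the permission check happens inside the gateway, before any source is searched, so an agent connected to it never sees material its user could not already open.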

"It's kind of like having superpowers. You can interrogate in essence the entirety of the company." Bavor uses a personal Pine Cone skill to pre-screen every hire: he's trained it on what to flag in interview packets, turning a slow read into a faster, deeper one. Engineers use Pine Cone to build Pine Cone.

"Approaching indispensable" is the phrase he reaches for — not quite there yet, but close. The internal AI platform is no longer a productivity experiment. At Sierra, it's becoming load-bearing infrastructure.

The next moat isn't talent density — it's knowing which problems to point intelligence at

Everything here converges on a question nobody has answered yet: which problems are worth solving with frontier intelligence, and at what scale? Once execution stops being the bottleneck, strategic prioritization becomes the only durable source of advantage. Companies building the infrastructure for that question now — MCP gateways, AI-native hiring, per-employee token budgets — aren't just running efficiency experiments. They're accumulating institutional judgment about where intelligence creates the most leverage, and that compounds.

The race is no longer about who builds fastest. It's about who figures out what to build first.

Topics: artificial intelligence, enterprise software, open source models, frontier models, token economics, hiring, engineering productivity, AI agents, startup operations, Sierra, MCP, coding agents, Chinese AI, model distillation, board management

Frequently Asked Questions

What is the projected token spending trend for engineers?

Token spend is becoming a compensation expense as AI tools become integral to development workflows. Bavor expects it to approach roughly 20% of developer salary and treats the widely cited 3.8% figure as the floor, not the ceiling. He reports that top engineers already spend $100K or more per year on tokens, and he expects a token budget to sit alongside salary in compensation packages.

What is China's actual advantage in open AI models?

In Bavor's view, China's open-model lead comes from distilling US frontier models, not from independent AI research. His impression is that many Chinese open-weight models are derived from training runs done in the US. Distillation lets competitive models ship quickly, and US labs have little incentive to do the same because open-weight releases of similar capability would undercut their own hosted inference business.

How has the software development bottleneck shifted?

The bottleneck has moved from writing code to reviewing code, and Bavor expects it to move next to deciding what to build. As AI tools accelerate code production, review becomes the constraint. After that, the constraint is product judgment about what is worth building at all.

What is the new approach to engineering interviews?

Sierra gives candidates $150, a coding agent of their choice, and an open-ended prompt to build an application, then asks them to walk through their decisions. The interview evaluates architecture, systems design, product thinking, and values, not the ability to reproduce algorithms under pressure.
