Why Token Maxing is Failing Enterprise Startups | Legora CTO

The Twenty Minute VC

Hosted by Harry Stebbings

58 min episode

9 min read

5 key ideas

June 6, 2026

In Brief

Token leaderboards are creating waste, not productivity — and the next critical engineering role doesn't exist yet at most companies.

Key Ideas

Companies chase yesterday's coding bottleneck

AI moved the coding bottleneck — most companies are still optimizing the wrong phase.

Reward shipping not token consumption

Token leaderboards produce waste, not productivity; reward output instead.

Agent developer experience role emerging

The next big engineering role: 'developer experience for agents' — nobody has built it yet.

Quick prototype then customer synthesis

PMs should vibe-code prototypes, then immediately return to customer synthesis.

Talent compounds while models fade

Model advantages vanish in weeks; engineering talent compounds indefinitely.

Why does it matter? Because the bottleneck moved — and most teams are still pushing on the wrong wall.

AI compressed code writing to near-zero cost. What Legora CTO Jacob Lorentsson reveals is that most organizations haven't updated their mental model: they're still throwing resources at the phase that's now essentially free, while the real constraints — product discovery upstream, code review downstream — get starved.

Coding is no longer the rate limiter; product work and code review are the new bottlenecks
Token leaderboards produce waste, not velocity — and enterprise CEOs are actively being sold this bad advice
The engineering role nobody has built yet: developer experience for agents
Given exclusive access to a better model vs. better engineers for six months, he'd take the engineers — model advantages evaporate in weeks

Code is now the cheap part — and most companies are still treating it like the constraint

Code writing is now essentially free, and most organizations haven't updated their mental model to account for it. Legora's Jacob Lorentsson — whose company hit $100M ARR in 18 months — describes three phases of software development: product work, coding, review. "Number two was the primary bottleneck for the past 100 years almost. The rate limiter was how quickly can you write code? That is now super cheap."

What's left are the two phases flanking it. Upstream: product work — translating customer pain into something buildable, doing the synthesis, making the taste calls. That hasn't gotten cheaper. Downstream: code review — AI can generate PRs faster than any human team can responsibly merge them. "If you believe that code is cheaper to write, then naturally the two other things are bottlenecks."
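The bottleneck argument can be made concrete with a toy model. The sketch below assumes each of the three phases has a weekly capacity and treats delivery throughput as the minimum across phases. The phase names follow the episode; every number is invented for illustration and is not Legora's data.

```python
# Toy model: a sequential pipeline ships only as fast as its slowest phase.
# All capacity numbers are hypothetical and exist only to illustrate the point.

def throughput(capacities: dict[str, float]) -> float:
    """Changes per week that can pass through every phase."""
    return min(capacities.values())


def bottleneck(capacities: dict[str, float]) -> str:
    """Name of the phase with the lowest capacity."""
    return min(capacities, key=capacities.get)


# Changes per week each phase can handle, before and after cheap code generation.
before_ai = {"product": 10, "coding": 4, "review": 12}
after_ai = {"product": 10, "coding": 100, "review": 12}

print(bottleneck(before_ai), throughput(before_ai))  # coding 4
print(bottleneck(after_ai), throughput(after_ai))    # product 10
```

In this toy example, making coding 25 times faster lifts throughput only from 4 to 10 changes per week. Any further gain has to come from product work and review, which is the point Lorentsson is making.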

The uncomfortable implication: every company celebrating shipping speed without asking what's actually slow now is optimizing a solved problem. Lorentsson is blunt about where attention should go — constantly asking what the constraint on velocity is, then solving that. Right now, almost nobody is asking the question.

Token leaderboards are being sold to enterprise CEOs — and they produce waste, not productivity

Enterprise boards are being advised to track token consumption and bring it up in performance reviews. Lorentsson calls what happens next by name: token maxing. "People just burn tokens just to look good. That's a really stupid way to do anything."

The replacement is concrete: "Do hack days, do demos. Have people show everyone else how efficient they are and how much better they're doing. Reward them for being effective and efficient and having more output" — not for consumption. The metric is what ships, not what gets processed.

On the budget question itself (what percentage of a developer's salary would you spend on AI tooling?), his answer is close to unlimited, but the logic is opportunity cost, not profligacy. "The cost of not doing it is extremely high. This almost outweighs any sort of token cost." At Legora's growth rate, any efficiency gain justifies the spend. That is not a blank check for waste. The argument is that "how much are we spending on tokens?" is the wrong question and "what are we leaving on the table by not using AI here?" is the right one. Those two questions lead to very different cultures.

The most valuable engineering role of the next five years barely exists yet

The obvious evolution of the engineering job is systems design — one abstraction above the code, thinking about what the system looks like, where to invest for reuse, what bets to make across different surfaces. Legora is already there. The less obvious shift is one level above even that.

Legora has a developer experience team — three people, which Lorentsson calls too few, launched later than it should have been. They build the infrastructure that makes engineers move fast: local dev setup that spins up instantly, a background coding agent letting each engineer run 10 concurrent agents, custom review bots that monitor CI and only escalate to a human when real judgment is required. "They're building features so that it can wait in CI and wait until everything looks green and all the reviews are good and then raise it to a human."
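The "wait in CI, then raise it to a human" behavior can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Legora's implementation: the `PullRequestStatus` fields, the check-state strings, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequestStatus:
    # Hypothetical shape: check name -> "passed", "failed", or "pending".
    ci_checks: dict[str, str]
    # Hypothetical flag: True once every automated review bot has approved.
    bot_reviews_approved: bool


def next_action(pr: PullRequestStatus) -> str:
    """Decide what a review bot should do with an agent-authored PR."""
    states = set(pr.ci_checks.values())
    if "failed" in states:
        # A red check goes back to the agent, not to a human.
        return "send_back_to_agent"
    if not states or "pending" in states or not pr.bot_reviews_approved:
        # Nothing to show a human yet; keep waiting.
        return "wait"
    # Everything is green and the bots are satisfied: now involve a person.
    return "raise_to_human"


print(next_action(PullRequestStatus({"tests": "pending"}, False)))  # wait
print(next_action(PullRequestStatus({"tests": "failed"}, True)))    # send_back_to_agent
print(next_action(PullRequestStatus({"tests": "passed"}, True)))    # raise_to_human
```

The design choice is that a human only sees a PR in its final state, so reviewer attention goes to judgment calls and not to chasing red builds.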

The next version of that team is for agents. "We kind of need to have the same team for agents — how do we make agents really really effective? How do we make sure we can enable agents to independently self-improve the system?" The goal is a closed loop: set the guardrails, pipe in the right data, let agents run experiments and optimize on their own. Right now, almost nobody has staffed this. The companies that do first will run circles around engineering orgs three times their size.

PMs can now code — and doing so is an opportunity cost disaster

PMs spending half their time on engineering are quietly destroying the function that's now the actual bottleneck. Product-engineering convergence sounds efficient. At developer-tooling companies where engineers are their own users, it might even be true. At Legora, it actively slows things down.

"The bottleneck is no longer coding, which means the bottleneck is the product work." Redirecting PM time toward building means starving the highest-value activity in the org. "If your PMs are coding a lot, if they're spending 50% of their time coding, we're missing out on so much product work" — talking to customers, synthesizing what they think, making prioritization calls.

There's a narrow carve-out: high-fidelity prototyping to compress handover friction. A PM who vibe-codes a precise prototype, one that shows exactly what the feature should look like, and immediately hands it off eliminates the back-and-forth loop in which engineers build the wrong thing. "It's good if PMs do some amount of vibe coding to show very high fidelity. Here's a prototype, this is exactly what it looks like, then handover." The word "then" is doing all the work. The moment prototyping becomes a sustained activity rather than a handoff technique, the bottleneck gets worse, not better.

Code review tooling is fundamentally broken — and the attack surface is scaling faster than the defense

Legora reviews every human-submitted PR. Every single one, even though Lorentsson calls it inefficient and wants to change it. The reason is not paranoia — it's that current tools don't classify risk correctly.

"I keep telling people at all events: if you're going to do a startup, please do something that solves the review thing." Today's tools surface line counts and diff sizes. What actually matters is architectural impact: "What's the impact on systems architecture? What's the impact on systems design stability, security boundaries? How does it take our system in the right direction? If that doesn't change, then maybe you don't have to review it at all. But if there are strategic trade-offs, you want a human to say yes, this is the right direction."

The attack side has already made this adjustment. Threat actors are running agents now — "they can try so many different things and they can keep running at it." The defense hasn't caught up. The winning product in this category routes PRs by architectural risk tier, auto-merges everything that doesn't touch system boundaries, and only surfaces decisions that require judgment. Nobody has built it. The category is wide open.
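As a rough illustration of routing by architectural risk and not by diff size, the sketch below sends a PR to a human only if it touches a system boundary. It is a deliberately naive stand-in: the path prefixes, the `PullRequest` fields, and the tier names are hypothetical, and a real classifier would need far more than path matching, which is exactly the product gap described above.

```python
from dataclasses import dataclass

# Hypothetical path prefixes that mark system boundaries in an imaginary repo.
BOUNDARY_PREFIXES = ("auth/", "billing/", "infra/", "api/schema/")


@dataclass
class PullRequest:
    changed_files: list[str]
    # Hypothetical signal: True if the change alters a public interface.
    changes_public_interface: bool = False


def risk_tier(pr: PullRequest) -> str:
    """Route by what the change touches; line count is deliberately ignored."""
    touches_boundary = any(
        path.startswith(BOUNDARY_PREFIXES) for path in pr.changed_files
    )
    if touches_boundary or pr.changes_public_interface:
        return "human_review"
    return "auto_merge"


print(risk_tier(PullRequest(["ui/button.tsx", "ui/button.test.tsx"])))  # auto_merge
print(risk_tier(PullRequest(["auth/session.py"])))                      # human_review
```

Note that a one-line change under `auth/` is escalated while a large UI-only change is not, which is the inversion of today's diff-size heuristics that Lorentsson is asking for.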

Vibe-coding your SaaS stack is now viable — for exactly one category of tool

A public company CEO's chief of staff took three weeks off and rebuilt Copper from scratch. It works. That story illustrates where the build/buy line just moved.

Lorentsson frames it along two axes: surface area and depth. Wide and shallow — lots of features, not much complexity hidden beneath, heavy customization required from your team — is now a build candidate. "If it's a shallow app and it requires a lot of customization from you, maybe you just build it. That's probably actually the right thing to do."

Deep and narrow — payroll, compliance, anything with years of edge cases baked in — still belongs in the buy column. "If it's a very deep one, there's just too much stuff for you to build and it's not viable." Legora is auditing its internal stack through this lens: HR system, ATS, onboarding workflows. Some they're building. Others they're not. They vibe-coded a Canadian immigration tracker in a day that saved an entire team of relocating employees weeks of research. The threshold has moved. Most SaaS tools were over-built for the customization you actually needed, and the gap between what you need and what it costs to build in-house collapsed from months to days.
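The two-axis framing can be written down as a small decision rule. The sketch below is one possible reading of it, not a rule stated in the episode: the 1-to-5 scores and the thresholds are assumptions made up for illustration.

```python
from enum import Enum


class Decision(Enum):
    BUILD = "build"
    BUY = "buy"
    JUDGMENT_CALL = "judgment call"


def build_or_buy(depth: int, customization: int) -> Decision:
    """Both inputs are hypothetical 1-5 scores assigned by the team.

    depth: how much hidden complexity and how many edge cases the tool encodes.
    customization: how much you would have to bend a bought tool to fit.
    """
    if depth >= 4:
        # Deep tools (payroll, compliance) stay in the buy column.
        return Decision.BUY
    if depth <= 2 and customization >= 3:
        # Shallow and heavily customized: now a build candidate.
        return Decision.BUILD
    return Decision.JUDGMENT_CALL


print(build_or_buy(depth=5, customization=2))  # Decision.BUY
print(build_or_buy(depth=1, customization=5))  # Decision.BUILD
```

Under these assumed scores, payroll lands in the buy column and something like the immigration tracker lands in the build column, matching the examples in the text.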

Model access or engineering talent for six months — he takes the engineers, no hesitation

"Engineers, for sure." A three-word answer, then the reason: models change biweekly. Anthropic leads, then OpenAI, then back. Any six-month model advantage disappears before you've fully built on top of it.

Engineering talent compounds. "If you have really good engineers, you can build a system that exponentially improves — and that's worth a lot more." Legora runs roughly 10 models simultaneously, routing each task by latency and performance, evaluating constantly. "The best model changes biweekly almost." No single model lead survives long enough to become a moat.
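Routing each task by latency and performance can be sketched as "pick the best-scoring model that fits the task's latency budget." This is a minimal illustration, not Legora's router: the model names, the quality scores, and the latency figures are all hypothetical, and in practice the scores would come from the constant evaluation the episode describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    name: str             # hypothetical identifier
    quality: float        # hypothetical eval score in [0, 1]
    p95_latency_s: float  # hypothetical 95th-percentile latency in seconds


def pick_model(candidates: list[ModelStats], max_latency_s: float) -> ModelStats:
    """Return the highest-quality model that meets the latency budget."""
    eligible = [m for m in candidates if m.p95_latency_s <= max_latency_s]
    if not eligible:
        raise ValueError("no model satisfies the latency budget")
    return max(eligible, key=lambda m: m.quality)


candidates = [
    ModelStats("model-a", quality=0.91, p95_latency_s=8.0),
    ModelStats("model-b", quality=0.86, p95_latency_s=1.5),
    ModelStats("model-c", quality=0.78, p95_latency_s=0.4),
]

print(pick_model(candidates, max_latency_s=2.0).name)   # model-b
print(pick_model(candidates, max_latency_s=30.0).name)  # model-a
```

Because the stats are data and not code, a new leader on the next benchmark only changes the table, not the system built around it.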

The actual value lives above the model layer: routing logic, primitives built specifically for legal workflows, enterprise features, the team that can absorb whichever model wins the next benchmark. "It's not much less than most people think. The value of Legora is so much more around it." Take any one model away and Legora's customers still choose Legora. That's the signal. A model advantage evaporates. A system built to compound on top of whatever model wins — that's the only durable answer.

The leverage point has migrated — most orgs haven't noticed where it landed

What this episode maps, underneath all the specifics, is a complete restructuring of where engineering leverage actually lives. It's no longer in code. It's moved into product synthesis, architectural judgment, and — least appreciated of all — the infrastructure that makes agents effective. The developer experience team for agents is the canary: nobody's built it seriously yet, and the companies that do will outrun teams three times their size. Speed now belongs to whoever asks the right question about the bottleneck. Most teams are still in the wrong meeting about it.

Topics: AI productivity, engineering leadership, enterprise software, developer tooling, product management, token optimization, code review, vibe coding, internal tools, hiring, LegalTech, agent orchestration

Frequently Asked Questions

Why are token leaderboards failing enterprise startups?
Token leaderboards reward consumption rather than output, so they produce waste, not productivity. When token usage shows up in performance reviews, people burn tokens just to look good. Enterprises track token counts because they are easy to measure, while the thing that matters, what actually ships, goes unrewarded. The alternative is to reward output: run hack days and demos, and recognize people for being effective and efficient.

What has changed about the coding bottleneck with AI adoption?
AI has made writing code cheap, which moves the bottleneck out of the coding phase and into the two phases around it: product work upstream and code review downstream. Most companies are still optimizing coding speed, a phase that is no longer the constraint. Teams should keep asking what currently limits velocity and put their resources there.

What is the next critical engineering role for AI-driven startups?
The next critical engineering role is "developer experience for agents", and most companies have not staffed it yet. A traditional developer experience team builds the infrastructure that lets human engineers move fast; this role does the same for agents by setting guardrails, piping in the right data, and letting agents run experiments and improve the system on their own.

Is engineering talent or model improvements more valuable for enterprise startups?
Model advantages vanish in weeks; engineering talent compounds. The best model changes almost biweekly, so the edge from any specific model disappears quickly. Strong engineers can build systems that keep improving on top of whichever model currently leads, so startups should prioritize hiring and retaining engineering talent over chasing marginal model improvements.
