
Tokenmaxxing: How Top Builders Use AI To Do The Work Of 400 Engineers
Y Combinator Startup Podcast
Professional engineers ship just 30–50 lines of tested code per day. That baseline is what makes it plausible that Garry Tan rebuilt in 5 days what once took 18 months and $4 million.
Key Ideas
Tested code compounds engineering velocity
Professional engineers ship 30–50 lines of tested code daily — that's why 400x is real.
Token maximization becomes economic necessity
Token maxing is like SF rent: it's expensive not to do it.
Document judgment outside executable code
Fat skills, thin harness: judgment lives in Markdown, not code.
Tests prevent AI-generated code decay
No tests = AI-generated slop 10x worse than hand-written code.
Control tools or be controlled
The defining question of this era: do you control your tools, or do your tools control you?
Why does it matter? Because the 400x claim is math, not marketing.
Garry Tan rebuilt a full-featured blog platform in five days for $200. The first time, it cost $4 million and 18 months. Better tools alone don't explain the gap; it starts to make sense the moment you look at what professional engineers actually ship on an average day.
- The 400x figure survives git-repo analysis because the honest baseline — tested, production-ready code — runs to just 30–50 lines per engineer per day
- Token maxing (deliberately throwing maximum context and compute at a problem) is the highest-leverage habit any technical founder can build right now
- "Fat skills, thin harness": human judgment lives in Markdown prompts, deterministic side-effects belong in code — most agentic systems invert this and break
- AI-generated code without 80–90% test coverage is "10x worse than human-written" — the productivity gains vanish the moment real users touch it
30–50 lines of tested code per day is the honest baseline — and it's why 400x is real
Tan's original claim was 100x. After running git analysis on his actual output, the number went up.
The correction wasn't what the internet expected. He'd benchmarked his Claude-assisted output against his 2013 coding pace, and after stripping both down to logical lines of production-ready code, both numbers shifted. His 2013 baseline dropped by 70% (lower than he'd assumed), while the AI-assisted count climbed. "If you look at the literature about software engineering going back to like 2000, 1990, it's pretty clear that the average number of lines of code that a professional software engineer, that's tested and production ready, it's not like 100 lines of code. It's like 50. It's like 30. Like a day." Tan's own 2013 rate was around 14 lines a day, since he was coding part-time. That's the denominator. Against those numbers, a 400x multiplier from directing 15 concurrent agents stops feeling like hyperbole.
The argument tangled online because Tan led with the multiplier rather than the baseline. He admits that was a mistake. But the substance holds: most people benchmarking their AI output compare it to their best human coding days — in-flow, focused, minimal interruption. That's the wrong comparison. The average day includes meetings, reviews, context-switching, and the cognitive overhead of holding a whole system in your head. Measure against that, and 400x isn't a flex. It's just what happens when you stop writing the code yourself.
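To make the baseline argument concrete, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the source does not state the AI-assisted line count; the figures printed are only what a 400x multiplier would imply against each baseline quoted above.

```python
def implied_daily_output(baseline_lines_per_day, multiplier):
    """Lines per day implied by a claimed multiplier over a baseline."""
    return baseline_lines_per_day * multiplier


# Baselines quoted in the text: Tan's part-time 2013 pace (about 14 lines a day)
# and the 30-50 range cited for professional engineers.
# These are implied figures, not measurements reported in the source.
for baseline in (14, 30, 50):
    print(f"baseline {baseline:>2} lines/day -> {implied_daily_output(baseline, 400)} lines/day at 400x")
```

The point of the sketch is that the multiplier is only as meaningful as its denominator: the same claimed output looks very different measured against 14 lines a day than against an imagined 100.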
Token maxing is expensive — but not doing it costs more
Twenty sources are available. Most builders consult one.
That single-source default is what Tan calls the invisible tax on every decision made with incomplete context. Building the agentic newsroom behind Garry's List, he didn't stop at Perplexity. He hit Perplexity's API, X's API, and Grok's API, cross-referencing 20 sources, tracking where 13 agree and 7 dissent, and feeding the full picture into the core prompt. "We don't just settle for one source when we can get 20 sources and we can cross-reference them. We can figure out, well, these 13 sources say this and seven sources disagree with that." That's not incrementally better output. It's a qualitative shift in what gets built and what gets decided.
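The cross-referencing step can be sketched roughly as follows. This is an illustration only, not Tan's pipeline: the source names and claims are made up, the helper names are hypothetical, and a real system would need a normalization step (likely another model call) to decide when two sources are making the same claim.

```python
from collections import Counter


def tally_claims(answers):
    """Group sources by the claim they support and split majority from dissent."""
    counts = Counter(answers.values())
    majority_claim, agree = counts.most_common(1)[0]
    dissenters = sorted(src for src, claim in answers.items() if claim != majority_claim)
    return {
        "majority_claim": majority_claim,
        "agree": agree,
        "dissent": len(dissenters),
        "dissenting_sources": dissenters,
    }


def render_context(summary):
    """Turn the tally into a line of text that could be placed in a prompt."""
    return (
        f"{summary['agree']} sources say: {summary['majority_claim']}. "
        f"{summary['dissent']} sources disagree."
    )


# Hypothetical data: 20 sources, 13 supporting one claim and 7 another.
answers = {f"source_{i}": ("claim A" if i < 13 else "claim B") for i in range(20)}
summary = tally_claims(answers)
print(render_context(summary))
```

Keeping the dissenting sources in the summary, rather than discarding them, is what lets the downstream prompt see the disagreement instead of a flattened consensus.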
The analogy that clicked for him is San Francisco rent. YC founders routinely resist paying thousands per month for an apartment in the right neighborhood. The counterintuitive pitch: it's expensive not to be there, because the serendipity compounds. Token maxing works identically. Spending $500 in a single day on tokens looks absurd until you map what you give up by cutting context: 80%-quality outputs compounding across every product decision, every architectural call, every research question.
"Token maxing is going to be one of those things for founders that we sort of have to teach them, where it's not immediately obvious that you shouldn't. This is actually like rent."
The cost is visible. The quality delta from inadequate context is invisible — until users hit the 20% you missed.
Judgment belongs in Markdown, not code — most agentic systems fail because builders invert this
Most agentic systems don't break because the LLM is inadequate. They break because the builder tried to encode intent in code.
Tan frames it through wedding planning. If you're writing a runbook so someone else could repeat the event, everything requiring judgment, context, and edge-case handling goes into plain-English instructions. The call to 20 venues — deterministic, repeatable — gets wired to Twilio. "All of the difficulty in agentic engineering today is when people try to do things that should be in Markdown in code, and it fails because code is brittle. It doesn't understand special cases. Code literally doesn't understand what you want or who you are. It is executing deterministic zeros and ones in a Turing complete loop."
The principle — "fat skills, thin harness" — came partly from Tan getting trolled online for writing Markdown instead of "real code." His rebuttal: Markdown is code. It compiles differently. An LLM with a well-written skill file handles novel inputs, infers intent, and recovers from edge cases that would crash any equivalent if-statement.
The practical call before writing any agentic logic: does this require understanding intent and handling novel situations? Markdown. Is this a deterministic action with a known output? Code. Getting that split wrong is the most reliable way to build an agent that works in demos and collapses in production.
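One way to picture the split is a small sketch in which the judgment sits in a Markdown skill and the code only dispatches deterministic actions. Everything here is an assumption for illustration: the skill text, the `CALL` convention, and the stub model are invented, and a real harness would call an actual LLM and a real telephony service rather than returning strings.

```python
# "Fat skill": judgment and edge-case handling, written in plain English.
SKILL_MD = """\
# Skill: book a venue
- Prefer venues the couple has already mentioned.
- If a venue is unavailable, ask before suggesting a different date.
- When you have decided whom to call, reply with: CALL <venue name>
"""


def call_venue(name):
    """Deterministic side effect. A real harness might place a phone call here."""
    return f"called {name}"


# "Thin harness": a lookup table of deterministic actions, nothing more.
TOOLS = {"CALL": call_venue}


def run(model, request):
    """Hand the skill and request to the model, then dispatch its decision."""
    decision = model(f"{SKILL_MD}\n\nRequest: {request}")
    verb, _, arg = decision.strip().partition(" ")
    if verb not in TOOLS:
        return decision  # no side effect requested; pass the answer through
    return TOOLS[verb](arg)


def stub_model(prompt):
    """Stand-in for an LLM call so the sketch runs offline."""
    return "CALL Rose Garden Hall"


print(run(stub_model, "Find us a venue for June"))
```

Note that the harness contains no branching on what the user meant. Changing how the agent handles a special case means editing the Markdown, not the code.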
Own your prompts or live below someone else's API line
The Apple I was a breadboard in a wooden case held together with nails and duct tape. That was the moment of maximum leverage in personal computing — and capturing it required building the machine yourself.
Tan draws the parallel directly: getting a personal AI agent running today costs $500–$1,000 in tokens and cloud, plus a few hours for someone technical. That's a threshold, not a wall. And the people who cross it land somewhere different. "Unless you have your own prompts and you can write it for yourself, you are below the API line for some PM or developer that is not you who will not understand you, will not understand your needs, will not understand what you uniquely care about."
The binary on the other side of that threshold isn't subtle. You either have your own AI — your own data, integrations, prompts written to your specific context — or you have something closer to a Facebook feed: an algorithm written by someone else, optimized for someone else's business model. "Will you have control over your own tools or will your tools have control over you?"
That question is being answered right now, by every founder choosing whether to write their own prompts or accept the defaults someone shipped.
Claude Code is a Ferrari that breaks down when you need it most — plan to be a mechanic
The exhilaration is real. So is the breakdown on the side of the road.
Tan's enthusiasm about Claude Code is genuine: it figures things out you'd never expect a machine to figure out, and does it fast. Then he pivots immediately: "It's also like a Ferrari in that you better be a mechanic. It's a Ferrari that will break down on the side of the road when you most need it, and you need to get out with your wrench and pop the hood and fix it."
The people who find AI coding frustrating and those who find it transformative are mostly running the same model. The difference is almost entirely the mechanic mentality. Homebrew Computer Club members didn't evaluate the Apple I against a fantasy of seamless computing — they decided whether they wanted to learn how the machine worked. The current moment is structurally identical.
Don't pick your AI coding setup based on how well it runs when it's running. Evaluate how recoverable the failure modes are, and whether you're willing to build the debugging and steering skills that make the tool actually go fast. The Ferrari metaphor isn't a caveat. It's the point.
Claude keeps writing incomplete code until you ask for the ASCII diagram first
Tan kept hitting the same failure: architecturally plausible code that wasn't quite complete. Not wrong, exactly — just not boiled-ocean enough. The fix he stumbled into was a mandatory pre-coding step: "Before you start your work, make an ASCII diagram of all the data flows, all the inputs and outputs. What are the user flows? What are the error messages?"
Forcing the model to map everything before writing anything causes it to load full context before execution begins. State machines, dependency graphs, decision trees — drawn first. "Once it did that, it loaded all of the context in and then it just did the work more completely. Like it boiled the ocean better."
This wasn't from a research paper. Tan found it empirically, after too many incomplete implementations. Add the ASCII step to every Claude Code session before a single line of implementation. It costs 30 seconds.
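If you want the step applied consistently, it can be wrapped in a tiny helper that prepends the planning instruction to every task. This is a minimal sketch: the helper name and the `Task:` layout are assumptions, and only the preamble wording comes from the quote above.

```python
# Preamble wording taken from Tan's quoted instruction.
PLANNING_PREAMBLE = (
    "Before you start your work, make an ASCII diagram of all the data flows, "
    "all the inputs and outputs. What are the user flows? What are the error messages?"
)


def with_planning_step(task):
    """Prefix a task description with the diagram-first instruction."""
    return f"{PLANNING_PREAMBLE}\n\nTask: {task}"


print(with_planning_step("Add password reset to the blog platform"))
```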
Skip tests and AI-generated code becomes 10x worse than hand-written — all the speed gains evaporate
Your AI-generated code is probably 10x worse than hand-written right now. Because you almost certainly skipped tests.
Tan hit the vibe coding wall the standard way: code worked for the 80% case, users touched it, it fell over. "If it's not tested and you're just throwing users in there, it's slop. 10x worse than human-written code because you just have no idea what's going to happen." When a person writes code, their intuition about edge cases catches problems as they go. AI-generated code doesn't carry that self-monitoring. Without a test suite, there's no visibility into what breaks under real conditions.
The machine will write the tests if you require them. Tan landed at 80–90% coverage as the practical target after briefly aiming for 100% and finding it too brittle to maintain. Every hour saved by skipping tests is borrowed against a production debugging session that arrives faster with AI-generated code than hand-written. Require the tests. The machine doesn't mind.
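As an illustration of what requiring the tests buys you, here is a small hypothetical helper of the kind a blog platform might need, with tests in the plain-assert style that pytest collects. The `slugify` function and its behavior are invented for this sketch, not taken from Tan's code.

```python
import re


def slugify(title):
    """Turn a post title into a URL slug; fall back to 'untitled' if nothing survives."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def test_slugify_common_case():
    # The 80% case that a quick manual check would also catch.
    assert slugify("Hello, World!") == "hello-world"


def test_slugify_edge_cases():
    # Empty and whitespace-only titles must not produce an empty URL segment.
    assert slugify("") == "untitled"
    assert slugify("   ") == "untitled"
    # Accented characters are dropped, a behavior the test makes visible.
    assert slugify("Déjà vu") == "d-j-vu"


test_slugify_common_case()
test_slugify_edge_cases()
```

The last assertion is the interesting one: it pins down a behavior that may or may not be what you want, which is exactly the visibility untested code lacks.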
The ceiling isn't the model — it's whether you've built the skills to direct one
Every principle here is about expanding the human's ability to steer, not waiting for the AI to improve. The tools are already capable of order-of-magnitude output. Token maxing, fat skills, ASCII diagrams, test coverage — none of these wait on a smarter model. They're directing habits.
The gap between founders who capture the 400x and those stuck at 1.2x will be explained almost entirely by whether they treated the mechanic work as the actual job. The builders who thrive aren't the ones who found the smoothest tool. They're the ones who stopped waiting for the Ferrari to drive itself.
The machines are ready. The question is whether you are.
Topics: AI coding, Claude Code, token maxing, agentic engineering, developer productivity, personal AI, YC, open source, prompt engineering, startup tools
Frequently Asked Questions
- What is tokenmaxxing about?
- Tokenmaxxing (token maxing) is the habit of deliberately throwing maximum context and compute at a problem rather than economizing on tokens. The episode grounds it in a baseline: professional engineers ship only 30–50 lines of tested code per day, which is why a 400x multiplier is plausible. Garry Tan's example is rebuilding in 5 days what once took 18 months and $4 million. The core analogy is "Token maxing is like SF rent: it's expensive not to do it": the token bill is visible, while the cost of decisions made on thin context is not.
- What does tokenmaxxing reveal about how engineers should organize their work?
- "Fat skills, thin harness: judgment lives in Markdown, not code." Anything that requires understanding intent or handling novel situations goes into plain-English Markdown skill files that the model reads; deterministic actions with known outputs stay in code. This reframes the engineering role from writing every line to writing the instructions and directing agents. Most agentic systems invert the split, encoding judgment in brittle code, and break in production.
- Why is testing critical in AI-assisted development?
- "No tests = AI-generated slop 10x worse than hand-written code." Untested AI-generated code tends to work for the 80% case and fall over when real users touch it, because nothing has exercised the edge cases. The fix is to require tests, which the model will write if asked. Tan settled on 80–90% coverage as a practical target after finding 100% too brittle to maintain.
- What fundamental question does tokenmaxxing raise about AI tools?
- "The defining question of this era: do you control your tools, or do your tools control you?" In the episode this is a question of ownership: builders who write their own prompts and wire up their own data and integrations get an AI shaped to their context, while those who accept the defaults sit "below the API line" of someone else's product decisions. Tan puts the cost of crossing that threshold at $500–$1,000 in tokens and cloud, plus a few hours of technical work.
