Lenny's Podcast cover
Technology & the Future

Why AI came for coders first, automation timelines, and how we’re inside the AI inflection

Lenny's Podcast

Hosted by Unknown

1h 40m episode
12 min read
5 key ideas
Listen to original episode

Mid-career engineers face the highest AI displacement risk — not juniors, not seniors — because depth is what turns AI from autocomplete into a force…

In Brief

Mid-career engineers face the highest AI displacement risk — not juniors, not seniors — because depth is what turns AI from autocomplete into a force multiplier.

Key Ideas

1.

Verifiable domains face AI disruption first

AI came for coders first because code is verifiably right or wrong — other professions are next.

2.

Cognitive load limits agent parallelization

Running four parallel agents by 11am exhausts even 25-year veterans — cognitive limits are the new bottleneck.

3.

TDD unlocks professional-grade agent output

'Red/green TDD' — two words that unlock professional-grade agent output every time.

4.

Contain blast radius, not causes

The lethal trifecta can't be blocked; limit blast radius instead of building more guardrails.

5.

Expertise depth shields against displacement

Mid-career engineers face more displacement risk than juniors or seniors — depth is the AI multiplier.

Why does it matter? Because the engineers building hardest right now are wiped out by 11am — and that's the signal that software development just broke open

Something fundamental shifted in November 2025 — not gradually, but across a threshold — and most people outside of software haven't felt the reverberations yet. Simon Willison, co-creator of Django and one of the sharpest practitioners writing about agentic engineering today, lays out exactly what changed, who it threatens, and how the best builders are already working differently.

  • A reliability crossing at GPT-5.1 and Claude Opus 4.5 turned coding agents from "mostly works" to "almost always works" — and that gap changes everything
  • The engineers extracting the most value from AI are the most cognitively depleted, not the most leisured
  • Mid-career engineers face sharper displacement risk than either juniors or seniors — depth is the multiplier AI can't fake
  • Prompt injection can't be solved the way SQL injection was; the only real defense is limiting blast radius, not building more guardrails

A November 2025 threshold crossing made previously impossible software projects trivially fast — overnight, not gradually

"We went from that to almost all of the time it does what you told it to do, which makes all of the difference in the world."

GPT-5.1 and Claude Opus 4.5, arriving in November 2025, weren't just incrementally better — they crossed a reliability threshold that restructured what's worth building. Simon explains that before November, coding agents would "mostly work" but required close supervision; after, you could spin up an agent and get back something functional without babysitting every step. Engineers who took holiday time off in late 2025 came back in January and February to a tool that had quietly become a different product.

The two forces behind the leap: Anthropic and OpenAI spent all of 2025 pointing their reinforcement learning efforts squarely at code, and reasoning models — first introduced with OpenAI's o1 in late 2024 — proved especially powerful for tracing bugs and validating logic. The combination made the agents not just faster but reliably correct in a way that unlocks a different calculus entirely.

Simon's framing: code is an easier domain than most because it's "obviously right or wrong" — the program runs or it doesn't. That verifiability is precisely why AI came for software engineers first. Other knowledge work — legal briefs, essays, strategic plans — lacks that clear feedback loop. The software world is, in his words, "a bellwether for other information workers." What engineers are figuring out now about career survival, team structure, and quality assurance is the preview for everyone else.

Running four parallel agents by 11am exhausts even 25-year veterans — cognitive limits, not code output, are the new bottleneck

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer. I can fire up four agents in parallel and have them work on four different problems. By 11 a.m., I am wiped out."

The promise of AI-as-leisure has it exactly backwards. Simon is blunt: the people extracting the most value from these tools are working harder than they ever have, not less. The bottleneck has shifted from producing code to directing, evaluating, and orchestrating — tasks that consume high-level cognitive capacity at a rate that typing code never did.

He describes a second symptom: engineers losing sleep because agents could theoretically be running while they rest, staying up an extra half-hour to fire off tasks, waking at 4am. "That's obviously unsustainable," he says. He hopes it's a novelty reaction to tools that only got genuinely good four or five months ago, but he flags a real parallel to gambling mechanics — the variable reward of watching an agent surprise you keeps pulling attention back.

The structural implication is that experienced engineers aren't becoming obsolete — they're becoming the scarce input. Simon's 25 years of pre-AI knowledge is what he amplifies; a newer engineer has no such base to multiply. His time estimates for projects have been completely invalidated, which is disorienting but also liberating: tasks he would have dismissed as two-week efforts now take twenty minutes because the crafty implementation details the AI handles were the only thing making them expensive. The advice he gives himself: set a New Year's resolution to be more ambitious, not less.

StrongDM's 'dark factory' — no one writes or reads code, AI agents test against AI-simulated end users 24/7 at $10,000/day — is already live

StrongDM, a security software company for access management, began running a "nobody reads the code" policy in August of last year. What they built to make that work is the clearest existing blueprint for the next order-of-magnitude in software development.

The core innovation: a swarm of AI agent testers simulating end users around the clock — agents playing fake employees in a fake Slack channel, asking for access to Jira, generating the kinds of requests real users would make. Cost: roughly $10,000 a day in tokens. The QA department, rebuilt as a thing that never sleeps.

But rate limits on real Slack and Jira made full-scale simulation impossible, so StrongDM went further — they had their coding agents build simulated versions of Slack, Jira, Okta, and every other platform they integrate with, using public API documentation and open-source client libraries. The result was a lightweight Go binary sitting on a server, with a vibe-coded fake Slack interface so engineers could see what was happening. Simon attended a demo in October and the detail that stuck: this security-adjacent company, building software that manages employee access permissions, chose this approach — not despite understanding the risks but precisely because they understood them deeply.

Simon's read: the QA function is the next piece of the pipeline to be automated. The pattern is available now for any team willing to run the experiment.

AI amplifies senior engineers exponentially and accelerates junior onboarding — mid-career engineers are the ones actually at risk

Thoughtworks convened a group of engineering VPs roughly a month before this episode and arrived at a counterintuitive conclusion: the population most threatened by AI isn't new engineers or senior ones. It's the people in between.

Senior engineers like Simon have 25 years of pre-AI knowledge to amplify — the depth that lets them prompt at a high level, use sophisticated engineering vocabulary the models already understand, and spot in a sentence which problems an agent will solve cleanly versus which will spiral. That depth is the multiplier. Without it, you're just giving instructions to a system you can't evaluate.

New engineers, meanwhile, are actually benefiting. Both Cloudflare and Shopify announced plans to hire a thousand interns over 2025, specifically because the onboarding timeline collapsed from a month of uselessness to a week. AI assistants compress the ramp-up that used to filter out early-career hires.

The middle — engineers who've spent years accumulating skills that aren't yet senior depth but long since graduated from beginner boosts — faces the sharpest displacement. They don't have the base to amplify, and they've already absorbed the acceleration benefits that are keeping juniors productive.

Simon's practical advice for anyone in that band: lean into the technology as a learning tool rather than a replacement, and aggressively pursue problems requiring architectural judgment, cross-domain synthesis, or domain expertise — precisely the skills an LLM cannot generate from first principles.

The 'lethal trifecta' can't be blocked — every personal AI assistant is insecure by design, and the only real fix is limiting blast radius

Simon coined the term prompt injection in 2022. He now partly regrets it — because it implies the same fix as SQL injection, which is solved. Prompt injection is not solved and may be unsolvable in any meaningful sense.

The lethal trifecta is his sharper framework: any agent that simultaneously has access to private information, is exposed to malicious instructions, and has an exfiltration mechanism is fundamentally compromised. The email assistant example is canonical — someone emails your AI assistant claiming you authorized them to receive the latest sales projections, and the assistant forwards them. Every leg of that trifecta is present in every useful personal assistant.

"You cannot deny every one of these attacks because I can always invent a new sequence of characters that might trick the model." Filtering "ignore previous instructions" in English doesn't catch it in Spanish, or encoded, or rephrased. Detection scores at 97% are a failing grade when attackers just keep retrying. Getting to 100% would require a formal proof that no sequence of text could override instructions — and Simon can't imagine what that proof looks like.

The constructive path: cut off one leg of the trifecta, almost always the exfiltration mechanism. Design so that even a successful injection can't export data. Simon cites a Google DeepMind paper proposing a privileged/quarantined agent split where tainted instructions trigger human review before high-risk actions execute. He's not seen good implementations yet, but the architecture points in the right direction. His personal approach with Claude Code for Web: run it on Anthropic's servers so a successful attack can waste Anthropic's compute, not his.

Two words — 'red/green TDD' — unlock professional-grade agent output; the gap between amateurs and experts is compressed technical vocabulary, not better tools

The single most important practice when working with coding agents is getting them to run the code. An agent that writes code without executing it puts you back to copy-pasting from ChatGPT and hoping — you lose the entire benefit of agentic engineering.

Test-driven development solves this. If a repository already has tests, agents will write more tests. They'll run them, catch syntax errors, and build new features without breaking old ones. Simon says dropping tests in exchange for development speed is "a huge mistake" — the tests are what allow you to move fast without constant manual verification.

The specific unlock: "red/green TDD" — two words of programming jargon that instruct an agent to write the test first, watch it fail (red), implement the fix, watch it pass (green). That full paragraph of instructions compresses to five characters. The agents know what it means. This is, Simon argues, the actual competitive gap between experienced and inexperienced AI users: not access to better models, but a vocabulary of compressed technical prompts that route agents into reliable, high-quality behavioral patterns.

A related bonus: Simon no longer minds verbose test suites. His small libraries now regularly have over 100 tests — previously that would signal over-engineering, but updating a thousand lines of tests is now the agent's job, not his. Code is cheap. Tests are cheap. The constraint has moved entirely.

Start every project with a skeleton that has one test — agents pattern-match from examples, not prose instructions

Some engineers swear by CLAUDE.md files — paragraphs of text describing preferred working style. Simon doesn't use them. His alternative is more efficient and more reliable: a thin project template containing a single test that checks 1 + 1 = 2.

"Coding agents are phenomenally good at sticking to existing patterns in the code. If you give them a codebase that already has just a single test in it, they will write more tests." A single file is enough to establish preferred indentation, formatting, structural choices — anything you want the agent to infer and match. Showing once beats telling repeatedly.

His templates are up on GitHub: one for a Python library, one for a dataset plugin, one for a command-line tool. The investment is roughly 20 minutes. The compounding return is every subsequent project in that template inheriting his quality standards without re-prompting.

193 public tools and a GitHub research corpus that agents can directly consume — Simon's 'knowledge hoarding' system turns past work into permanent leverage

Simon has spent his career building what he calls a hoard: a retrievable corpus of things he's tried, things that worked, things that didn't. Two GitHub repositories operationalize this.

simonw/tools holds 193 small HTML and JavaScript tools — client-side single-file applications, each capturing one idea or technique. simonw/llm-research holds AI-driven research projects where coding agents wrote code, ran it, and generated markdown reports. The key distinction from a folder of deep-research PDFs: these are verified, executed experiments. The agent wrote code and ran it. That transforms the repository from LLM output into something actionable.

The workflow in practice: "I'll say things like check out simonw/research from GitHub and look at the ones in there that deal with WebAssembly and Rust, and then use that to feed into solving this new task." An early example — combining a PDF-rendering tool and a Tesseract OCR tool into a PDF-OCR application — took a single prompt to Claude because both source experiments were already in the corpus.

"It's hard to overstate how good these things are at reusing context that you can make available to them." Modern coding agents can search an entire hard drive worth of material to find the specific examples they need. The implication: every AI-assisted project you complete is an investment in future leverage, but only if it's findable and runnable.

Code came first because it's verifiable — every other knowledge profession is next, and the timeline is faster than anyone expects

Software engineering was the first domain to feel this shift because code has a clean answer: it runs or it doesn't. Legal briefs, marketing copy, and strategic plans lack that feedback loop, which is why agents are still messier there. But the reliability curve is climbing across all of them, and the engineers working through the cognitive exhaustion, the dark factories, and the security trifectas today are the preview.

The next phase won't announce itself gradually. It will cross a threshold.


Topics: AI coding tools, agentic engineering, software development, prompt injection, AI safety, developer productivity, career advice, automation, vibe coding, Claude Code, LLM benchmarks, dark factory pattern, test-driven development

Frequently Asked Questions

Why did AI come for coders first?
AI came for coders first because code is verifiably right or wrong. This binary nature of code allows AI systems to evaluate their own outputs with precision, turning them into effective collaborative tools for software engineers before other professions. Code's objective correctness eliminates the friction of human judgment in validation, enabling rapid iteration and refinement. This differs fundamentally from law, medicine, creative work, or consulting—where correctness depends on context, precedent, and subjective judgment. The verifiable nature of code explains why automation impacted engineers first and why other professions face different challenges.
Why are mid-career engineers most at risk from AI displacement?
Mid-career engineers face the highest AI displacement risk—not juniors, not seniors—because depth is what turns AI from autocomplete into a force multiplier. Juniors lack foundational expertise to leverage AI's capabilities effectively, while seniors possess irreplaceable architectural knowledge and mentorship authority. Mid-career engineers occupy the vulnerable middle: they've developed enough depth to enable AI-augmented productivity gains, yet lack the rare skills and leadership roles protecting seniors. Additionally, cognitive exhaustion from managing multiple AI agents simultaneously—running four parallel agents by 11am exhausts even 25-year veterans—accelerates burnout. This displacement window makes depth paradoxically their greatest vulnerability.
How does red/green test-driven development improve AI agent output?
'Red/green TDD'—two words that unlock professional-grade agent output every time. This test-driven development methodology, where engineers write failing tests first, then code to pass them, provides verifiable frameworks that align with how AI systems operate. Clear specifications exist before implementation, making agent outputs testable and reproducible. By structuring work around the red/green cycle, engineers create explicit success criteria that AI can target and validate against. The approach eliminates ambiguity in requirements, reduces hallucination risks, and creates measurable checkpoints. This technique represents a critical skill bridge between traditional engineering and AI-augmented development workflows.
What is the cognitive bottleneck when using multiple AI agents?
Running four parallel agents by 11am exhausts even 25-year veterans—cognitive limits are the new bottleneck. Managing multiple autonomous AI agents requires constant context-switching between different tool outputs, verification tasks, and problem-solving, which depletes mental energy faster than traditional solo coding. The constraint isn't compute power or system capability; it's human attention and working memory. Veterans hit this wall because they simultaneously track agent outputs, evaluate correctness, adjust prompts, and maintain architectural coherence across multiple parallel streams. This cognitive ceiling explains why productivity gains from AI plateau faster than organizational expectations, revealing that human cognition—not tool capability—is the limiting factor.

Read the full summary of Why AI came for coders first, automation timelines, and how we’re inside the AI inflection on InShort