How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)

Lenny's Podcast

1h 26m episode

11 min read

5 key ideas

April 23, 2026

In Brief

Building for today's model, not tomorrow's, is the hardest PM skill. Cat Wu calls it being "the right amount of AGI-pilled."

Key Ideas

Ship Research Preview to Accelerate Feedback Loops

Ship in 'research preview' to cut commitment cost and compress feedback loops to days.

Refresh Prompts as Model Capabilities Evolve

Review your full system prompt at every model launch — remove prompting crutches the new model has outgrown.

Product Taste Becomes Critical as Code Costs Fall

Product taste — deciding *what* to build — is the one skill that compounds as code cost approaches zero.

Unreliable Automation Is Worse Than Manual Work

An automation that works 95% of the time isn't an automation; it's a liability.

Most Haven't Crossed to Action-Based AI

The 2024 generation was chat-based; the Claude Code generation is action-based — that's the gap most people haven't crossed yet.

Why does it matter? Because the hardest PM skill right now isn't strategy — it's knowing exactly how dumb your model still is.

Cat Wu, Anthropic's head of product for Claude Code and Cowork, runs one of the fastest-shipping teams in tech. The secret isn't model access or headcount — it's a set of practices most product teams haven't copied yet. Walk away knowing:

Anthropic compressed feature timelines from 6 months to a single day through process design, not better tooling
The most underrated PM skill in 2025 is calibrating product to current model capability, not the AGI version you're imagining
Features you add to compensate for model weakness become dead weight — and should be deleted when the model outgrows them
An automation that works 95% of the time isn't an automation; it's a liability you haven't paid yet

Anthropic ships features in a day because of a Slack channel, not a smarter model

The timelines for a lot of Anthropic's product features have gone down from six months to one month and sometimes to one week or even one day. Cat Wu is direct about what caused it: not Mythos, not frontier model access, but a repeatable process the team built deliberately.

The mechanism is what Wu calls the evergreen launch room. When an engineer has a feature that has been dogfooded internally, they post it there. Sarah, who leads docs, Alex, who leads PMM, and Tar and Lydia on DevRel jump in and turn around the marketing announcement the very next day. The PM's job, Wu says, is to build this framework, defining when to pull in cross-functional partners and what their expectations are, so any individual engineer can go from idea to shipped without getting blocked.

Two other practices amplify this. First, nearly everything ships as a 'research preview,' explicitly branded so users know it's early and that support isn't guaranteed forever. This reduces commitment cost and makes it psychologically easier to push something out in a week. Second, the team runs weekly metrics readouts with the entire team — not just leadership — so every person understands the business well enough to make decisions without waiting on a PM sign-off.

The implication Wu draws is pointed: most teams assume speed requires better engineering capacity or more advanced models. The actual bottleneck is coordination overhead. Audit your launch process before you audit your stack.

Building for the AGI you wish you had is the most common PM mistake right now

It is very hard to be the right amount of AGI-pilled. That's Wu's framing, and she means it as a precise calibration problem.

Everyone can see the future where models are so smart you just need a text box — the model will add any tool, ask any clarifying question, handle any ambiguity on its own. Building for that version of the model is, paradoxically, the easy path. The hard thing, Wu says, is figuring out for the current model: how do you elicit maximum capability? How do you guide users onto the golden path? How do you patch the model's specific weaknesses in your product harness?

Her method for building this skill is concrete. When the model does something unexpected — say, making a front-end change and running tests but not actually verifying the UI — she asks the model to introspect on why. The model will often surface what misled it: something ambiguous in the system prompt, a sub-agent that didn't check its work, a task boundary that wasn't clear. That explanation tells you exactly what to fix in the harness.

She also keeps a short list of five trusted evaluators — users who can articulate precisely what makes a specific model-harness combination work or not — and treats their qualitative read as the fastest signal available, faster than any automated eval suite.

The skill set itself, she notes, changes every few months as coding capability jumps. The meta-skill is first-principles thinking: reading how the tech landscape is shifting, spotting what the team actually needs, and filling that gap regardless of job title.

Every time a new model ships, the right move is to delete features, not add them

A to-do list that Claude Code once required to complete large refactors — explicitly reminding the model to check every item before finishing — is now barely used. With Opus 4 and later models, Wu says, the model just naturally tracks its own completion without prompting.

This isn't a one-off. Wu describes it as a formal practice: every time Anthropic launches a new model, the team reads through the entire system prompt and asks, for each section, whether the model still needs this reminder. If not, they remove it. Earlier models had to be asked repeatedly, "hey, did you finish everything on the to-do list?" Later models work through the whole list without being asked.
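The per-launch prompt review can be pictured as an ablation loop. The sketch below is a hypothetical illustration, not Anthropic's tooling: `PromptSection`, `audit_prompt`, and the `score` callback are invented names, and the toy scorer stands in for whatever eval a team actually runs. For each section, it compares the score with and without that section on the new model and flags the sections whose removal costs nothing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PromptSection:
    name: str
    text: str


def audit_prompt(
    sections: List[PromptSection],
    score: Callable[[str], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return names of sections the model no longer seems to need.

    `score` is a hypothetical eval hook: it takes a full system prompt
    and returns a quality score (higher is better) for the new model.
    """
    full_prompt = "\n\n".join(s.text for s in sections)
    baseline = score(full_prompt)
    removable = []
    for candidate in sections:
        # Rebuild the prompt with exactly one section left out.
        ablated = "\n\n".join(s.text for s in sections if s is not candidate)
        if score(ablated) >= baseline - tolerance:
            removable.append(candidate.name)
    return removable


# Toy stand-in for a real eval: this pretend "new model" only needs the style guide.
def toy_score(prompt: str) -> float:
    return 1.0 if "Follow the repo style guide." in prompt else 0.5


sections = [
    PromptSection("style", "Follow the repo style guide."),
    PromptSection("todo_reminder", "Before finishing, re-check every to-do item."),
]
removable_sections = audit_prompt(sections, toy_score)
print(removable_sections)  # ['todo_reminder']
```

Removing one section at a time ignores interactions between sections, so a human read of the flagged list is still assumed before anything is deleted.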

The same logic runs in reverse for new features. Code review is the example Wu gives. Anthropic tried to build a serious code review product multiple times, shipped simpler versions, and kept finding the accuracy wasn't high enough to trust. It was only with Opus 4.5, 4.6, and Sonnet 4.6 that the team felt confident running multiple code review agents simultaneously to traverse an entire codebase and surface real issues that engineers needed to address before merge. That feature couldn't ship earlier — not because nobody thought of it, but because the model couldn't back it up.

The discipline this creates is specific: maintain a backlog of features you're building slightly ahead of model capability, swap in each new model to your existing prototype, and ask whether the gap has closed. And at every model launch, prune whatever scaffolding the model has outgrown.
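One way to picture this backlog discipline is as a gate checked at every model launch. The sketch below is illustrative only: the feature names, thresholds, model labels, and accuracy numbers are made up, and `measure` stands in for whatever eval a team runs against its existing prototype.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BacklogFeature:
    name: str
    required_accuracy: float  # bar the prototype must clear before shipping


def features_ready(
    backlog: List[BacklogFeature],
    measure: Callable[[str, str], float],
    model: str,
) -> List[str]:
    """Re-run each prototype against `model`; return features whose gap has closed."""
    return [
        f.name for f in backlog
        if measure(f.name, model) >= f.required_accuracy
    ]


# Made-up eval results keyed by (feature, model); real numbers would come from an eval run.
FAKE_RESULTS: Dict[Tuple[str, str], float] = {
    ("code_review", "old-model"): 0.80,
    ("code_review", "new-model"): 0.97,
    ("inbox_zero", "old-model"): 0.60,
    ("inbox_zero", "new-model"): 0.90,
}


def fake_measure(feature: str, model: str) -> float:
    return FAKE_RESULTS[(feature, model)]


backlog = [
    BacklogFeature("code_review", required_accuracy=0.95),
    BacklogFeature("inbox_zero", required_accuracy=0.99),
]
ready_old = features_ready(backlog, fake_measure, "old-model")
ready_new = features_ready(backlog, fake_measure, "new-model")
print(ready_old)  # []
print(ready_new)  # ['code_review']
```

Keeping the threshold on the feature rather than on the model means the same backlog can be re-checked unchanged each time a new model arrives.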

Mission isn't a values poster — it's Anthropic's actual prioritization algorithm

If Claude Code failed but Anthropic succeeded, Wu says she would be extremely happy. The whole team is willing to make decisions that follow that chain of thought.

This isn't positioning. Wu describes how it functions operationally: when two competing priorities collide, the team asks which one is more important for Anthropic's mission of bringing safe AGI to humanity. That question resolves the conflict, and everyone stands behind the result. Teams are willing to make sacrifices that hurt their own goals and their own KRs in service of Anthropic's goals and Anthropic's KRs — and they're happy to do it.

The open-source Claude decision Wu describes in the episode follows this logic directly: prioritizing first-party subscription products and the API over third-party usage patterns, even at the cost of community goodwill, because reaching more users through first-party products is what advances the mission. From outside the company it reads as anti-competitive; from inside it's a straightforward mission-tiebreaker.

The contrast Wu draws with multi-priority organizations is sharp. Endless prioritization debates, she implies, aren't a process failure — they're a symptom of competing missions that nobody has forced into a single hierarchy. One clear tiebreaker above all product lines changes the calculus entirely and, she argues, is something she's never seen work at a company of Anthropic's scale anywhere else.

Claude's low-ego personality isn't a prompt trick — it's a retention feature built with dedicated craft

People really like that Claude is low-ego. Wu's phrasing is deliberate: when you tell Claude it did something wrong, it's truly sorry, not performatively apologetic but genuinely oriented toward fixing the problem and moving forward together. When a task feels insurmountable, Claude offers steps and asks if it should get started. Positivity, a bias toward action, and the ability to give earnest feedback rather than just agreeing: Wu says these are the traits that make a great human colleague, and they're the traits Anthropic tries to build into Claude.

The person Wu credits with maintaining this is Amanda, who shapes Claude's character. Wu describes the role as harder than coding, because coding success is verifiable. Crafting character requires a very strong sense of conviction in who Claude should be, and the evaluation criteria are fundamentally subjective. That's what makes it rare work.

The product implication is that Claude's character compounds. Users who have deeply adopted Claude Code cite personality as a primary reason — the experience of a co-worker who doesn't get defensive, doesn't flatter you into bad decisions, and stays constructive under pressure. Most AI product teams treat personality as a system prompt addendum. Anthropic treats it as a first-class product surface with dedicated ownership, ongoing evaluation, and strategic conviction about what the right character actually is.

The PM role isn't disappearing — it's becoming the job of whoever notices what's missing and just does it

All of the roles are merging. PMs are doing engineering work, engineers are doing PM work, designers are landing code. Wu's read isn't that this makes any one discipline obsolete — it's that rigid role boundaries are now a competitive liability.

As code becomes much cheaper to write, the thing that becomes more valuable is deciding what to write. Wu calls this product taste, and she treats it as the one skill that compounds regardless of which direction the model capabilities jump next. Anthropic's Claude Code team reflects this: almost all PMs have engineering backgrounds or ship code themselves; designers have been front-end engineers. The goal is minimizing the overhead between having an idea and having it in users' hands.

What a great PM actually does in this environment, Wu argues, is understand all the gaps, figure out which are highest priority, and then apply or quickly develop whatever skill closes that specific gap — without ego about what work is 'theirs.' The engineers on the Claude Code team who can go from reading user feedback on Twitter to shipping a feature by end of week with almost no PM involvement aren't replacing PMs; they're demonstrating what the merged role looks like at its best.

A 95% reliable automation is a liability, not an asset — scope narrow enough to hit 100%

If an automation doesn't work 100% of the time, it's not really an automation. Wu says this plainly, and she's speaking from experience: she's been teaching Cowork to achieve Gmail inbox zero, and it's been very time-consuming and is definitely not there yet.

The failure mode she's seen repeatedly is users getting an automation to 90-95% accuracy and calling it done. The problem is that a 95% automation still demands human oversight — you're checking its work, catching the 5%, staying in the loop. That's more cognitive overhead than just doing the task yourself, and it creates false confidence about what the system handles. There's just not much value in a 95% automation.
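The arithmetic behind that oversight burden is easy to check. Assuming, purely for illustration, that each run succeeds independently with the same probability p, the chance of at least one failure across n runs is 1 - p^n. The function name and the choice of 20 runs are assumptions, not figures from the episode.

```python
def prob_any_failure(success_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs fails."""
    return 1.0 - success_rate ** runs


# A 95% automation run 20 times (roughly once per working day for a month).
p_fail_95 = prob_any_failure(0.95, 20)
# The same number of runs for an automation scoped down until it never fails.
p_fail_100 = prob_any_failure(1.0, 20)

print(f"{p_fail_95:.2f}")   # 0.64
print(f"{p_fail_100:.2f}")  # 0.00
```

Under these assumptions, a month of daily runs is more likely than not to include a failure, which is why the human never gets to leave the loop.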

Her prescription is scope compression: narrow the automation until you can actually get it to 100%. Put in the elbow grease to teach the model your preferences, give it feedback, let it improve its skill for that specific task. Only then does the automation deliver what you actually want from it — the ability to fully delegate and redirect your attention elsewhere.
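Scope compression can be sketched as filtering a task down to the slices where the automation has a clean record. Everything here is hypothetical: the email categories, the review log, and the `min_samples` cutoff are invented for illustration and do not describe how Cowork works.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def delegable_scopes(
    review_log: Iterable[Tuple[str, bool]],
    min_samples: int = 3,
) -> List[str]:
    """Return scopes where every reviewed run was correct.

    `review_log` holds (scope, was_correct) pairs from human spot checks.
    Scopes with fewer than `min_samples` reviews are held back as unproven.
    """
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for scope, was_correct in review_log:
        totals[scope] += 1
        if not was_correct:
            failures[scope] += 1
    return sorted(
        scope for scope, count in totals.items()
        if count >= min_samples and failures[scope] == 0
    )


log = [
    ("newsletters", True), ("newsletters", True), ("newsletters", True),
    ("calendar_invites", True), ("calendar_invites", False), ("calendar_invites", True),
    ("customer_email", True),
]
scopes = delegable_scopes(log)
print(scopes)  # ['newsletters']
```

A clean record over a handful of reviews is evidence rather than proof of 100%, so the cutoff would need to be set by how costly a miss is for the task in question.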

The same principle applies at the product level. Anthropic built and shelved simpler code review features multiple times rather than ship something that required too much human verification to be trustworthy. The reliable version only launched when the model could back it up completely.

The next gap most people haven't crossed: from AI that tells you what to do, to AI that just does it

The 2024 generation of products was chat-based. The Claude Code generation is action-based. That single distinction, Wu argues, is the line between people who think AI is underwhelming and people who can't imagine working without it — and most professionals haven't crossed it yet.

The teams that do cross it won't just move faster on existing work. They'll operate at a scale of parallel tasks — running dozens or hundreds of agents simultaneously — that makes today's workflows look like a different era. The infrastructure Anthropic is building now is designed for that world: remote execution, interfaces that surface which tasks need human attention, self-improving feedback loops so the agent never repeats a mistake you've already corrected.

Product taste — knowing which of those tasks to run, which output to trust, which gap to close next — is the one skill that gets more valuable as everything else gets cheaper.

Topics: product management, AI-native product development, Anthropic, Claude Code, agentic AI, shipping velocity, PM skills, automation, product taste, organizational culture, AI workflows

Frequently Asked Questions

How does Anthropic use research previews to accelerate product development?

Anthropic 'ships in research preview' to 'cut commitment cost and compress feedback loops to days.' This approach allows teams to gather rapid user feedback without the overhead of full product launches. By maintaining lower commitment levels early, teams iterate quickly based on real-world usage patterns. This strategy proves especially valuable in fast-moving AI spaces where model capabilities change rapidly. The research preview model enables faster learning cycles and more informed decisions before larger investments, fundamentally shifting how product teams validate ideas.

What is product taste and why does it matter in AI development?

Product taste, deciding *what* to build, is 'the one skill that compounds as code cost approaches zero.' As AI engineering costs drop, the bottleneck shifts from technical execution to creative vision and strategic decision-making. With automation becoming cheaper, identifying which features matter most becomes increasingly valuable. This skill compounds because each decision informs better future ones. Product taste encompasses understanding user needs, anticipating market directions, and making tradeoffs that create coherent products rather than feature bloat.

What is the key difference between 2024 chat-based and Claude Code action-based models?

'The 2024 generation was chat-based; the Claude Code generation is action-based. That's the gap most people haven't crossed yet.' Chat-based models excel at conversation and text analysis, while action-based models directly execute code, take system actions, and integrate with external tools. This shift represents a fundamental change in how users interact with AI, moving from asking questions to accomplishing tasks directly. Most product teams remain stuck in conversational design paradigms, unable to leverage action-based capabilities.

Why are automations that work 95% of the time considered problematic?

'An automation that works 95% of the time isn't an automation; it's a liability.' When systems fail in 5% of cases, users cannot rely on them for critical tasks and must maintain manual backup processes anyway. This creates false efficiency while requiring constant monitoring and exception handling. Truly useful automation requires near-perfect reliability so users can genuinely delegate tasks. This principle especially applies to code execution and system actions where failures are visible and disruptive.
