
Why most "judgement is the differentiator" advice fails at pre-PMF (and the loop to steal)

Enterprise "judgement is the differentiator" advice breaks at pre-PMF, because the price of getting it wrong scales with runway. Here's the loop to steal.

Product Strategy · read ~8 min
product-strategy · pre-pmf · startup · judgement · ai-products

Weiyao Fang wrote a careful summary of Leading the Product 2026 in Melbourne this week. Five threads ran through the speakers: less is more, judgement is the differentiator, deliberate sequencing, defensibility in an AI era, and the discipline of interpreting customer feedback rather than reacting to it. Sharp themes, in an unambiguously enterprise framing. I commented on the judgement piece and kept thinking about the rest of it for days afterwards.

The themes hold inside a large company. They invert at a pre-PMF startup, because the price of getting judgement wrong scales with how much runway you have to absorb the mistake.

The short answer

Inside an enterprise, the five LP26 themes are operating discipline. Inside a startup before product-market fit, they are survival rules. The principle survives the move; the failure mode does not. A misread that costs an enterprise some margin will cost a seed-stage team the company.

The sentence I would put on a sticker: judgement is not the differentiator before PMF. It is the product.

What I actually do

When I work on AI products, “messy idea to shipped system” is the bit I sell. The reframe changes what I push for in the first weeks of any new build.

Before any AI feature gets built, I want to see one customer conversation that proves the problem exists outside the founder’s head. Not five. One conversation in which the customer described the pain in their own words, unprompted. If that conversation has not happened, the feature is a guess in a costume.

When generating options for what to build next, I let AI generate five. I almost never ship one of the five. The five make the underlying frame visible. You can see what assumption every option is built on, and the right move is usually a sixth option that breaks the assumption all five share. AI is good at filling in a frame. Choosing the frame is the founder’s job.

For anything that gets into production, the question I ask is not “is the model good enough?” but “do we have the judgement loop wired up so that when it is wrong, we hear about it in days, not in months?”
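One minimal way to wire that loop, assuming users have a cheap way to flag an output as wrong and that a rolling error rate over the last N outputs is a useful trigger. `FeedbackMonitor`, the window size and the threshold are hypothetical choices for illustration, not a recommended setting.

```python
from collections import deque


class FeedbackMonitor:
    """Hypothetical rolling-window check on user-flagged model errors."""

    def __init__(self, window: int = 50, threshold: float = 0.10):
        self.threshold = threshold          # error rate that should wake someone up
        self.recent = deque(maxlen=window)  # oldest outcome drops off automatically

    def record(self, flagged_wrong: bool) -> bool:
        """Log one output's outcome; return True when the recent error rate is too high."""
        self.recent.append(flagged_wrong)
        # Wait for a full window so a single early complaint does not trigger an alert.
        if len(self.recent) < self.recent.maxlen:
            return False
        error_rate = sum(self.recent) / len(self.recent)
        return error_rate > self.threshold


monitor = FeedbackMonitor(window=10, threshold=0.2)
outcomes = [False] * 7 + [True] * 3  # 3 of the last 10 outputs flagged wrong by users
alerts = [monitor.record(o) for o in outcomes]
print(alerts[-1])  # True: a 30% error rate is above the 20% threshold
```

The part that matters sits upstream of this code: if there is no easy way for a user to say "this is wrong", the monitor has nothing to count, and you hear about the failure in months.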

The longer-term bet is that teams who treat judgement as a deliberately trained organisational muscle, not a personal trait of the founder, end up with better odds. Pre-PMF the muscle keeps you alive. Post-PMF it compounds into something that looks, from the outside, like taste.

1. Less = survival, not focus

Cut the 80% of features that drive only 20% of the value. Danielle Harmer and Stephanie Musat framed this as deliberate enterprise discipline. At a pre-PMF startup the same move is not discipline. It is the difference between shipping a v2 and not.

CB Insights’ analysis of 431 VC-backed startups that have shut down since 2023 attributed 43% of failures to poor product-market fit. The headline figure, that 70% “ran out of cash”, is the symptom; building too much of the wrong thing for too long is the disease.

The rule I now write down before any pre-PMF roadmap: any feature ships only if it delivers validated learning or buys a customer. Anything in between is a leak. A corporate team that maintains a long tail of weak features is inefficient. A startup team that does it is dead.
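Written as a filter, the rule is deliberately binary. The sketch below assumes each roadmap item has been honestly tagged with two yes/no answers; the `Feature` fields and the example items are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    validated_learning: bool  # does shipping it test a named hypothesis?
    buys_customer: bool       # is a specific customer waiting on it to pay?


def passes_gate(feature: Feature) -> bool:
    """Pre-PMF rule: ship only if it delivers validated learning or buys a customer."""
    return feature.validated_learning or feature.buys_customer


roadmap = [
    Feature("pricing page A/B test", validated_learning=True, buys_customer=False),
    Feature("SSO for pilot customer", validated_learning=False, buys_customer=True),
    Feature("dark mode", validated_learning=False, buys_customer=False),
]

ship = [f.name for f in roadmap if passes_gate(f)]
leaks = [f.name for f in roadmap if not passes_gate(f)]
print(ship)   # ['pricing page A/B test', 'SSO for pilot customer']
print(leaks)  # ['dark mode']
```

The hard part is the tagging, not the filter: if nobody can name the hypothesis or the customer, both fields are False and the item is a leak.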

2. Judgement is the product, not the differentiator

Mike Belsito’s point at LP26 was that AI helps you identify signals, synthesise information, and accelerate decision-making, but it should never automate judgement itself. True at any scale. Sharper before PMF, because most companies still treat judgement like something senior people accumulate through scars rather than something you deliberately train.

That assumption is wrong even inside large companies. The Good Judgment Project, Philip Tetlock’s IARPA-funded forecasting tournament, found that the top forecasters, drawn from the general public rather than intelligence services, beat trained intelligence analysts with access to classified information by roughly 30%. The trait that mattered was not credentials. It was how they thought: probabilistic rather than confident, willing to update on evidence, eclectic in source material, comfortable holding partial views. Tetlock’s earlier work on Expert Political Judgment found that pundits with strong unified worldviews were less accurate than generalists who held more partial, less confident views. If judgement were purely a function of scar tissue, the senior analysts would have won. They did not.

Which means a small team can deliberately train this muscle, and a startup with three months of runway has every reason to.

The AI productivity literature is the place this lands hardest. GitHub’s studies put developers using Copilot at 55% faster on certain coding tasks. The METR randomised trial of 16 experienced open-source developers, published in July 2025, found they were 19% slower on real production tasks when allowed to use AI tools, and estimated themselves to have been 20% faster. Both results can be true. The first measures generation. The second measures judgement and verification under real conditions.

That gap is the entire startup situation in one experiment. AI shortens the time to generate any plausible option. It does almost nothing for the time to pick the right one, and it adds time to the verification step. At pre-PMF, where you have no data to fall back on, the cost of the wrong pick is much higher. The skill becomes knowing which option deserves to survive the room and having enough conviction to say no to the rest.

This skill is trainable. The Tetlock interventions that worked were small and repeatable: write down your predictions, attach probabilities, score yourself honestly, update when the evidence shifts. Teams that run a Friday “what did we predict, what actually happened” loop on the previous week’s product decisions are doing the same thing on a startup timescale.
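The Friday loop needs only a log and a scoring rule. A sketch, assuming each prediction is a yes/no claim scored with the Brier score (the proper scoring rule the Good Judgment Project used to rank forecasters); the `Prediction` fields and the example week are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    decision: str                    # the product call made this week
    claim: str                       # what we expected to happen as a result
    probability: float               # stated confidence, 0.0 to 1.0, that the claim comes true
    happened: Optional[bool] = None  # filled in at the Friday review


def brier_score(predictions: list) -> float:
    """Mean squared gap between stated probability and outcome; 0.0 is perfect.

    Always answering 0.5 scores 0.25, so a score above that is worse than
    saying "50%" to every call the team logged.
    """
    resolved = [p for p in predictions if p.happened is not None]
    if not resolved:
        raise ValueError("no resolved predictions to score")
    return sum((p.probability - float(p.happened)) ** 2 for p in resolved) / len(resolved)


week = [
    Prediction("ship onboarding checklist", "activation rises above 40%", 0.7, happened=True),
    Prediction("cold-email 20 ops leads", "at least 3 book a call", 0.5, happened=False),
    Prediction("drop CSV export", "no pilot customer complains", 0.9, happened=False),
]
print(f"{brier_score(week):.3f}")  # 0.383
```

In this made-up week the one confident miss (0.9 on something that did not happen) contributes 0.81 on its own. That is the point of the score: it punishes overconfidence, the trait the better forecasters in Tetlock's work avoided.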

3. Sequencing wrong costs runway, not margin

Several LP26 speakers, Danielle on the NYT paywall in particular, talked about deliberate sequencing: progression over perfection. The NYT analogy works because the NYT had years. A seed-stage team has months.

Eric Ries built The Lean Startup around exactly this constraint. The build-measure-learn loop’s whole point is speed, because every cycle costs runway. The practical bar: if you cannot ship something testable inside four weeks, you are probably building too much.

The corporate version of the rule is “release more often.” The startup version is “release more often, or run out of money before you learn anything.”
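The arithmetic behind the four-week bar is blunt: runway divided by cycle length is the number of times you get to be wrong and find out. A sketch assuming a hypothetical 52 weeks of runway and ignoring the time it takes to raise the next round; `learning_cycles` is an illustrative helper, not part of Ries's method.

```python
def learning_cycles(runway_weeks: int, cycle_weeks: int) -> int:
    """Number of complete build-measure-learn cycles that fit inside the runway."""
    if cycle_weeks <= 0:
        raise ValueError("cycle_weeks must be positive")
    return runway_weeks // cycle_weeks


runway = 52  # hypothetical: twelve months of cash, no time set aside for fundraising
for cycle in (2, 4, 8, 12):
    print(f"{cycle:>2}-week cycles -> {learning_cycles(runway, cycle)} chances to learn")
# prints 26, 13, 6 and 4 chances respectively
```

Moving from a 12-week to a 4-week cycle does not make any single bet smarter; it triples the number of bets you can afford to lose.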

4. Defensibility is the open question I haven’t worked out yet

Julie Brettle raised the sharpest question of the conference: if AI can replicate a product from scratch relatively quickly, what is actually defensible?

Hamilton Helmer’s 7 Powers framework predates the current AI wave but has been re-examined for it. Virta Ventures’ applied analysis points out that AI lowers traditional scale-economy advantages, erodes generic data moats, and flattens technical switching costs. What remains is some combination of rights-cleared continuously refreshed data, deep workflow embedding, organisational switching costs, and process power that takes years to compound.

A pre-PMF startup has none of those yet. What it has is speed of iteration, closeness to the customer, and a willingness to make calls that bigger companies would never sign off on. None of those are durable moats on their own. They are the inputs you use to build a moat that compounds before you stop being small. The honest answer to Julie’s question is that defensibility for an AI-era startup is not a static feature of the product. It is the rate at which you convert ephemeral edges into durable ones before the next round.

This is the question I am the least confident about. I will write more on it once I have a sharper take.

5. The 1% problem inverts

Danielle warned against over-indexing on the loudest 1% of customers. Inside a company with thousands of users, that warning is right. Inside a pre-PMF team with eight users, it runs the wrong way: every voice is loud and you cannot afford to dismiss any of them. You also cannot follow all of them.

The skill that actually matters is the one Rob Fitzpatrick wrote down in The Mom Test: when you ask for feedback, customers describe their preferred solution, not their actual problem. The three classes of bad data Fitzpatrick catalogues (compliments, hypothetical fluff, and feature wishlists) are exactly what your loud 1% will give you when you ask “what should we build next?” The skill is taking five customer interviews and pulling out the one piece of behaviour that is doing real work, not the five suggestions that sounded good in the moment.

AI genuinely earns its place at this stage. Synthesising five interview transcripts into a single problem statement, with the speaker’s exact language attached, is faster than doing it in your head and is often more honest. The signal-extraction step is automatable in a way the conviction step is not.
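The post describes doing this synthesis with AI. The sketch below leaves the model out and shows only the structure the output could be held to, assuming each snippet has already been tagged (by a person or a model) with a speaker, a verbatim quote, a kind and a problem label. `Snippet`, `strongest_problem`, the tag values and the interview lines are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three bad-data classes from The Mom Test, as this post names them.
BAD_DATA = {"compliment", "hypothetical", "wishlist"}


@dataclass
class Snippet:
    speaker: str
    quote: str    # the customer's exact words, kept verbatim
    kind: str     # "behaviour" or one of BAD_DATA (hypothetical tagging scheme)
    problem: str  # short label for the underlying problem (hypothetical)


def strongest_problem(snippets):
    """Drop bad data, then return the problem described by the most distinct speakers.

    Returns (problem, speakers, quotes), or None if nothing survives the filter.
    """
    by_problem = defaultdict(list)
    for s in snippets:
        if s.kind not in BAD_DATA:
            by_problem[s.problem].append(s)
    if not by_problem:
        return None
    problem, evidence = max(by_problem.items(), key=lambda item: len({s.speaker for s in item[1]}))
    speakers = sorted({s.speaker for s in evidence})
    quotes = [f'{s.speaker}: "{s.quote}"' for s in evidence]
    return problem, speakers, quotes


interviews = [
    Snippet("Ana", "I rebuild the weekly report by hand every Monday", "behaviour", "manual reporting"),
    Snippet("Ben", "Love the product, really slick", "compliment", "general"),
    Snippet("Ben", "Last week I spent four hours copying numbers into slides", "behaviour", "manual reporting"),
    Snippet("Cy", "I'd definitely pay for a Slack integration", "hypothetical", "integrations"),
    Snippet("Cy", "Can you add dark mode?", "wishlist", "theming"),
    Snippet("Dee", "Our last tool broke on CSV exports, so we dropped it", "behaviour", "data export"),
]
problem, speakers, quotes = strongest_problem(interviews)
print(problem, speakers)  # manual reporting ['Ana', 'Ben']
```

Ranking by distinct speakers rather than by snippet count stops one talkative customer from outvoting the rest, and keeping the quotes verbatim leaves the conviction step, deciding whether this is the problem to bet on, with the founder.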

Caveats

  • CB Insights’ 43% figure is their own analysis of VC-backed shutdowns since 2023: industry research, not peer-reviewed, and bootstrapped failure patterns may differ.
  • METR’s 19% slowdown is one rigorous data point (16 experienced developers, real tasks) against a literature of vendor-funded speedup studies. Both can be true if the speedup is narrow generation work and the slowdown is verification and integration in unfamiliar code. The honest read: AI’s effect on real work is mixed and context-dependent.
  • Tetlock’s superforecaster findings come from geopolitical tournaments. The structural point (thinking style beats credentials, deliberate training improves accuracy) has replicated; the specific 30% gain does not transfer to “founder picking which feature to build.”
  • Defensibility is the section I have the lowest confidence in. 7 Powers and its AI-era extensions are thoughtful strategic commentary, not empirical work. Anyone claiming to know what is defensible for an AI startup in 2026 should be treated with caution, including me.

References

Primary essays and frameworks:

Empirical work:

Industry research:

Strategic commentary:

Original inspiration: