The METR Study on AI Coding: What "19% Slower" Actually Means

A rigorous study found experienced developers were 19% slower with AI tools. Headlines treated this as proof AI coding is broken. But the research is more nuanced—and more interesting—than either enthusiasts or skeptics acknowledge.

You've probably seen the statistic everywhere: "AI makes developers 19% slower." It's been used to dismiss AI coding tools outright, to justify skepticism, and to fuel hot takes in both directions.

The research is real and worth understanding. But context matters. Let's look at what the studies actually found, what they didn't measure, and what this means for how you think about AI coding tools.

The Key Numbers (With Context)

  • 19% — slower with AI in the METR study (16 experienced developers, specific conditions)
  • 16 — developers in the study (small but rigorous sample)
  • 35 points — width of the 95% confidence interval (−26% to +9%, a wide uncertainty range)
  • 69% — still preferred AI tools afterward (despite the slower measured results)

What the METR Study Actually Found

In July 2025, METR (Model Evaluation and Threat Research) published a randomized controlled trial—one of the most rigorous studies on AI coding productivity to date.

The setup: 16 experienced open-source developers provided 246 real issues from their own repositories. These weren't beginners: they were maintainers of mature projects averaging over a million lines of code. Half of the tasks were randomly assigned to be completed with AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) and half without.

The headline finding: developers using AI took 19% longer to complete tasks.

Critical Context Often Missing

Sample size: 16 developers is small. The researchers acknowledge this. It's enough to suggest an effect but not to be confident about magnitude.

Confidence interval: The 95% CI ranged from -26% to +9%. The true effect could be anywhere from "AI makes you 26% slower" to "AI makes you 9% faster." The study suggests slowdown is likely but can't pin down how much.
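To see why 16 developers produces such a wide interval, here is a minimal simulation. The per-developer values are invented (centered near the study's −19% point estimate with a large spread), not METR's raw data; the point is only that bootstrapping a mean from 16 noisy observations yields an interval tens of points wide.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-developer "speed change" values (percent). Illustrative only.
true_mean, spread, n_devs = -19.0, 35.0, 16
sample = [random.gauss(true_mean, spread) for _ in range(n_devs)]

# Percentile bootstrap: resample developers with replacement, recompute the mean.
boot_means = []
for _ in range(10_000):
    resample = [random.choice(sample) for _ in range(n_devs)]
    boot_means.append(statistics.mean(resample))
boot_means.sort()
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]

print(f"point estimate: {statistics.mean(sample):+.0f}%")
print(f"95% bootstrap CI: [{lo:+.0f}%, {hi:+.0f}%]")  # tens of points wide
```

With this kind of between-developer variance, the interval easily spans both "meaningfully slower" and "no effect," which is exactly the situation the study reports.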

Specific population: Experienced developers on familiar codebases—exactly where human expertise has the biggest advantage. This isn't "all developers" or "all contexts."

What wasn't measured: Beginners, unfamiliar codebases, learning scenarios, prototyping—contexts where AI might perform differently.

The Perception Gap: Genuinely Interesting

Perhaps more interesting than the speed result: developers predicted AI would make them 24% faster. After completing tasks—with measurably slower results—they still believed they'd been 20% faster.

That's a 39-point gap between perception and reality.
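The arithmetic behind that figure, using the study's numbers:

```python
predicted_speedup = +24  # before starting: expected to be 24% faster
perceived_speedup = +20  # after finishing: believed they had been 20% faster
measured_speedup  = -19  # measured result: tasks took 19% longer

# Gap between post-hoc perception and measurement, in percentage points.
perception_gap = perceived_speedup - measured_speedup
print(perception_gap)  # 39
```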

And yet: 69% of developers kept using AI tools after the study ended. They preferred the experience even though the measurements showed they were slower.

What This Might Mean

The perception gap suggests developers can't accurately self-assess AI's productivity impact. This has implications for every survey asking developers if AI helps them.

But the continued preference raises questions: Are there benefits beyond raw speed? Reduced cognitive load? Different engagement quality? The study measured completion time but not experience quality.

Both observations can be true: AI might make experienced developers slower while also providing something they value.

The GitClear Data: Code Quality Concerns

Separately, GitClear analyzed 211 million lines of changed code from 2020 through 2024 and reported trends it interprets as quality degradation, including rising code duplication and churn alongside less refactoring.

These trends correlate with the timing of AI adoption. But there are important caveats:

Context for GitClear Data

Correlation vs. causation: The trends correlate with AI adoption timing, but other factors could contribute (team composition changes, project maturity, pressure to ship faster).

Industry-wide vs. AI-specific: Not all repositories in the analysis use AI tools. The trends might reflect industry-wide changes.

What counts as "quality": More duplication isn't always worse—sometimes duplicating code is the right choice over premature abstraction.
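As a toy illustration of that caveat (all code hypothetical), two similar-looking validators can be better left duplicated than forced into one premature abstraction:

```python
# Duplicated but clear: each rule can evolve independently.
def valid_username(name: str) -> bool:
    return 3 <= len(name) <= 32 and name.isalnum()

def valid_project_slug(slug: str) -> bool:
    return 3 <= len(slug) <= 32 and slug.replace("-", "").isalnum()

# A premature abstraction couples the two rules together; when the
# requirements later diverge (say, slugs start allowing underscores),
# every caller of the shared helper is affected.
def valid_identifier(text: str, allow_dash: bool = False) -> bool:
    stripped = text.replace("-", "") if allow_dash else text
    return 3 <= len(text) <= 32 and stripped.isalnum()
```

A duplication metric counts the first version as worse, even though many reviewers would prefer it.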

The Harari Perspective: Why This Might Make Sense

Yuval Noah Harari argues AI represents something genuinely new: systems that make autonomous decisions rather than just following instructions. This offers an interesting lens on the productivity findings.

AI as Alien Decision-Maker

When you use an AI coding assistant, you're collaborating with a system that has its own "opinions" about how code should be written. It makes choices you didn't ask for.

For experienced developers with strong mental models of their codebase, this creates friction. Time goes to evaluating AI suggestions against your own understanding, rejecting what doesn't fit, and correcting what's almost-but-not-quite-right.

For developers without strong existing models—beginners, or experts in unfamiliar territory—the AI's "opinions" might fill gaps rather than create conflicts.

This isn't about AI being "wrong." It's about what happens when two different decision-making systems try to collaborate on the same task.

Where AI Probably Helps

The METR study measured one specific context. Other evidence, less rigorous but worth considering, suggests AI might help in different situations: beginners learning a language or framework, experienced developers working in unfamiliar codebases, and early prototyping where requirements are still loose.

Where AI Probably Hurts

The same evidence cuts the other way for experienced developers working in mature codebases they already know well (the exact conditions where METR measured a slowdown), and for security-sensitive code, where multiple studies report increased vulnerabilities in AI-assisted output.

What We Know and Don't Know

  • Experienced developers slower on familiar code. Evidence: one rigorous RCT. Caveats: small sample; wide confidence interval; specific conditions.
  • The perception gap is real. Evidence: strong. Caveats: consistent finding; implications for all self-reported data.
  • Code quality degradation. Evidence: correlational. Caveats: GitClear data shows trends but can't prove causation.
  • AI helps beginners and unfamiliar code. Evidence: suggestive. Caveats: less rigorous evidence; plausible mechanism.
  • Security vulnerabilities increase. Evidence: multiple studies. Caveats: consistent finding; missing human baseline comparison.
  • Developers prefer AI despite slowdown. Evidence: strong. Caveats: 69% continued use; reasons unclear.

Transparency Note

Syntax.ai builds AI coding tools. The original version of this article used the METR study to set up a sales pitch for our products, claiming we'd "solved" the problems the research identified with specific metrics we couldn't verify. That wasn't honest. We've rewritten this to present the research more accurately. We don't know if our approach produces better results—and we shouldn't claim we do without rigorous evidence comparable to what we're citing here.

What This Means For You

If You're an Experienced Developer

Don't trust your felt productivity: the perception gap means you may believe AI is speeding you up while it slows you down. If the answer matters to you, time yourself on comparable tasks with and without AI rather than relying on how the sessions felt.

If You're Making Team Decisions

Treat self-reported surveys and vendor claims about AI productivity with skepticism. Where possible, measure outcomes on your own team's work (completion time, review effort, defect rates) instead of perception.

The Bottom Line

The "19% slower" finding is real research worth taking seriously. It's also one study with 16 participants, a wide confidence interval, and specific conditions that may not generalize.

What seems reasonably supported: AI coding tools don't deliver the transformative productivity gains often claimed. Experienced developers on familiar codebases may not benefit—and might be slower. The perception gap is real and makes self-reported data unreliable. Code quality concerns exist but aren't definitively attributed.

What remains uncertain: Whether AI helps in different contexts (beginners, unfamiliar code, prototyping). Why developers prefer AI despite measured slowdowns. What the right use cases actually are.

The Question Worth Asking

Instead of "Does AI make coding faster?" try "In what specific contexts might AI help me, and how would I objectively verify that?"

That's harder than adopting a tool because it feels faster. It's also more likely to lead to genuine understanding of when AI helps and when it doesn't.
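One way to make that verification concrete is a small self-run version of the METR design: randomly assign comparable tasks to an AI or no-AI condition, log how long each took, and compare. This is a minimal sketch; the task names and durations are invented for illustration.

```python
import random
import statistics

def assign_conditions(tasks, seed=42):
    """Randomly assign each task to 'ai' or 'no_ai' (a coin flip per task)."""
    rng = random.Random(seed)
    return {task: rng.choice(["ai", "no_ai"]) for task in tasks}

def compare(durations):
    """durations: {task: (condition, minutes)} -> mean minutes per condition."""
    by_cond = {"ai": [], "no_ai": []}
    for cond, minutes in durations.values():
        by_cond[cond].append(minutes)
    return {cond: statistics.mean(vals) for cond, vals in by_cond.items() if vals}

# Illustrative log of completed tasks (condition, minutes) -- not real data.
log = {
    "fix-login-bug":  ("ai", 95),
    "add-csv-export": ("no_ai", 70),
    "refactor-auth":  ("ai", 130),
    "update-deps":    ("no_ai", 40),
}
print(compare(log))
```

With only a handful of tasks the comparison will be as noisy as the study's own wide confidence interval warns, so the point is the habit of measuring, not a definitive verdict from a week of logging.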

Sources & Methodology Notes

  • METR Study (2025): "Measuring the Impact of Early LLMs on Coding" - randomized controlled trial with N=16 experienced developers, 246 real issues. Published methodology available. Key limitation: small sample size with wide confidence intervals (95% CI: -26% to +9%).
  • GitClear Analysis (2024): Analysis of 211 million lines of changed code across 2020-2024. Reports correlation between AI adoption timing and code quality metrics. Methodology: automated code analysis. Limitation: observational data cannot establish causation.
  • 69% continued AI use: From METR study follow-up survey; self-reported preference.
  • Harari's "Alien Intelligence" framework: From various lectures and writings (2023-2024); our application to coding productivity is our interpretation.

We've tried to present these findings with appropriate uncertainty. The METR study is the most rigorous evidence available but has limitations. Other claims about AI coding productivity (both positive and negative) typically rely on weaker evidence than what we've cited here.