You've seen the headlines: "AI makes developers 19% slower." "45% of AI code has security flaws." "Pricing skyrockets." But what do these findings actually mean? And what context is usually missing from the coverage?
There's legitimate research worth understanding here. There are also real limitations in that research. And there's a lot of sensationalism making it hard to think clearly about any of it.
Let's look at the three main areas of concern—productivity, security, and cost—and try to separate what we know from what we don't.
Research Summary
- Productivity: METR study found 19% slowdown for experienced devs on familiar code—but 69% kept using AI anyway.
- Security: 45% of AI code has vulnerabilities (Veracode)—but no human baseline comparison exists.
- Cost: Usage-based pricing replacing flat rates. Some power users face 5-10x cost increases.
- Perception gap: Developers believed AI made them 20%+ faster even when measurements showed the opposite.
- Missing data: Beginner productivity, unfamiliar codebases, human security baselines—all understudied.
- The honest take: Real effects, more modest than claimed. Context-dependent. Measure objectively.
The Numbers You've Seen (With Context)
The Productivity Question
The METR study is the most rigorous research we have on AI coding productivity. For a detailed breakdown, see our full METR study analysis. It's worth understanding what it actually found—and what it didn't.
What the Study Measured
16 experienced open-source developers worked on 246 real issues from their own repositories. Issues were randomly assigned: on half, developers were allowed to use AI tools (Cursor Pro with Claude 3.5/3.7 Sonnet); on the other half, they were not. On the AI-allowed issues, developers took 19% longer.
Important Context Often Missing
Sample size: 16 developers is small. The researchers acknowledge this limitation.
Confidence interval: The 95% CI ranged from -26% to +9%, meaning the true effect could be anywhere from a substantial slowdown to a modest speedup.
Specific conditions: These were experts working on codebases they already knew well. The study explicitly notes this is where human expertise advantages are largest.
What wasn't measured: Beginners, unfamiliar codebases, prototyping, learning scenarios—contexts where AI might perform differently.
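To make that confidence interval concrete, here is a small arithmetic sketch. The 60-minute baseline task is a hypothetical assumption for illustration, not a figure from the study:

```python
# Illustrative arithmetic only: translating the reported point estimate and
# 95% CI into concrete completion times for a hypothetical 60-minute task.

baseline_minutes = 60.0  # assumed baseline, not from the study

# Effects expressed as speedup fractions: negative means slower with AI.
point_estimate = -0.19   # the headline 19% slowdown
ci_low = -0.26           # lower bound: a 26% slowdown
ci_high = 0.09           # upper bound: a 9% speedup

def time_with_ai(baseline, effect):
    """Completion time under a given speedup (+) or slowdown (-) effect."""
    return baseline * (1 - effect)

print(f"Point estimate: {time_with_ai(baseline_minutes, point_estimate):.1f} min")
print(f"CI range: {time_with_ai(baseline_minutes, ci_low):.1f}"
      f" to {time_with_ai(baseline_minutes, ci_high):.1f} min")
```

The same task could plausibly take anywhere from about 55 minutes to about 76 minutes with AI, which is why the point estimate alone overstates what the study pins down.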
The Perception Gap Is Real
Here's the genuinely interesting finding: developers predicted AI would make them 24% faster. After finishing—with measurably slower results—they still believed they'd been 20% faster. And 69% kept using AI tools after the study ended.
This suggests two things that might both be true:
- Developers can't accurately self-assess AI's productivity impact
- There might be benefits beyond raw speed (reduced cognitive load, different kind of engagement, learning)
The study measured completion time. It didn't measure enjoyment, stress, or other factors that might explain why developers preferred the experience despite being slower.
The Harari Perspective
Yuval Noah Harari argues AI represents the first technology that makes autonomous decisions rather than just following instructions. This reframes the productivity question interestingly.
When you use an AI coding assistant, you're collaborating with a system that has its own "opinions." For experts who already know exactly what they want, this creates friction—time spent evaluating and correcting AI suggestions rather than just coding.
For developers in unfamiliar territory, the AI's "opinions" might fill knowledge gaps. The same property that slows experts might help beginners.
What We Can and Can't Conclude About Productivity
Reasonably supported: Experienced developers working on familiar, complex codebases may not benefit from current AI tools—and might be slower.
Not well-studied: Whether AI helps beginners, whether it helps with unfamiliar code, whether it helps with specific task types (prototyping, boilerplate, learning).
Overstated in both directions: "AI makes all developers slower" (not measured). "AI makes developers 10x faster" (not supported).
The Security Question
Security research is more consistently concerning than productivity research. Multiple studies find AI-generated code has significant security issues.
What the Studies Found
Veracode tested 100+ LLMs across 80 coding tasks and found 45% of AI-generated code contained security vulnerabilities. Other studies report even higher rates for specific vulnerability types (XSS, injection attacks). For a deeper analysis, see our article on the AI code quality crisis.
The Real Concern
AI optimizes for "code that looks correct" and "code that compiles." Security requires thinking about edge cases, attack vectors, and defensive patterns that may not be well-represented in training data.
This isn't a "wait for better models" problem. Research suggests larger models don't perform significantly better on security tasks. The issue appears to be architectural, not just scale-related.
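A minimal sketch of the failure mode described above: code that compiles, passes a happy-path check, and "looks correct," yet is injectable. The schema, data, and function names here are hypothetical, chosen only to illustrate the pattern:

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name):
    # The shape AI assistants often suggest: works on normal input,
    # but attacker-controlled `name` can rewrite the query itself.
    return conn.execute(
        f"SELECT role FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name):
    # Parameterized query: the driver treats `name` strictly as data,
    # so the same payload matches nothing instead of matching everything.
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # leaks every row
print(find_user_safe(payload))    # returns an empty list
```

Both functions run without errors on well-behaved input, which is exactly why "it compiles and the tests pass" is not a security argument.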
The Missing Baseline
Important caveat: Most of these studies don't compare AI-generated code to human-written code under similar conditions. "45% of AI code has security flaws" sounds alarming, but what's the rate for human-written code?
We don't have great data on this. Some studies suggest human code also has high vulnerability rates. The honest answer is: we don't know if AI is worse than humans, only that it's not as good as we'd hope.
This doesn't mean security concerns aren't valid—they are. But the framing of "AI is uniquely dangerous" may be overstated without baseline comparisons.
What's Reasonable to Conclude About Security
Well-supported: AI-generated code frequently contains security vulnerabilities. You should not trust AI code to be secure without review. Security review processes need to account for AI-generated code.
Less clear: Whether AI code is significantly worse than human code. Whether specific vulnerability types are more common in AI code. Whether security improves with better prompting or review processes.
The Cost Question
AI coding tool pricing has changed significantly in 2025. Several platforms moved from flat-rate to usage-based pricing, causing unpredictable bills for some users.
What Actually Happened
Cursor moved to a credit-based system. Claude Code introduced usage caps. Replit changed to "effort-based pricing." These changes surprised users who'd budgeted for flat-rate subscriptions.
Why Pricing Changed
AI coding platforms were subsidizing heavy users. Some power users were getting $1,000+ of API value for $200/month. That's not sustainable as a business model.
The shift to usage-based pricing reflects actual costs. GPU compute is expensive. Model inference isn't free. The "era of cheap unlimited AI" was always going to end.
The legitimate complaint: Changes were often sudden, poorly communicated, and broke budget expectations. The economics are understandable; the execution frustrated users.
What This Means Practically
If you're budget-constrained: Flat-rate options like GitHub Copilot ($10-19/month) offer more predictability than usage-based tools.
If you're a heavy user: Usage-based pricing might actually cost you more than before. Do the math before committing.
For teams: Enterprise costs can be significant. A 500-developer team on GitHub Copilot Business faces ~$114,000/year. Whether that's worth it depends on actual productivity gains—which, per the research above, are uncertain. For more on enterprise costs, see our hidden costs analysis.
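The team-level arithmetic is easy to sketch. The seat price and headcount mirror the example above; the break-even framing and the loaded hourly rate are hypothetical assumptions, not vendor figures:

```python
# Back-of-envelope enterprise cost, using the figures from the text above.
seats = 500
price_per_seat_month = 19  # GitHub Copilot Business list price cited above
annual_cost = seats * price_per_seat_month * 12
print(f"annual cost: ${annual_cost:,}")  # $114,000

# Hypothetical break-even framing: hours each developer must save per year
# for the tool to pay for itself, at an assumed loaded hourly rate.
loaded_hourly_rate = 100  # assumption; varies widely by organization
breakeven_hours_per_dev = annual_cost / (seats * loaded_hourly_rate)
print(f"break-even: {breakeven_hours_per_dev:.2f} hours/dev/year")
```

Framed this way the break-even bar is low, but that cuts both ways: if the tool makes developers even slightly slower, as the productivity research above suggests it can, the same arithmetic turns the subscription into a cost on top of a cost.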
The Honest Assessment
| Topic | What We Know | What We Don't Know |
|---|---|---|
| Productivity (experienced devs) | One rigorous study found 19% slowdown | Whether this generalizes; why devs prefer it anyway |
| Productivity (beginners) | Limited rigorous research | Whether AI helps those learning or in unfamiliar code |
| Security | AI code frequently has vulnerabilities | How it compares to human baseline |
| Cost | Pricing is shifting to usage-based models | Where pricing stabilizes long-term |
| Perception gap | Developers overestimate AI's help | What non-speed benefits might explain preference |
Transparency Note
Syntax.ai builds AI coding tools. We're not neutral observers of this research—we have commercial interests in the AI coding space. We've tried to present the research honestly, including findings that might reflect poorly on AI tools generally. But you should factor our perspective into how you evaluate this analysis.
What Makes Sense Given the Uncertainty
Given what we know and don't know, here's what seems reasonable:
For Individual Developers
- Don't trust your perception: The research is clear that developers can't accurately self-assess AI's productivity impact. If you want to know if AI helps you, measure objectively.
- Context matters: AI probably works differently for different tasks. Prototyping, learning, boilerplate might benefit more than complex work on familiar codebases.
- Security requires verification: Don't trust AI code to be secure. Run static analysis. Review carefully. This isn't optional.
For Teams and Organizations
- Question bold productivity claims: The "10x developer" narrative isn't supported by rigorous research. Self-reported surveys aren't reliable evidence.
- Measure actual outcomes: If you're deploying AI tools, track objective metrics—completion times, defect rates, security findings—not just adoption or satisfaction.
- Budget for uncertainty: Both productivity gains and cost savings are less certain than vendors claim. Plan conservatively.
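A minimal sketch of what "measure actual outcomes" could look like in practice. The per-task completion times below are invented for illustration, not measured data, and a real rollout would need far more tasks plus defect and security metrics:

```python
import statistics

# Hypothetical per-task completion times (minutes) from a task tracker;
# these numbers are invented for illustration only.
with_ai = [52, 61, 48, 70, 55, 66, 59, 63]
without_ai = [50, 57, 49, 62, 54, 60, 58, 61]

def summarize(label, times):
    """Print simple summary statistics and return the median."""
    med = statistics.median(times)
    mean = statistics.mean(times)
    print(f"{label}: median={med:.1f} min, mean={mean:.1f} min, n={len(times)}")
    return med

med_ai = summarize("with AI", with_ai)
med_no = summarize("without AI", without_ai)

# Directional signal only; with samples this small, no single comparison
# should drive a purchasing decision.
print(f"median difference: {med_ai - med_no:+.1f} min")
```

Even this crude comparison is more informative than a satisfaction survey, because it measures the thing the perception-gap research shows developers misjudge.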
For the Discourse Generally
- Be skeptical of extremes: "AI is useless" and "AI is transformative" are both overstated. Reality is context-dependent and uncertain.
- Demand baseline comparisons: "AI code has X% vulnerabilities" means little without human code baselines.
- Acknowledge what we don't know: The research is early. Sample sizes are small. Conditions are specific. Humility is appropriate.
The Bottom Line
AI coding tools are real technology with real effects. Those effects are more modest and context-dependent than either enthusiasts or critics suggest.
The productivity research suggests experienced developers on familiar codebases may not benefit—and might be slower. The security research suggests AI code needs careful review. The cost landscape is shifting toward usage-based pricing that may surprise users.
None of this means AI coding tools are useless. Some developers in some contexts probably benefit. But the evidence for transformative productivity gains isn't there yet, and the security concerns are real.
The Question Worth Asking
Instead of "Are AI coding tools good or bad?" try "In what specific contexts might AI help me, and how would I verify that it actually does?"
That's a harder question. It requires actual measurement rather than vibes. And honestly, the answer might be "I don't know yet"—which is a perfectly reasonable place to be given the current state of evidence.
Sources & Evidence Notes
- METR Study: "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" - Randomized controlled trial with 16 developers, 246 tasks. Published 2025.
- Veracode Security Research: Tested 100+ LLMs across 80 coding tasks. The 45% vulnerability rate is real, but no equivalent human baseline study was conducted.
- Pricing Information: Based on publicly announced pricing changes from Cursor, Anthropic (Claude Code), and Replit in 2025. Enterprise cost calculation assumes GitHub Copilot Business at $19/user/month × 500 developers × 12 months.
- Perception Gap Finding: From the METR study—developers predicted 24% faster, believed 20% faster after, but measured 19% slower.
Note: Research in this space is early. Sample sizes are small. We've tried to represent findings accurately while acknowledging their limitations.
Frequently Asked Questions
Do AI coding tools actually make developers slower?
The METR study found 16 experienced developers were 19% slower when using AI tools on codebases they knew well. However, the confidence interval was wide (-26% to +9%), meaning results could range from significant slowdown to modest speedup. Notably, 69% kept using AI anyway. The study specifically measured experts on familiar code—contexts where human expertise advantage is largest. Results for beginners or unfamiliar codebases weren't measured.
How secure is AI-generated code?
Veracode tested 100+ LLMs and found 45% of AI-generated code contained security vulnerabilities. However, most studies don't compare this to human-written code baselines. AI optimizes for "code that compiles" rather than "code that is secure." Security review processes must account for AI-generated code—don't trust it to be secure without verification. The missing baseline is important: we don't know if AI is worse than humans, only that it's not as good as we'd hope.
Why are AI coding tools getting more expensive?
AI coding platforms were subsidizing heavy users—some getting $1,000+ of API value for $200/month subscriptions. In 2025, Cursor, Claude Code, and Replit shifted to usage-based pricing reflecting actual compute costs. GPU compute is expensive; the "era of cheap unlimited AI" was unsustainable. For budget predictability, flat-rate options like GitHub Copilot ($10-19/month) may be better than usage-based tools. For teams, enterprise costs add up: a 500-developer team on GitHub Copilot Business faces ~$114,000/year.
Should developers use AI coding tools?
It depends on context. AI probably works differently for different tasks—prototyping, learning, and boilerplate may benefit more than complex work on familiar codebases. Key recommendations: Don't trust your perception (developers overestimate AI's help). Measure objectively. Never trust AI code to be secure without review. The evidence for transformative productivity gains isn't there yet, but some developers in some contexts probably benefit.