576,000 code samples. 16 different AI models. One disturbing finding: nearly 1 in 5 packages that AI coding tools recommend don't actually exist. And attackers have figured out how to weaponize this.
Security researchers have discovered what might be the most scalable supply chain attack vector ever found. It's called "slopsquatting"—and it exploits something AI does constantly: confidently hallucinate package names that sound real but aren't.
Here's the problem. When your AI assistant suggests pip install data-validator-pro or npm install react-utils-helper, there's a 20% chance those packages don't exist. But nothing stops an attacker from registering them tomorrow—with malware inside.
The Research Findings
- 576,000 code samples tested across Python and JavaScript ecosystems
- 19.7% hallucination rate—nearly 1 in 5 AI package recommendations don't exist
- 205,474 unique fake packages discovered across all tests
- 43% repeat every time—these hallucinations are predictable and exploitable
- Open-source models: 21.7% vs commercial models: 5.2% hallucination rate
- CodeLlama worst performer at >33%, GPT-4 Turbo best at 3.59%
- No attacks confirmed yet—but the vulnerability is real and documented
The Research That Should Terrify You
In April 2025, Seth Larson—Security Developer-in-Residence at the Python Software Foundation—coined a term for something security researchers had been quietly worrying about: "slopsquatting."
The name combines "slop" (the increasingly common term for AI-generated content) with "squatting" (the practice of registering names hoping someone will accidentally use them). But unlike typosquatting—where attackers bet on human typing errors—slopsquatting bets on AI making the same mistakes over and over.
Researchers at Socket Security decided to test just how exploitable this was. Their methodology was straightforward:
- Prompt 16 different AI models to generate code for common programming tasks
- Extract all package references from the generated code
- Check whether those packages actually exist on PyPI and npm (a minimal version of this check is sketched after the list)
- Run each test 10 times to measure consistency
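To make the registry check concrete, here is a minimal sketch of that verification step, not the researchers' actual tooling: it pulls install targets out of generated code with a rough regex, asks PyPI's public JSON API whether each name exists, and tallies how often the same nonexistent name recurs across runs. The extraction pattern and function names are illustrative assumptions, and an npm equivalent would query the npm registry instead.

```python
# Minimal sketch of the registry check (illustrative, not the study's tooling).
import re
from collections import Counter

import requests  # any HTTP client works

PYPI_URL = "https://pypi.org/pypi/{name}/json"  # public PyPI JSON API

def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI knows about this package name."""
    return requests.get(PYPI_URL.format(name=name), timeout=10).status_code == 200

def extract_pip_targets(code: str) -> set[str]:
    """Rough extraction of 'pip install <name>' targets from generated code."""
    return set(re.findall(r"pip install ([A-Za-z0-9_.\-]+)", code))

def tally_hallucinations(generated_runs: list[str]) -> Counter:
    """Count, across repeated runs, how often each nonexistent name recurs."""
    counts: Counter = Counter()
    for code in generated_runs:                 # one entry per generation run
        for name in extract_pip_targets(code):
            if not package_exists_on_pypi(name):
                counts[name] += 1               # recurring names are the exploitable ones
    return counts
```

Names that show up in every run are exactly the ones an attacker would register first.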
The results were worse than expected.
Key Finding
Across 576,000 code samples, 19.7% of all package recommendations pointed to packages that don't exist. That's not a rounding error. That's nearly 1 in 5 suggestions being completely fabricated.
But here's what makes this exploitable: these hallucinations aren't random. The same fake package names appear again and again:
- 43% of hallucinated packages appeared in every single test run
- 58% repeated more than once across 10 runs
- Researchers found 205,474 unique fake package names
This predictability is the vulnerability. If an attacker can figure out which packages AI models commonly hallucinate, they can register those names on PyPI or npm before developers try to install them.
How a Slopsquatting Attack Works
Let's walk through exactly how this attack unfolds:
Step 1: AI Generates Plausible-Looking Code
Developer asks Copilot/Cursor/Claude to add data validation. AI confidently responds with code importing a package that doesn't exist.
Step 2: Developer Trusts the Suggestion
The package name sounds legitimate. The code looks correct. Developer runs the install command without checking if the package is real.
Step 3: Attacker's Package Gets Installed
Because an attacker already registered that package name with malicious code inside, the developer just installed malware into their project.
Here's what this looks like in practice:
```python
# Developer prompt: "Add input validation to my Flask API"
# AI response:
from flask import Flask, request
from flask_input_validator import validate  # This package doesn't exist!

app = Flask(__name__)

@app.route('/api/user', methods=['POST'])
def create_user():
    data = validate(request.json, UserSchema)  # Looks legitimate...
    return save_user(data)

# Developer runs: pip install flask_input_validator
# If an attacker registered this name... game over.
```
The brilliance (from an attacker's perspective) is that this scales infinitely. Unlike typosquatting where you're betting on individual human mistakes, slopsquatting exploits systematic AI behavior. One malicious package registration could potentially compromise thousands of developers who all receive the same hallucinated suggestion.
Which AI Models Hallucinate the Most?
Not all models are equally bad at this. The research tested 16 different models and found massive variation:
| Model | Hallucination Rate | Risk Level |
|---|---|---|
| CodeLlama 7B | >33% | High |
| CodeLlama 34B | >33% | High |
| Open-source average | 21.7% | High |
| Commercial average | 5.2% | Medium |
| GPT-4 Turbo | 3.59% | Lower |
What This Means
If you're using open-source coding models (increasingly popular for privacy and cost reasons), you face roughly 4x the hallucination risk compared to commercial models. CodeLlama—widely used and respected—hallucinates packages in more than one-third of outputs. Even GPT-4 Turbo, the best performer, still invents fake packages nearly 4% of the time.
The problem isn't that these models are broken. They're doing exactly what they're designed to do: predict plausible next tokens based on training data. Sometimes that training data included references to packages that never existed, or packages that existed briefly and were removed, or naming patterns that suggest packages that "should" exist.
The Vibe Coding Problem
This vulnerability didn't emerge in a vacuum. It's deeply connected to how developers actually use AI tools—what Andrej Karpathy (former Tesla AI director, OpenAI founding member) famously called "vibe coding."
Vibe coding is when you tell an AI what you want, accept the code it generates, and move on without deeply understanding what it wrote. For experienced developers, it's a productivity boost. For everyone, it's a trust exercise.
The Trust Problem
Vibe coding works because AI-generated code usually works. But "usually" is the problem. When you're moving fast, trusting suggestions without verification, a 20% hallucination rate means you're rolling the dice constantly. Every pip install or npm install of an AI-suggested package is a potential attack surface.
The research found that developers using AI assistants are optimizing for speed—that's the whole point. Stopping to verify every package recommendation defeats the purpose. This creates pressure to trust AI suggestions blindly, especially when the generated code looks plausible and the package name seems reasonable.
Attackers count on this. The faster you move, the less verification you do.
Is This Risk Actually Real?
Here's where we need to be honest about what we know and don't know.
What's Confirmed
The vulnerability exists. AI models definitely hallucinate package names. The 19.7% rate is measured, reproducible, and significant. The predictability (43% appearing every time) makes this systematically exploitable.
What's Not Confirmed
No real-world slopsquatting attacks have been publicly documented. Researchers discovered the vulnerability. Actual malicious exploitation hasn't been confirmed in public reports. This could mean: (1) attacks haven't happened yet, (2) attacks happened but weren't detected, or (3) attacks happened but weren't reported. We genuinely don't know.
Some security researchers argue the lack of confirmed attacks means we're overhyping the risk. Others argue that supply chain attacks are notoriously hard to detect and attribute—by the time you notice, the damage is done and the trail is cold.
Our view: the vulnerability is real, documented, and exploitable. Whether attackers are actively exploiting it today is unknown. But waiting for confirmed attacks before taking precautions is like waiting for the fire to start before installing smoke detectors.
How to Protect Yourself
The good news: protecting against slopsquatting doesn't require abandoning AI coding tools. It requires adding verification steps that you should probably have anyway.
Immediate Actions
- Verify before installing. When AI suggests a package you don't recognize, check that it exists on PyPI/npm before running the install command. Takes 10 seconds (a quick pre-install script is sketched after this list).
- Check publish dates. If a package was created recently and has low download counts, be suspicious. Slopsquatted packages are necessarily new.
- Use lockfiles religiously. Package lockfiles (package-lock.json, Pipfile.lock) prevent silent dependency changes. Commit them. Review changes to them.
- Enable dependency scanning. Tools like Snyk, Dependabot, and Socket can flag suspicious packages before they enter your codebase.
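A minimal sketch of the first two checks, assuming PyPI's public JSON API and its release metadata; the 90-day threshold, script name, and function names are illustrative assumptions, and the npm registry exposes similar creation dates for JavaScript packages.

```python
# check_package.py -- illustrative pre-install check, not a complete defense:
# confirm the package exists on PyPI and warn if it was first published recently.
import sys
from datetime import datetime, timezone

import requests

def check_package(name: str, max_age_days: int = 90) -> None:
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    if resp.status_code != 200:
        print(f"'{name}' does not exist on PyPI -- possible hallucination.")
        sys.exit(1)

    # The oldest upload across all releases approximates the package's creation date.
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in resp.json()["releases"].values()
        for f in files
    ]
    if uploads:
        age_days = (datetime.now(timezone.utc) - min(uploads)).days
        if age_days < max_age_days:
            print(f"Warning: '{name}' first appeared only {age_days} days ago.")

if __name__ == "__main__":
    check_package(sys.argv[1])
```

Run python check_package.py flask_input_validator before installing anything an assistant suggested; ten seconds of verification is the whole defense.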
For Teams
- Implement package approval workflows. New dependencies should require review, especially when added via AI assistance (a simple CI gate is sketched after this list).
- Audit recent additions. Review packages added since your team adopted AI coding tools. Look for suspicious or unnecessary dependencies.
- Train developers. Make sure your team understands slopsquatting risks and knows how to verify packages.
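One lightweight way to back an approval workflow is a CI step that fails when declared dependencies drift from a reviewed allowlist. The sketch below assumes a Python project with a requirements.txt and a hypothetical approved-packages.txt maintained by the team; a JavaScript project would diff package.json instead.

```python
# Illustrative CI gate: fail the build when requirements.txt contains a
# dependency that is not on the team's reviewed allowlist.
import re
import sys
from pathlib import Path

APPROVED = Path("approved-packages.txt")   # hypothetical allowlist, one name per line
REQUIREMENTS = Path("requirements.txt")

def parse_names(path: Path) -> set[str]:
    names = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Keep only the package name, dropping version specifiers like ==1.2.3.
        names.add(re.split(r"[=<>!~\[;@ ]", line, maxsplit=1)[0].lower())
    return names

unapproved = parse_names(REQUIREMENTS) - parse_names(APPROVED)
if unapproved:
    print("Unapproved dependencies, review before merging:", ", ".join(sorted(unapproved)))
    sys.exit(1)
print("All dependencies are on the approved list.")
```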
AI Settings That Help
- Lower temperature settings reduce randomness and may reduce hallucinations (this isn't well-studied for packages specifically; an example call follows this list)
- Prefer commercial models if security is critical—they hallucinate packages at roughly 1/4 the rate of open-source models
- Use tools with real-time verification—some AI coding assistants now check package existence before suggesting
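As an illustration of the first point, here is how a lower temperature might be set when requesting code through the OpenAI Python SDK; the model name and prompts are placeholders, and other providers expose a similar parameter.

```python
# Sketch: request code generation with a low temperature to reduce randomness.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",   # placeholder model name
    temperature=0.2,       # lower randomness than the default of 1.0
    messages=[
        {"role": "system",
         "content": "You are a coding assistant. Only import packages you are certain exist on PyPI."},
        {"role": "user", "content": "Add input validation to my Flask API."},
    ],
)

print(response.choices[0].message.content)
```

Lower temperature narrows the model toward its most probable tokens, which tends to favor well-known package names over plausible-sounding inventions.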
The Bottom Line
AI coding tools have a 20% chance of recommending packages that don't exist. Those hallucinations are predictable enough that attackers could pre-register malicious versions. No confirmed attacks have happened yet, but the vulnerability is real and documented.
This doesn't mean AI coding tools are unsafe to use. It means they require the same skepticism you'd apply to any external input: verify before trusting. The 10 seconds it takes to check if a package exists is cheap insurance against a supply chain compromise.
The era of vibe coding without verification is probably over—or at least, it should be.
Sources & Methodology Notes
- Socket Security Research (2025): "Slopsquatting" paper analyzing 576,000 code samples across 16 LLMs. Methodology: prompted models to generate code, extracted package references, verified against PyPI/npm registries.
- Seth Larson coining: Python Software Foundation Security Developer-in-Residence, April 2025.
- Model-specific rates: From Socket Security paper. GPT-4 Turbo (3.59%), open-source average (21.7%), commercial average (5.2%), CodeLlama variants (>33%).
- "Vibe coding" term: Coined by Andrej Karpathy, describing AI-assisted development where developers accept code without deep understanding.
- Attack confirmation status: As of December 2025, no public reports confirm active slopsquatting exploitation. This could reflect lack of attacks, lack of detection, or lack of reporting.
Frequently Asked Questions
What is the AI package hallucination rate?
Research testing 576,000 code samples found 19.7% of AI-recommended packages don't exist. Open-source models hallucinate at 21.7%, commercial models at 5.2%. GPT-4 Turbo shows 3.59% while CodeLlama exceeds 33%.
What is slopsquatting?
Slopsquatting is a supply chain attack where attackers register malicious packages using names AI coding tools commonly hallucinate. The term was coined by Seth Larson of the Python Software Foundation in April 2025, combining "slop" (AI-generated content) with "squatting."
How many fake packages did researchers find?
Researchers discovered 205,474 unique hallucinated package names across 440,445 total instances. 43% of these hallucinations appeared every single time when prompted, and 58% repeated more than once across 10 test runs.
Which AI models hallucinate the most packages?
CodeLlama 7B and 34B hallucinated in over one-third of outputs (>33%). GPT-4 Turbo performed best among tested models at 3.59%. Overall, open-source models averaged 21.7% hallucination rate versus 5.2% for commercial models.
Have there been real slopsquatting attacks?
No real-world slopsquatting attacks have been publicly documented yet. The research demonstrates the vulnerability exists and is exploitable, but confirmed malicious exploitation hasn't been reported. This doesn't mean attacks aren't happening—they may simply go undetected or unreported.