AI-assisted coding:
real help or phantom productivity boost?
Letting AI write your code sounds intriguing and cost-effective. How could a project not fall for it?

How to deal with AI-assisted coding. Part 1
Boards see charts promising 55% productivity gains. Vendors promise context-aware coding agents that will “finally fix the talent gap.” Engineers report that Copilot “feels like magic.” And yet… your releases aren’t twice as fast, incidents haven’t dropped, and the roadmap still slips quarter after quarter.
That tension, between how fast AI feels and how fast you actually ship, is the AI Productivity Paradox. We are currently in the "Uncanny Valley" of software engineering: the code is syntactically perfect and logically sound in isolation, yet distinctly "off" when placed in the broader context of a production system.
AI is not snake oil, nor is it a magic wand. Like human intelligence, it is radically uneven. It is extremely efficient at turning a clear intention into code, exploring ideas, and creating that special co-creation feeling now called “vibe-coding”. Yet it is surprisingly clumsy at the things that actually drive value. Where does this clumsiness come from? Part of the answer lies in the model's own limitations: AI creates technical debt at record speed because it lacks a "mental model" of the entire system. What do we mean by that?
1. The "Keyhole" Refactor (Context Blindness)
You may have noticed that AI is often limited by a context window – the amount of code or text it can “see” at once. Modern agents can ingest a lot, sometimes even an entire repository. The real risk, however, lies outside that boundary: downstream consumers in another repo or service, undocumented contracts, weak test coverage, and all the other out-of-context dependencies that never make it into the prompt.
So what is the outcome of this context blindness?
Everything looks fine locally. Tests are green, types line up, the linter is happy. But somewhere else, in a different repository or an older integration with no meaningful tests, a service still relies on if (user === null). Your refactor made one module “cleaner” while quietly breaking assumptions across the system. You improved a brick and cracked the wall.
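To make that concrete, here is a minimal TypeScript sketch. Every name in it is invented, and the in-memory map stands in for whatever storage the real system uses; the point is how a contract can change without any local test noticing.

```typescript
type User = { id: string; name: string };

// Stand-in for whatever storage layer the real system uses.
const db = new Map<string, User>();

// Before the refactor: the shared lookup returns null when nothing is found.
// Consumers in other repositories depend on exactly that contract.
function findUserBefore(id: string): User | null {
  return db.get(id) ?? null;
}

// After an AI "cleanup": this module looks tidier and its own tests stay green,
// but the contract silently changed from null to undefined.
function findUserAfter(id: string): User | undefined {
  return db.get(id);
}

// In another repository the model never saw, a consumer is still written
// against the old contract; nothing flags the mismatch across a service boundary.
interface LookupResponse {
  user: User | null;
}

function handleLookup(response: LookupResponse): void {
  if (response.user === null) {
    console.log("redirect to signup"); // the "not found" branch is now effectively dead
  }
}

// The refactored service serializes undefined by omitting the field entirely,
// so the old "=== null" check never fires again.
handleLookup(JSON.parse(JSON.stringify({ user: findUserAfter("nobody") })));
```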

2. The "Frankenstein" Architecture
AI does not have a sense of aesthetic consistency or architectural history. It solves the problem directly in front of it using whatever pattern is most common in its training data today, not the pattern your team painfully agreed on three years ago.
If your team’s decisions are not documented, the model will happily produce code that is technically correct yet culturally “wrong” for your repo. Humans do this too: drop a new developer into an undocumented codebase and they will default to the patterns they already know. The difference is that AI can do this at scale and at speed.
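A small hypothetical illustration: suppose the team settled on returning errors as values a few years ago, while the assistant reaches for the pattern it has seen most often in the wild. Both versions below are "correct"; only one of them belongs in your repo.

```typescript
// The house style (agreed on long ago): errors are values,
// and every caller handles both branches explicitly.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

function parsePriceHouseStyle(raw: string): Result<number> {
  const value = Number(raw);
  return Number.isFinite(value)
    ? { ok: true, value }
    : { ok: false, error: `not a number: ${raw}` };
}

// What an assistant trained on "the most common pattern" tends to produce:
// technically fine, but now every caller needs a try/catch that the
// rest of the codebase doesn't use.
function parsePriceAiDefault(raw: string): number {
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`not a number: ${raw}`);
  }
  return value;
}
```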

3. The "Happy Path" Hallucination
AI is incredibly optimistic. It writes code for a world where networks never fail, users never paste emojis into number fields, and databases are always awake and polite.

This is where discipline and specificity matter. Clean, concise requirements and explicit instructions are helpful for humans, but for your AI assistant they are non-negotiable. The model cannot guess. It is intelligent, but it has no gut feeling, no lived experience of “this will probably fail in production.”
To avoid code that only works in a fairy-tale environment, you need a senior human professional to do the unglamorous part: define what “correct” actually means, describe the unhappy paths, and insist that the implementation respects them. AI can move fast on the keyboard. The responsible human has to decide what “fast” is allowed to produce.
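Here is a rough sketch of the difference, with a made-up endpoint, field name, and timeout; the contrast between the two versions is what matters.

```typescript
// The happy-path version an assistant tends to produce when the prompt
// only says "fetch the user's balance":
async function getBalanceHappy(userId: string): Promise<number> {
  const res = await fetch(`https://api.example.com/users/${userId}/balance`);
  const body = await res.json();
  return body.balance; // assumes the network, the status code, and the payload shape
}

// The version you get once a human has spelled out the unhappy paths:
async function getBalanceDefensive(userId: string): Promise<number> {
  const res = await fetch(`https://api.example.com/users/${userId}/balance`, {
    signal: AbortSignal.timeout(5_000), // networks do fail; don't wait forever
  });
  if (!res.ok) {
    throw new Error(`balance lookup failed: HTTP ${res.status}`);
  }
  const body: unknown = await res.json();
  if (
    typeof body !== "object" ||
    body === null ||
    typeof (body as { balance?: unknown }).balance !== "number"
  ) {
    throw new Error("balance lookup returned an unexpected payload");
  }
  return (body as { balance: number }).balance;
}
```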
4. The "Library Re-Invention"
AI is trained on general code patterns, so unless you tell it what your repo already standardizes on, it will happily reinvent utilities you already have. But breathe out: this one is fixable. Document your preferred libraries and helpers, and enforce them in code review – exactly the way you’d guide a new engineer.
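A typical instance, with invented names: imagine the repo already ships a shared retry helper under something like @acme/shared/retry, but nothing in the prompt says so, and the assistant writes its own slightly different one.

```typescript
// What the team already has (hypothetical path, shown as a comment so the
// snippet stays self-contained):
//   import { withRetry } from "@acme/shared/retry";

// What an uninstructed assistant tends to generate instead: a local
// near-duplicate with subtly different semantics (no jitter, a different
// backoff curve, none of the shared helper's logging).
async function fetchWithRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Ad-hoc exponential backoff: 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 100));
    }
  }
  throw lastError;
}
```

One of these per pull request is a nuisance; dozens of them are a maintenance tax.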

5. The “Security Gaslighting”
AI is helpful to a fault. If you ask it to do something risky, it usually won’t push back—it’ll just do it efficiently and with a smile. If you say, “I need to bypass this login for testing,” the model won’t warn you about the implications; it’ll just write the middleware to hard-code that bypass. It might even offer "helpful" suggestions like storing admin API keys in the client-side code or disabling validation "just for now."
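The result tends to look something like this sketch. The header name and token are invented, and the middleware shape just follows the common request/response/next convention; the danger is how ordinary it looks.

```typescript
type Req = { headers: Record<string, string | undefined>; user?: { role: string } };
type Res = { status: (code: number) => { send: (body: string) => void } };
type Next = () => void;

// What "just bypass the login for testing" often comes back as:
// confident, tidy, and one forgotten cleanup task away from production.
function authMiddleware(req: Req, res: Res, next: Next): void {
  // DANGER: hard-coded bypass with a secret committed to the repo,
  // granting full admin with no audit trail.
  if (req.headers["x-debug-token"] === "letmein-2024") {
    req.user = { role: "admin" };
    return next();
  }
  // ...real authentication would go here...
  res.status(401).send("unauthorized");
}
```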

The core issue is simple: AI has no concept of a threat model. It’s optimized for code that runs, not code that’s safe. You end up with functional snippets that quietly open the door to massive vulnerabilities—leaked secrets, broken authorization, or new SQL injection points. The model isn’t looking for integrity or compliance; it’s just looking for a successful execution.
This becomes dangerous in two very different ways, depending on who’s behind the keyboard:
For the "Vibe Coder" or non-pro:
AI feels like a brilliant shortcut that unblocks progress. If you don't have a deep background in security or OWASP standards, you’re likely to trust the model more than your own gut. A login bypass looks like a harmless "life hack" to get the job done, so the suggestion gets copied verbatim. In this scenario, AI is actually training the dev to adopt insecure habits as the default.
For the Senior Engineer:
The risk here is more subtle. You know perfectly well that hard-coding keys is a cardinal sin. But maybe it’s 4:30 PM on a Friday, you’re under pressure to fix a staging bug, and you’re "just trying something out." AI makes the "dirty shortcut" frictionless. What used to be a 15-minute manual hack (one that felt a little wrong to type out) is now a five-second copy-paste. The AI isn't inventing new ways to be reckless; it’s just lowering the barrier to doing the wrong thing. It wraps a bad idea in clean, confident code that’s one missed "cleanup" task away from ending up in production.
6. The Circular Bug Fix (The "Whack-a-Mole")
This is the most frustrating form of clumsiness. You ask the model to fix a bug; it patches the symptom and quietly breaks something else. You feed it the new failure, and the next patch reintroduces the original bug. Because each prompt is answered in isolation, with no memory of why the earlier fix looked the way it did, the loop can run for a surprisingly long time while every individual answer looks perfectly reasonable.
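A compressed, invented example of the loop, where each patch is locally reasonable and globally wrong:

```typescript
// Round 1 bug report: "the last partial page is missing from the count."
// The assistant's fix: round up instead of down. Correct.
function pageCountV1(items: number, pageSize: number): number {
  return Math.ceil(items / pageSize);
}

// Round 2 bug report: "pageCount(0, 20) should be 1, we always render one page."
// The assistant patches the new symptom in isolation...
function pageCountV2(items: number, pageSize: number): number {
  // ...and trades one off-by-one for another: pageCountV2(40, 20) is now 3.
  return Math.floor(items / pageSize) + 1;
}
```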

Maybe you and your team have run into even more of these cases. Whatever your list looks like, when approaching AI-assisted coding it is worth keeping in mind that AI "clumsiness" comes from one root cause: AI is a tactician, not a strategist.
The productivity paradox arises because humans spend less time typing syntax (the tactical part) but significantly more time reviewing, auditing, and integrating that syntax into the messy reality of the business (the strategic part).
For an executive audience, the question is no longer “Should my engineers use AI?” It’s “Where does AI actually reduce task completion latency in my stack?” and “Where is it quietly creating AI-generated technical debt while everyone feels so productive and comfortable?”
Now, let’s go deeper down this rabbit hole and look at some more “clumsy” AI cases.
Greenfield vs. Brownfield: Two Very Different AI Realities
In greenfield contexts, AI is everything the marketing decks claim.
Give a modern assistant a clean repo and a focused request:
- “spin up a FastAPI endpoint”,
- “generate Jest tests for this pure function”,
- “create a GitHub Actions pipeline similar to this one”.
What will you see? AI that behaves like an ultra-fast junior engineer. Studies show 20–60% gains on tightly scoped coding tasks: the assistant annihilates boilerplate and gets your team to a working first draft in minutes, not hours.
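As a taste of what “tightly scoped” means, here is roughly what the second prompt in the list above produces. The pure function and the test cases are invented for illustration; in a real repo the function would be imported rather than defined next to its tests.

```typescript
import { describe, expect, it } from "@jest/globals";

// A small pure function of the kind assistants handle well.
export function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

describe("slugify", () => {
  it("lowercases and hyphenates words", () => {
    expect(slugify("Hello World")).toBe("hello-world");
  });

  it("collapses punctuation and repeated separators", () => {
    expect(slugify("  Rust & Go: a (quick) comparison!  ")).toBe(
      "rust-go-a-quick-comparison"
    );
  });

  it("handles an empty string", () => {
    expect(slugify("")).toBe("");
  });
});
```

Nothing here is hard; it is exactly the kind of boilerplate a team is happy to stop typing.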
You feel that in the room: engineers are less drained by repetitive work. Onboarding velocity improves because a context-aware coding agent can walk a new hire through calling patterns, logging conventions, or error-handling idioms embedded in the codebase. The Return on AI (ROAI) in these scenarios is obvious and positive.
But that’s only one reality.
Most of your real spending is not on greenfield. It’s on brownfield work: altering behavior in a 10-year-old platform, threading new capabilities through legacy systems, respecting undocumented invariants that live in the heads of two architects and three ex-employees.
This is where the glossy story breaks.
A recent randomized controlled trial from METR dropped early-2025 AI tools (Cursor Pro and top-tier language models) into that exact environment: 16 experienced open-source developers, 246 issues, all on mature projects they knew intimately. Add to the picture five years of prior contribution history and tens of thousands of stars per repo.
Before the experiment, developers forecast a 24% speedup from AI. After the experiment, they still believed they’d been about 20% faster.
The data said something else:
When AI tools were allowed, issue completion time increased by 19%. Tasks took longer with AI than without it.
That gap between subjective acceleration and objective slowdown is what could be called the Phantom Speedup. Less typing, more chatting, more “flow,” and a longer path to “ready for production.”
For senior leaders, that’s the first uncomfortable truth: AI can absolutely improve the experience of coding while degrading the economics of shipping.
The AI Productivity Paradox Starts in the Mind, Not the Model
Why does this happen?
The METR team dug through 140+ hours of screen recordings and found a central culprit: over-optimism. Developers believed AI help was, by definition, a net positive. So they kept reaching for it, even in situations where it obviously wasn’t saving time.
This is where vibe coding creeps in, a term coined by Andrej Karpathy, and widely used by Eric Bridgeford and colleagues in their “Ten Simple Rules for AI-Assisted Coding in Science.” You see code that looks right. It compiles. It follows familiar patterns from Stack Overflow and GitHub. It “vibes” as correct. But the developer has not truly interrogated the logic, edge cases, or methodological soundness.
In other words: AI is extremely good at generating correct-looking code. The pressure is then on the human to ensure correctness and completeness, not just syntactic validity. When that discipline softens, you don’t just get slower. You get slower and riskier.
Bridgeford’s work adds a second psychological warning: cognitive offloading vs. atrophy. AI is genuinely useful for lifting cognitive load: less time recalling API signatures, more time thinking about business logic. But if your organization leans too hard on the model to do problem decomposition, architecture decisions, or algorithm design, those muscles weaken. Over time, you get engineers who are excellent prompt operators and increasingly shaky systems thinkers.
That may feel fine in a sprint. It is catastrophic over a product’s lifetime.