AI-assisted coding: real help or phantom productivity boost?

Letting AI write your code sounds intriguing and cost-effective. How could a project not fall for it?


How to deal with your AI-assisted coding. Part 1

Softwarium

Boards see charts promising 55% productivity gains. Vendors promise context-aware coding agents that will “finally fix the talent gap.” Engineers report that Copilot “feels like magic.” And yet… your releases aren’t twice as fast, incidents haven’t dropped, and the roadmap still slips quarter after quarter.

That tension, between how fast AI feels and how fast you actually ship, is the AI Productivity Paradox. We are currently in the "Uncanny Valley" of software engineering: the code is syntactically perfect and logically sound in isolation, yet distinctly "off" when placed in the broader context of a production system.

AI is neither snake oil nor a magic wand. Like human intelligence, it is radically uneven. It is extremely efficient at turning a clear intention into code, exploring ideas, and creating that special co-creation feeling now called “vibe coding”. However, AI is surprisingly clumsy at the things that actually drive value. Where does this clumsiness come from? A large part of the answer is hidden in AI’s limitations: it creates technical debt at record speed because it lacks a “mental model” of the entire system. What do we mean by that?

1. The "Keyhole" Refactor (Context Blindness)

You may have noticed that AI is often limited by a context window – the amount of code or text it can “see” at once. Modern agents can ingest a lot, sometimes even an entire repository. The real risk, however, lies outside that boundary: downstream consumers in another repo or service, undocumented contracts, weak test coverage, and all the other out-of-context dependencies that never make it into the prompt.

So what is the outcome of this context blindness?

Everything looks fine locally. Tests are green, types line up, the linter is happy. But somewhere else, in a different repository or an older integration with no meaningful tests, a service still relies on if (user === null). Your refactor made one module “cleaner” while quietly breaking assumptions across the system. You improved a brick and cracked the wall.

The Scenario

You ask the AI to “clean up” a specific function in user.ts.

The Clumsiness

It brilliantly optimizes the function, perhaps changing a return value from null to undefined because that’s “best practice.”

The Result

The code compiles in that file, but silent runtime errors explode across the app because three other legacy services specifically check for if (user === null). The AI optimized a brick, but destabilized the whole “building”.
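
A minimal sketch of how that plays out, assuming a hypothetical getUser helper in user.ts and a legacy caller in another repo (all names here are illustrative, not from any real codebase):

  // user.ts, BEFORE the "cleanup": every downstream consumer was written against this contract.
  interface User { id: string; email: string }
  const userCache = new Map<string, User>();

  export function getUserBefore(id: string): User | null {
    return userCache.get(id) ?? null;
  }

  // AFTER the refactor: returning undefined is "best practice", so the contract silently changed.
  export function getUserAfter(id: string): User | undefined {
    return userCache.get(id);
  }

  // Legacy service in a different repo, compiled with loose settings (or plain JS):
  function renderGreeting(user: any): string {
    if (user === null) {
      // Dead branch now: undefined === null is false, so the "no user" case falls through.
      return "Welcome, guest";
    }
    return `Welcome back, ${user.email.toLowerCase()}`; // TypeError once user is undefined
  }

  renderGreeting(getUserAfter("missing-id")); // compiles everywhere, crashes at runtime

Nothing in the prompt told the model that renderGreeting exists, and nothing in the compiler output for user.ts ever will.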

 

2. The "Frankenstein" Architecture

AI does not have a sense of aesthetic consistency or architectural history. It solves the problem directly in front of it using whatever pattern is most common in its training data today, not the pattern your team painfully agreed on three years ago.

If your team’s decisions are not documented, the model will happily produce code that is technically correct yet culturally “wrong” for your repo. Humans do this too: drop a new developer into an undocumented codebase and they will default to the patterns they already know. The difference is that AI can do this at scale and at speed.

The Scenario

You need a new UI component for a dropdown menu.

The Clumsiness

Your app uses Redux for state management and Tailwind for styling. The AI generates a component that uses React Context and inline CSS styles because it picked up a pattern from a random tutorial in a different ecosystem.

The Result

You now have a working dropdown that follows a completely different data flow and styling paradigm than the rest of the app. It technically works, yet it quietly fractures your architecture and becomes a source of chaos for the next human developer who has to maintain it.
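
A sketch of what that output often looks like, assuming a React codebase. The component and its names are hypothetical; the point is only the mismatch, Context plus inline styles in a repo that standardizes on Redux and Tailwind:

  // AI-generated dropdown: technically fine, culturally wrong for this repo.
  import React, { createContext, useContext, useState } from "react";

  // A private state channel, invented instead of the Redux store the app already uses.
  const DropdownContext = createContext({ open: false, toggle: () => {} });

  export function Dropdown({ children }: { children: React.ReactNode }) {
    const [open, setOpen] = useState(false);
    return (
      <DropdownContext.Provider value={{ open, toggle: () => setOpen(!open) }}>
        {/* Inline styles instead of the Tailwind utility classes used everywhere else */}
        <div style={{ position: "relative", fontFamily: "sans-serif" }}>{children}</div>
      </DropdownContext.Provider>
    );
  }

  export function DropdownMenu({ items }: { items: string[] }) {
    const { open } = useContext(DropdownContext);
    if (!open) return null;
    return (
      <ul style={{ position: "absolute", border: "1px solid #ccc", padding: 8 }}>
        {items.map((item) => (
          <li key={item}>{item}</li>
        ))}
      </ul>
    );
  }

Every choice here is defensible in isolation; none of them matches the conventions the rest of the app already pays for.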

 

3. The "Happy Path" Hallucination

AI is incredibly optimistic. It writes code for a world where networks never fail, users never paste emojis into number fields, and databases are always awake and polite.

The Scenario

“Write a function to fetch data from this API.”

The Clumsiness

The AI produces a clean fetch request. It forgets to handle 401 (Unauthorized), 429 (Rate Limited), or 500 (Server Error). It happily parses the JSON without checking whether the schema still matches what the UI expects.

The Result

The feature ships fast. Two days later, the API changes one field name or the network hiccups, and the entire application white-screens because there was no error boundary, no fallback logic, and no graceful degradation.

This is where discipline and specificity matter. Clean, concise requirements and explicit instructions are helpful for humans, but for your AI assistant they are non-negotiable. The model cannot guess. It is intelligent, but it has no gut feeling, no lived experience of “this will probably fail in production.”

To avoid code that only works in a fairy-tale environment, you need a senior human professional to do the unglamorous part: define what “correct” actually means, describe the unhappy paths, and insist that the implementation respects them. AI can move fast on the keyboard. The responsible human has to decide what “fast” is allowed to produce.
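
Here is a sketch of what “describing the unhappy paths” can look like in code. The endpoint, the UserDto shape, and the ApiError class are assumptions made up for illustration; the point is the explicit handling that the happy-path version skips:

  interface UserDto { id: string; email: string }

  class ApiError extends Error {
    constructor(public status: number, message: string) {
      super(message);
    }
  }

  export async function fetchUser(id: string): Promise<UserDto> {
    const res = await fetch(`/api/users/${id}`);

    // The statuses the happy-path version silently ignores:
    if (res.status === 401) throw new ApiError(401, "Session expired, re-authenticate");
    if (res.status === 429) throw new ApiError(429, "Rate limited, retry with backoff");
    if (!res.ok) throw new ApiError(res.status, "Upstream failure");

    const body: unknown = await res.json();

    // Don't assume the schema still matches what the UI expects.
    if (
      typeof (body as any)?.id !== "string" ||
      typeof (body as any)?.email !== "string"
    ) {
      throw new ApiError(res.status, "Response schema changed");
    }
    return body as UserDto;
  }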

4. The "Library Re-Invention"

AI is trained on general code patterns, so unless you tell it what your repo already standardizes on, it will happily reinvent utilities you already have. But breathe out: this one is fixable. Document your preferred libraries and helpers, and enforce them in code review – exactly the way you’d guide a new engineer.

The Scenario

You need to format a date.

The Clumsiness

The AI writes a custom 20-line regex function to parse and format the date string. It works... mostly.

The Result

It totally ignored that your project already imports date-fns or Moment.js and has a global helper file dateUtils.ts that handles timezones correctly. You now have duplicate code that behaves slightly differently than the rest of the app (e.g., handling leap years differently), creating subtle data discrepancies.
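
For illustration, here is roughly what the two versions look like side by side (the dateUtils-style helper assumes date-fns is already a project dependency; both function names are made up):

  import { format } from "date-fns";

  // What the AI writes from scratch: "works... mostly".
  export function formatDateCustom(iso: string): string {
    const match = /^(\d{4})-(\d{2})-(\d{2})/.exec(iso);
    if (!match) return iso;              // silently passes bad input through
    const [, year, month, day] = match;
    return `${day}/${month}/${year}`;    // no timezone handling, no validation
  }

  // What already exists in dateUtils.ts: one helper, one behavior, already tested.
  export function formatDate(date: Date): string {
    return format(date, "dd/MM/yyyy");
  }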

 

5. The “Security Gaslighting”

AI is helpful to a fault. If you ask it to do something risky, it usually won’t push back—it’ll just do it efficiently and with a smile. If you say, “I need to bypass this login for testing,” the model won’t warn you about the implications; it’ll just write the middleware to hard-code that bypass. It might even offer "helpful" suggestions like storing admin API keys in the client-side code or disabling validation "just for now."

The core issue is simple: AI has no concept of a threat model. It’s optimized for code that runs, not code that’s safe. You end up with functional snippets that quietly open the door to massive vulnerabilities—leaked secrets, broken authorization, or new SQL injection points. The model isn’t looking for integrity or compliance; it’s just looking for a successful execution.

This becomes dangerous in two very different ways, depending on who’s behind the keyboard:

For the "Vibe Coder" or non-pro:

AI feels like a brilliant shortcut that unblocks progress. If you don't have a deep background in security or OWASP standards, you’re likely to trust the model more than your own gut. A login bypass looks like a harmless "life hack" to get the job done, so the suggestion gets copied verbatim. In this scenario, AI is actually training the dev to adopt insecure habits as the default.

For the Senior Engineer:

The risk here is more subtle. You know perfectly well that hard-coding keys is a cardinal sin. But maybe it’s 4:30 PM on a Friday, you’re under pressure to fix a staging bug, and you’re "just trying something out." AI makes the "dirty shortcut" frictionless. What used to be a 15-minute manual hack (one that felt a little wrong to type out) is now a five-second copy-paste. The AI isn't inventing new ways to be reckless; it’s just lowering the barrier to doing the wrong thing. It wraps a bad idea in clean, confident code that’s one missed "cleanup" task away from ending up in production.
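
As a hypothetical sketch (Express-style middleware, made-up names), this is roughly the gap between the snippet the model volunteers and the version a threat-aware reviewer would insist on:

  import type { Request, Response, NextFunction } from "express";

  // What the model happily produces for "bypass this login for testing":
  export function authBypass(req: Request, _res: Response, next: NextFunction) {
    // Clean, confident, and one missed cleanup task away from production:
    // every request is now an admin, and the shortcut lives in source control.
    (req as any).user = { id: "test-admin", role: "admin" };
    next();
  }

  // What it rarely volunteers: scope the shortcut to a test environment and fail closed.
  export function testOnlyAuthBypass(req: Request, res: Response, next: NextFunction) {
    if (process.env.NODE_ENV !== "test") {
      res.status(401).json({ error: "Unauthorized" });
      return;
    }
    (req as any).user = { id: "test-admin", role: "admin" };
    next();
  }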

 

6. The Circular Bug Fix (The "Whack-a-Mole")

This is the most frustrating form of clumsiness.

The Scenario

You paste an error log: "Fix this Type Error."

The Clumsiness

The AI fixes the Type Error by casting the variable as any or adding a // @ts-ignore.

The Result

The error goes away. You run the app. A logic bug appears. You ask AI to fix the logic bug. It fixes the logic but re-introduces the Type Error. You can spend 30 minutes in a loop where the AI oscillates between two broken states, unable to hold both constraints in its "mind" simultaneously.
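
A small, made-up illustration of that loop: the price arrives from a form as a string, and the model keeps trading the type error for a logic bug instead of resolving both constraints at once:

  interface LineItem { price: number; qty: number }

  // Fix #1: silence the compiler. The type error disappears...
  function addItemCast(cart: LineItem[], priceInput: string, qty: number) {
    cart.push({ price: priceInput as any, qty });
    // ...and a logic bug appears downstream, where subtotal + item.price
    // now concatenates strings instead of adding numbers.
  }

  // Fix #2 (the next prompt) restores the math by treating price as a number again,
  // which re-introduces the original type error at the call site.
  // The change that satisfies both constraints is to convert and validate once:
  function addItem(cart: LineItem[], priceInput: string, qty: number) {
    const price = Number.parseFloat(priceInput);
    if (Number.isNaN(price)) throw new Error(`Invalid price: ${priceInput}`);
    cart.push({ price, qty });
  }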

You and your team may well have collected even more of these cases. Whatever your list looks like, when approaching AI-assisted coding it is better to keep one thing in mind: AI “clumsiness” comes from the fact that AI is a tactician, not a strategist.

The productivity paradox arises because humans spend less time typing syntax (the tactical part) but significantly more time reviewing, auditing, and integrating that syntax into the messy reality of the business (the strategic part).

For an executive audience, the question is no longer “Should my engineers use AI?” It’s “Where does AI actually reduce task completion latency in my stack?” and “Where is it sneakily creating AI-generated technical debt while everyone feels so productive and comfortable?”

Now, let’s go deeper down this rabbit hole and examine some more “clumsy” AI cases.

Greenfield vs. Brownfield: Two Very Different AI Realities

In greenfield contexts, AI is everything the marketing decks claim.

Give a modern assistant a clean repo and a focused request: 

  • “spin up a FastAPI endpoint”,
  • “generate Jest tests for this pure function”,
  • “create a GitHub Actions pipeline similar to this one”.


What will you see? AI that behaves like an ultra-fast junior engineer. Studies show 20–60% gains on tightly scoped coding tasks: the assistant annihilates boilerplate and gets your team to a working first draft in minutes, not hours.
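
For example, the “Jest tests for this pure function” request from the list above tends to come back looking like this (slugify is a hypothetical helper, used only to show the shape of the win):

  // A pure function and the boilerplate tests an assistant drafts in seconds.
  export function slugify(title: string): string {
    return title
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");
  }

  describe("slugify", () => {
    it("lowercases and hyphenates words", () => {
      expect(slugify("Hello World")).toBe("hello-world");
    });

    it("strips punctuation and repeated separators", () => {
      expect(slugify("  AI -- Assisted   Coding! ")).toBe("ai-assisted-coding");
    });

    it("returns an empty string when nothing usable remains", () => {
      expect(slugify("!!!")).toBe("");
    });
  });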

You feel that in the room: engineers are less drained by repetitive work. Onboarding velocity improves because a context-aware coding agent can walk a new hire through calling patterns, logging conventions, or error-handling idioms embedded in the codebase. The Return on AI (ROAI) in these scenarios is obvious and positive.

But that’s only one reality.

Most of your real spending is not on greenfield. It’s on brownfield work: altering behavior in a 10-year-old platform, threading new capabilities through legacy systems, respecting undocumented invariants that live in the heads of two architects and three ex-employees.

This is where the glossy story breaks.

A recent randomized controlled trial from METR dropped early-2025 AI tools (Cursor Pro and top-tier language models) into that exact environment: 16 experienced open-source developers, 246 issues, all on mature projects they knew intimately. Five years of prior contribution history, tens of thousands of stars per repo, to add to the picture.

Before the experiment, developers forecast a 24% speedup from AI. After the experiment, they still believed they’d been about 20% faster.

The data said something else:

When AI tools were allowed, issue completion time increased by 19%. Tasks took longer with AI than without it.

That gap between subjective acceleration and objective slowdown is what could be called the Phantom Speedup. Less typing, more chatting, more “flow,” and a longer path to “ready for production.”

For senior leaders, that’s the first uncomfortable truth: AI can absolutely improve the experience of coding while degrading the economics of shipping.

The AI Productivity Paradox Starts in the Mind, Not the Model

Why does this happen?

The METR team dug through 140+ hours of screen recordings and found a central culprit: over-optimism. Developers believed AI help was, by definition, a net positive. So they kept reaching for it, even in situations where it obviously wasn’t saving time.

This is where vibe coding creeps in: a term coined by Andrej Karpathy and widely used by Eric Bridgeford and colleagues in their “Ten Simple Rules for AI-Assisted Coding in Science.” You see code that looks right. It compiles. It follows familiar patterns from Stack Overflow and GitHub. It “vibes” as correct. But the developer has not truly interrogated the logic, edge cases, or methodological soundness.

In other words: AI is extremely good at generating correct-looking code. The pressure is then on the human to ensure correctness and completeness, not just syntactic validity. When that discipline softens, you don’t just get slower. You get slower and riskier.

Bridgeford’s work adds a second psychological warning: cognitive offloading vs. atrophy. AI is genuinely useful for lifting cognitive load: less time recalling API signatures, more time thinking about business logic. But if your organization leans too hard on the model to do problem decomposition, architecture decisions, or algorithm design, those muscles weaken. Over time, you get engineers who are excellent prompt operators and increasingly shaky systems thinkers.

That may feel fine in a sprint. It is catastrophic over a product’s lifetime.
