
The Real Cost of AI-Generated Code

Written by Olena Tkhorovska on March 22, 2026

I have this conversation regularly with clients and prospects.

Someone comes to me with a working prototype. They built it themselves, or their product team did, using tools like Lovable or Claude Code. Sometimes in a weekend. Sometimes in a few weeks. In one recent case, a family law firm had a fully functional intake and document management tool. In another, a finance startup had a working personal budgeting app with user authentication and data visualization. A lifestyle and wellness brand had built a coaching platform with subscription logic and a content library.

The products worked. They looked good. Users were logging in.

Then the same question came up every time: how do we build the real thing?

By "the real thing," they meant something that could handle more users without breaking, pass a security review, be maintained by a developer who did not build it, and exist in two years without requiring a full rewrite. They meant production software.

That gap between a working prototype and a production-grade system is what this post is about. It has become one of the most important conversations in software development right now, because AI has made the first part dramatically easier while leaving the second part exactly as hard as it has always been.

What AI Changes About Software Development — and What It Does Not

Before examining the evidence, it is worth being precise about the claim.

AI coding tools are genuinely capable. Not in a few years. Now. In January 2026, Anthropic CEO Dario Amodei told the World Economic Forum that AI could do "most, maybe all" of what software engineers do end-to-end within six to twelve months. Engineers at his own company, he noted, had already stopped writing code themselves. They described the process as letting the model write the code and editing the result.

This is real. The tools have crossed a meaningful threshold in the past eighteen months. Prototypes that used to take weeks take days. Boilerplate that consumed engineering hours is generated correctly on the first attempt. Platforms like Lovable, Claude Code, and Cursor have made it possible for non-technical founders to ship working software without an engineering team.

But Amodei also said something that received far less attention. Speaking at the same forum, he added: "There are still things for the software engineers to do. It is like even if the software engineers are only doing 10% of it, they still have a job to do, or they can take a level up. But that is not going to last forever."

That qualifier matters enormously for anyone making decisions about how to build software today. The question is not whether AI can write code. It clearly can. The question is what the code needs to do, who is responsible for it, and what it costs to maintain over time.

The Entry-Level Hiring Decline: What the Data Shows

The effect of AI on the software labor market is no longer speculative. It is measured.

Stanford's Digital Economy Lab published a landmark study in 2025 tracking payroll records for millions of workers across tens of thousands of firms. The finding was specific: employment for software developers aged 22 to 25 declined nearly 20% from its late-2022 peak. The controlled figure across all AI-exposed occupations was a 13 to 16% relative decline, and it was still growing at the time of publication. The researchers titled the paper "Canaries in the Coal Mine."

At the same time, hiring of new graduates at the 15 largest tech companies has fallen by more than half since 2019, according to data from venture capital firm SignalFire.

The interpretation most people reach: AI is replacing entry-level work. That is partly true. But there is a second implication that receives almost no attention.

Junior engineering roles were never only about output. They were how the industry produced experienced engineers. The developer who understands a complex distributed system at 40 was a developer fixing small bugs and writing routine endpoints at 25. That path built the pattern recognition, the failure memory, the domain instinct that makes experienced engineering judgment valuable.

If AI absorbs the work that built those capabilities, the industry is not just solving a short-term efficiency problem. It is borrowing against future capability without acknowledging the debt. The engineers who will be needed to oversee complex systems in five years are being produced in fewer numbers right now.

Why Productivity Gains and Quality Problems Are Happening at the Same Time

This is the data point that most organizations miss.

Cortex surveyed over 50 engineering leaders and analyzed development metrics across multiple organizations for their 2026 Engineering in the Age of AI Benchmark Report. The findings on speed are exactly what most leaders expect: pull requests per author are up 20% year-over-year, and deployment frequency is up across the board.

The findings on quality are what most leaders do not expect: incidents per pull request increased by 23.5%, and change failure rates rose approximately 30%.

Both sets of numbers are real. They exist simultaneously. AI tools are making teams faster and making mistakes more frequent.

This is not a paradox once you understand the mechanism. AI amplifies whatever engineering practices are already in place. On a team with strong architecture, clear standards, and experienced reviewers, AI acceleration compounds those strengths. On a team without those foundations, AI produces more code faster, and more of that code contains the kinds of errors that experienced judgment would have caught before they reached production.

A separate survey from Karat, covering 400 engineering leaders across the US, India, and China, found that 73% consider strong engineers worth at least three times their total compensation. That multiple has grown since AI tools arrived. At the same time, 59% say weak engineers now deliver zero or negative value in an AI-assisted workflow.

The value gap between strong and weak engineering is not closing. It is widening. AI does not level the playing field. It raises the stakes on the judgment already present.

Where AI Is Replacing Engineering Work Today

There are genuine categories of software work where AI is not augmenting engineers. It is replacing them. Being clear about this matters, because denying it does not serve anyone making real decisions.

AI is producing most of the work in these areas today:

Greenfield projects with no legacy context. When the codebase is new, the requirements are clear, and there is no accumulated business logic to understand, AI can build a first working version with minimal human input. This is the Lovable and Claude Code [LINK: pieoneers.com/blog/gonka-launching-decentralized-ai-gpu-node-with-claude-code/] use case. It is real and it works.

Boilerplate, scaffolding, and standard endpoints. The code that connects components, handles routine data operations, and implements known patterns is largely automatable. This was a significant portion of entry-level engineering work.

Standard front-end development. Job postings in this area declined more than in any other engineering category in 2025. AI tools generate UI components, handle state management patterns, and implement common interface logic competently.

Test generation for well-specified behavior. When the expected behavior is clear, AI generates test coverage faster and more comprehensively than manual test writing.

Solo and small-scope projects. One developer with AI tools now produces the output that previously required a small team, within a narrow and well-defined scope.
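The test-generation point above is worth making concrete. The function and assertions below are a hypothetical sketch, not taken from any cited codebase: when a contract is stated explicitly, the test cases follow almost mechanically from it, which is exactly the class of work AI tools now handle well.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent, rounded to cents.

    Contract: percent must be in [0, 100]; otherwise raise ValueError.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# Well-specified behavior: each test reads directly off the contract.
assert apply_discount(100.0, 0) == 100.0    # no discount
assert apply_discount(100.0, 25) == 75.0    # simple percentage
assert apply_discount(80.0, 100) == 0.0     # full-discount boundary
try:
    apply_discount(50.0, 150)               # out-of-range input
except ValueError:
    pass                                    # contract enforced
```

When the expected behavior is this explicit, generated coverage is usually complete and correct. The judgment problems start when the contract itself is unstated and lives only in someone's head.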

The honest assessment is that a significant portion of what junior engineers did in 2022 is now handled by AI in 2026. The economics have changed for this class of work. That is not a prediction. It is the current state.

Where Human Engineering Judgment Still Determines the Outcome

The component of software engineering that AI is automating is the translation layer. Converting a decided solution into working syntax. That was never where the expensive decisions lived.

In 1985, computer scientist Peter Naur wrote an essay called "Programming as Theory Building." His argument was that every program embeds a theory held in the minds of the people who built it. That theory covers why the system was designed this way, what trade-offs were made, what the business actually needed versus what was specified, and where the failure modes are. When the team that holds that theory leaves, the theory goes with them. No amount of documentation fully replaces it, because the theory is not just facts. It is judgment accumulated through decisions and their consequences.

AI produces code. It does not produce the theory behind the code.

This distinction matters most in the following situations:

Brownfield and legacy systems. Decades of undocumented decisions, technical debt, and business logic that lives in the codebase rather than any specification. AI tools struggle with large, complex codebases where the context window cannot contain the history of how the system evolved. The engineers who understand that history are the ones who can change it safely.

Production-grade security. A 2025 analysis of AI-generated versus human-written code, reported by The Register, found that AI code contained 2.74 times more XSS vulnerabilities, 1.91 times more insecure object references, and 1.88 times more improper password handling. AI confidently reuses insecure patterns because it has learned from code that contains them. It does not have a threat model. Experienced engineers do.

Regulated industries. Fintech, health technology, legal software, and government systems operate under compliance requirements that are not reducible to code patterns. Understanding what the regulation requires, how it applies to a specific business model, and where the liability sits requires human judgment and domain knowledge.

Complex distributed systems. Failure modes in distributed architectures are emergent and context-specific. AI can generate the components. Experienced engineers design how those components behave under stress, partial failure, and unexpected load.

Long-lived client relationships. Years of accumulated understanding about what a client actually needs versus what they asked for, why a system was built a particular way, and what the business context is for a technical decision. This is irreproducible by any tool.
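The XSS figure above corresponds to a very simple code-level pattern. As a minimal sketch (the function names are illustrative, not from any cited codebase): interpolating user input directly into HTML, which AI models frequently reproduce because it is the statistically common pattern in public code, versus escaping it first.

```python
import html


def render_comment_unsafe(user_input: str) -> str:
    # The vulnerable pattern: user-controlled text dropped straight
    # into markup. A <script> payload would execute in the browser.
    return f"<div class='comment'>{user_input}</div>"


def render_comment_safe(user_input: str) -> str:
    # Escaping converts <, >, & and quotes to HTML entities, so any
    # injected markup is displayed as text instead of executed.
    return f"<div class='comment'>{html.escape(user_input)}</div>"


payload = "<script>alert('xss')</script>"
print(render_comment_unsafe(payload))  # script tag survives intact
print(render_comment_safe(payload))    # neutralized as &lt;script&gt;...
```

This is only one of the three vulnerability classes in the cited figures, but it illustrates why the gap exists: the model reproduces whichever pattern dominates its training data, and it has no threat model to tell it which one that should be.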

This is not a sentimental argument for human involvement. It is a practical one. A survey by Harness found that 67% of developers spend more time debugging AI-generated code than human-written code, and 68% spend more time reviewing it. Reviewing code you did not write, from a system that did not explain its decisions, requires more judgment, not less.

Why Small Senior Teams Have a Structural Advantage Now

Here is the shift that most organizations have not fully priced in.

A team of three or four experienced engineers with AI tools can now produce the output that previously required twelve to fifteen. Opsera's 2026 benchmark, covering over 250,000 developers, found that senior engineers realize nearly five times the productivity gains of junior engineers when using AI tools. The gap is not small.

The quality of that output depends entirely on the judgment of those few people. The Cortex data makes the risk concrete. Speed gains without experienced engineering judgment produce more failures, not fewer. The productivity multiplier only works when the engineers directing it understand what they are building.

Large delivery organizations that use headcount as a proxy for quality are now in an exposed position. They have more people, more output, and more incidents. They have less of the scarce thing: engineers who understand the system deeply enough to know when AI-generated code is right and when it is plausible-looking but wrong.

A small team that knows a client's domain well, deploys AI to handle the mechanical implementation layer, and takes genuine accountability for the architecture and quality of what they build is not competing on volume. It is competing on judgment. That is the advantage that compounds over time.

The economics favor this configuration in a way they did not three years ago. The question for any organization commissioning software is not how many engineers are working on a project. It is how much genuine engineering judgment is applied to what is being built.

Vibe Coding Is the New Technical Debt

There is a name now for the approach of describing what you want in plain language and accepting AI output with minimal review. It is called vibe coding. The term was coined by Andrej Karpathy, co-founder of OpenAI, in early 2025. It went mainstream quickly because it accurately described something that a large number of people were already doing.

For prototypes, internal tools, and bounded greenfield projects, vibe coding is a reasonable choice. The time savings are real and the scope is contained.

For production systems, it is creating a problem that has not fully arrived yet.

A study tracking Cursor AI adoption in real development teams found static analysis warnings up 30% and code complexity up 41% after a single month of AI-assisted development. Gains in delivery speed dissipated. The technical debt persisted. One developer documented reaching 100,000 lines of AI-generated code before the codebase became effectively unmanageable. Not because the code did not work, but because nobody understood it well enough to change it safely.

The Cortex benchmark data tells the same story at scale. Speed up 20%. Failures up 30%.

What is being created, in large numbers, right now, is production software that functions at launch and accumulates hidden fragility over time. Code without the theory behind it. Systems that work but that nobody fully understands.

This is the structural problem with vibe coding applied beyond its appropriate scope. It is not that the code is wrong. It is that the code has no author in the meaningful sense. No person who holds the mental model of why it was built this way, what it assumes, and where it will break under conditions that were not anticipated at the time of generation.

In three to five years, organizations running these systems will face a remediation problem. Understanding what the system does, why it does it, and what can safely be changed will require exactly the kind of experienced engineering judgment that the entry-level hiring collapse is currently preventing the next generation from developing.

The demand for that judgment is not declining. It is deferred.

The organizations that will be well-positioned when that wave arrives are the ones that built their software with experienced engineers involved in the architecture decisions from the beginning. Not added later to fix what AI built without sufficient oversight.

What This Means for How You Build Software

If you are making decisions about how to build or commission software in 2026, here is the practical picture.

AI tools are worth using. A skilled engineering team using them well [LINK: pieoneers.com/services/ai-development/] produces more, faster. This is not optional for competitive delivery.

The prototype-to-production gap is real. Building a working prototype with AI tools is now accessible to almost anyone. Building a system that can handle growth, survive a security review, be maintained by a team that did not build it, and run reliably in two years requires something different. It requires architecture decisions made by people who understand the domain, the system, and the long-term implications of the choices made at the start.

The cost of skipping that step is deferred, not avoided. Systems built without experienced engineering judgment at the architecture level do not fail immediately. They accumulate fragility. The rewrite or the remediation project arrives later, when the business depends on the system and the cost of fixing it is much higher than the cost of building it right the first time.

Team configuration matters more than team size. The relevant question is not how many engineers are working on a project. It is how much genuine engineering judgment is applied to the architecture, the security model, the data design, and the long-term maintainability of what is being built.

Proximity to the business context is the multiplier. The engineers who produce the most value in an AI-assisted workflow are the ones who understand the client's domain, the regulatory environment, the business constraints, and the product strategy well enough to direct AI tools effectively and evaluate their output critically. That understanding comes from close, sustained engagement with the problem.


The conversation I have with clients who come to me after building a prototype on Lovable or Claude Code is almost always the same. The prototype worked. It proved the concept. Now they need to build the real thing.

That is the right sequence. Use the tools available to prove the idea quickly and cheaply. Then bring in the engineering judgment required to build something that will last.

What the current AI moment has changed is the cost of the first step. It has not changed the cost of the second.


Olena Tkhorovska is the CEO and Co-Founder of Pieoneers, a Vancouver-based software development firm that has been building production-grade web and mobile applications since 2009. Pieoneers works with small, senior-led teams on projects where architecture, security, and long-term maintainability matter. If you are navigating the transition from prototype to production system, reach out at pieoneers.com/contact/.

References

1. Dario Amodei at the World Economic Forum, Davos — statement on AI replacing software engineers within 6 to 12 months, and follow-up quote on engineers still having a role. January 20, 2026. Interview with The Economist at WEF Annual Meeting 2026. Reported by Entrepreneur (January 21, 2026), BW Businessworld, and Rest of World (January 28, 2026).

2. Stanford Digital Economy Lab — Brynjolfsson, E., Chandar, B., and Chen, R. (2025). "Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence." Published August 2025, updated November 2025. digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/

3. SignalFire — State of Talent Report 2025 New graduate hiring at the 15 largest tech companies has fallen by more than half since 2019. signalfire.com/blog/signalfire-state-of-talent-report-2025 Also reported by SF Standard (January 28, 2026): sfstandard.com/2026/01/28/ai-booming-tech-jobs-san-francisco/

4. Cortex — Engineering in the Age of AI: 2026 Benchmark Report Survey of 50+ engineering leaders. PRs per author up 20%, incidents per pull request up 23.5%, change failure rates up approximately 30%. cortex.io/post/ai-is-making-engineering-faster-but-not-better-state-of-ai-benchmark-2026 Also confirmed by The Register (December 2025): theregister.com/2025/12/17/ai_code_bugs/

5. Karat — AI Workforce Transformation Report 2026 Survey of 400 engineering leaders across the US, India, and China. 73% say strong engineers are worth at least 3× total compensation. 59% say weak engineers deliver zero or negative value. karat.com/resource/ai-workforce-transformation-report/ Also reported by GeekWire (December 2025): geekwire.com

6. Peter Naur — "Programming as Theory Building" Essay originally published 1985. Reprinted in: Naur, P. (1992). Computing: A Human Activity. ACM Press. Cited in ISACA Now Blog (January 26, 2026): isaca.org/resources/news-and-trends/isaca-now-blog/2026/can-artificial-intelligence-replace-software-engineers-and-cybersecurity-professionals

7. CodeRabbit — State of AI vs. Human Code Generation Report (2025) AI-generated code contained 2.74× more XSS vulnerabilities, 1.91× more insecure object references, and 1.88× more improper password handling than human-written code. Reported by The Register (December 2025): theregister.com/2025/12/17/ai_code_bugs/

8. Harness Developer Survey 67% of developers spend more time debugging AI-generated code. 68% spend more time reviewing it than human-written code. Cited in Ivan Turkovic, "AI Made Writing Code Easier. It Made Being an Engineer Harder." (February 25, 2026): ivanturkovic.com/2026/02/25/ai-made-writing-code-easier-engineering-harder/

9. Opsera — 2026 AI Coding Impact Benchmark Analysis of 250,000+ developers. Senior engineers realize nearly five times the productivity gains of junior engineers when using AI tools. Cited in multiple 2026 sources including: cjroth.com/blog/2026-02-18-building-an-elite-engineering-culture

10. Cursor AI Adoption Study Static analysis warnings up 30%, code complexity up 41% after one month of AI-assisted development. Gains in velocity dissipated; technical debt persisted. "Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects." Cited in: agilepainrelief.com/blog/ai-generated-code-quality-problems/ (February 2026)

11. Andrej Karpathy — "Vibe Coding" Term coined by Andrej Karpathy, co-founder of OpenAI, in early 2025. Reported by MIT Technology Review (January 2026): technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/
