Why AI Coding Tools Pay Off for Senior Engineers (and Quietly Underperform for Everyone Else)

Written by Andrew Manshin on June 1, 2026

What the latest research says about AI productivity, the kind of work where it pays off, and a recent modernization project that demonstrates the pattern. [Updated: June 5, 2026]

The paradox at the heart of the AI coding boom

Adoption of AI coding tools has reached near-universal levels. The Stack Overflow 2025 Developer Survey, released in mid-2025 and treated across the industry as the 2026 reference, found that 84 percent of developers are using or plan to use AI tools, with 51 percent of professionals using them daily.

Trust is moving in the opposite direction. The same survey found that developer trust in AI accuracy fell from 40 percent in prior years to 29 percent. Experienced developers were the most skeptical of all, with only 2.6 percent reporting high trust in AI output and 20 percent reporting high distrust.

Both of those statements are true at the same time. The most-used class of developer tools in history is also the least-trusted by the people who use it most. And yet the same period has produced some of the most striking modernization results the industry has ever measured.

The productivity gain from AI coding tools is real. It does not look the same at every level of experience, and the shape of the gain matters more than the size.

What the numbers actually say

The most useful piece of recent analysis I have read on this topic is Ingo Eichhorst's State of AI Coding Efficiency (2026), a meta-analysis of studies from late 2025 and early 2026. The headline result is from a Microsoft study of GitHub Copilot users: an average productivity increase of about 26 percent in weekly completed tasks.

The interesting part is the distribution. The 26 percent average splits into roughly 40 percent for junior developers and about 7 percent for senior developers on pull-request volume. A naive reading of that data says juniors benefit and seniors do not. The actual reading is that the benefit changes shape with experience.

Junior developers get more out of assistant-style tools: autocomplete, snippet generation, syntax help, boilerplate. Their work has a higher concentration of mechanical tasks, and AI handles those well.

[Image: Andrew and Olena reviewing AI tool work process and productivity gains]

Senior developers benefit more from AI workflows that require evaluation, steering, and system-level judgment. The work is different. It is harder to measure in pull requests because it lives in decisions, in trade-offs, and in what does not get built rather than in lines of code merged.

There is one study worth confronting. The METR randomized controlled trial, published in mid-2025, found that experienced developers working on familiar open-source codebases were 19 percent slower when allowed to use AI, even though they believed they were 20 percent faster. The slowdown was specifically in the autocomplete-style usage pattern that does not suit senior work in familiar code. Olena, our CEO, wrote about what this means for software clients in The Real Cost of AI-Generated Code.

Productivity is only half the question. The other half is what happens to AI-written code months or years later, when someone else has to maintain it. A 2025 study, updated in early 2026, addressed exactly that. Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability, led by researchers at CodeScene and Lund University, ran a two-phase pre-registered controlled experiment with 151 participants, 95 percent of them professional developers. In Phase 1, developers added a feature to a Java web application, with or without AI. In Phase 2, a different set of developers evolved those solutions, with no AI allowed.

The headline result: the maintenance phase showed no significant difference in completion time or code quality between AI-assisted and human-only code. Where the study found a difference, it favoured AI: when habitual AI users completed Phase 1, the maintainability score of their code was significantly higher. AI is not degrading the codebase. The senior practitioner is the variable that matters.

The studies measure different outcomes and converge on the same finding. AI does not determine engineering outcomes on its own. The practitioner using it does. Speed, productivity by experience, long-term code quality: all of them follow from the skill and judgment of the human in the loop.

The two kinds of complexity

To understand why the productivity gain takes the shape it does, it helps to draw a distinction that has been around in software engineering for decades but is suddenly very useful again.

Software has two kinds of complexity. The first is accidental complexity. This is everything that comes from the implementation: syntax, boilerplate, build configuration, framework conventions, dependency wrangling, the work of keeping libraries on supported versions. AI commodifies accidental complexity. A junior developer with a good AI assistant can plough through it at a multiple of what they could before.

The second is inherent complexity, more often called essential complexity. This is what the system actually does and why. What business problem is the application solving? What is the rule that lives in the head of an engineer who left two years ago? What will break if you change this constant? Why did the original architect make a decision that looks wrong now but is in fact load-bearing?

Inherent complexity does not yield to autocomplete. It yields to judgment, and that judgment is built from years of working on production systems and watching what happens when assumptions turn out to be wrong.

Modernization work is mostly inherent complexity. That is why the productivity gain looks the way it does for senior engineers. It is also why AI is a force that raises rather than lowers the engineering standard for software built for serious purposes. Olena made the full version of that argument in Small Audience, Big Standards: Why Niche Software Demands Senior Engineering.

What this looks like on a real modernization

We are running a modernization project right now that demonstrates the pattern. Our client is a long-standing research and data platform built originally for the University of British Columbia and used daily by researchers around the world. The application has been in production for years and carries the architectural layers of every era it has lived through. I led the latest phase of work. The full technical narrative is in the updated DRH case study.

The work splits cleanly along the two-kinds-of-complexity line.

What AI did well, directed by senior judgment. Three examples from the actual project.

The mechanical conversion of every component from the old create-react-class API to modern React function components, across the entire codebase. This is pattern-translation work. Done by hand, it would have taken weeks of careful editing. Done in AI-assisted passes, it took days. AI is consistently good at this kind of task under senior review.

The dependency cleanup. jQuery, Bootstrap, the legacy d3 wrapper, the bespoke form-tools library, the old yup package, multiple stale axios wrappers, orphaned types directories. The decision to remove each dependency was mine. The work of finding and rewriting every reference was AI-assisted. Pattern recognition at scale, performed quickly, verified by a human who knew what should and should not break.

The TypeScript adoption and Hasura codegen pipeline. Type generation from a GraphQL schema is exactly the kind of task AI handles cleanly under a senior engineer who knows what end-to-end type safety should look like.

What only senior judgment could do. Three contrasting examples from the same project.

Removing Baobab as the global state container. Baobab is a relatively obscure observable-tree library that was a reasonable choice when the application was built. Each piece of state on the global tree had to be evaluated independently. Was it local component state, or cross-cutting context, or server data that belonged in TanStack Query, or derived state that should be computed from the server? AI can translate code. It cannot make that call. A junior engineer with AI would have produced a working but architecturally incoherent result. I made each decision per feature, and the resulting architecture is one a future maintainer can actually reason about.
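The four-way question above can be written down as a first-pass triage. The sketch below is a simplification under stated assumptions, not the process used on the project: the `StateBranch` fields, the example path, and the order of the rules are all hypothetical.

```typescript
// The four destinations named in the text for state leaving the global tree.
type StateHome =
  | "local-component-state"
  | "shared-context"
  | "server-cache" // e.g. a TanStack Query cache
  | "derived";

// Hypothetical facts a reviewer might record about one branch of the tree.
interface StateBranch {
  path: string; // illustrative only, not a path from the real codebase
  ownedByServer: boolean; // the backend is the source of truth
  computableFromOtherState: boolean; // can be recomputed instead of stored
  consumerCount: number; // how many components read it
}

// Proposes a default home for one branch. The rule order is an assumption.
function proposeHome(branch: StateBranch): StateHome {
  if (branch.computableFromOtherState) return "derived";
  if (branch.ownedByServer) return "server-cache";
  if (branch.consumerCount > 1) return "shared-context";
  return "local-component-state";
}

const proposal = proposeHome({
  path: "ui.sidebar.isOpen",
  ownedByServer: false,
  computableFromOtherState: false,
  consumerCount: 1,
});
console.log(proposal); // "local-component-state"
```

A rule like this only proposes a default. The judgment lies in establishing the facts for each branch and in overriding the default when the feature calls for it.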

Choosing where to introduce GraphQL. Reads moved to a Hasura layer. Writes remained on the existing Django REST API. This is the strangler pattern applied as a deliberate architectural choice. The migration is not blocked on a full GraphQL rewrite. The application kept shipping features throughout. I drew the line, and the line is the right one for this system at this stage of its life. The lean-team, senior-led version of this workflow is described in more detail in App Development With Claude Code.
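A minimal sketch of the read/write boundary described above, assuming a client-side data layer that tags each operation as a read or a write. The `Operation` shape, the function names, and the example query and path are illustrative assumptions, not code from the project.

```typescript
// The two backends that coexist during the migration.
type Backend = "hasura-graphql" | "django-rest";

// Hypothetical operation shapes: reads carry a GraphQL query,
// writes carry an HTTP method and a REST path.
type Operation =
  | { kind: "read"; query: string }
  | { kind: "write"; method: "POST" | "PUT" | "PATCH" | "DELETE"; path: string };

// The boundary itself: reads go to the new layer, writes stay on the old API.
function chooseBackend(op: Operation): Backend {
  return op.kind === "read" ? "hasura-graphql" : "django-rest";
}

// Groups a batch of operations by the backend that should serve them.
function planRequests(ops: Operation[]): Record<Backend, Operation[]> {
  const plan: Record<Backend, Operation[]> = {
    "hasura-graphql": [],
    "django-rest": [],
  };
  for (const op of ops) {
    plan[chooseBackend(op)].push(op);
  }
  return plan;
}

const plan = planRequests([
  { kind: "read", query: "query Entries { entries { id name } }" },
  { kind: "write", method: "PATCH", path: "/api/entries/42/" },
]);
console.log(plan["hasura-graphql"].length, plan["django-rest"].length); // 1 1
```

Keeping the boundary in one small function is what lets it move later: when a write path is ready for migration, only the routing rule changes, and callers are untouched.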

The StrictMode work. React 18 StrictMode intentionally surfaces latent bugs by double-mounting components during development. Turning it on revealed a memory leak in the Google Maps and d3 overlay teardown path in the Visualize module. AI helped enumerate the suspect surfaces and propose fixes. The actual diagnosis required understanding both React's lifecycle and the browser's rendering pipeline. This is the "almost right" problem in microcosm. AI's suggestions were directionally useful but not fully correct. The diagnosis was mine.
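The double-mount behaviour can be imitated without React to show why a missing cleanup only becomes visible under StrictMode. This is a framework-free sketch: `ListenerRegistry` is a hypothetical stand-in for a resource such as a map overlay, and `strictModeMount` mimics the mount, unmount, remount sequence rather than calling any React API.

```typescript
// Stand-in for an external resource that holds references to subscribers.
class ListenerRegistry {
  private listeners = new Set<() => void>();

  // Registers a listener and returns a function that removes it again.
  add(listener: () => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  get size(): number {
    return this.listeners.size;
  }
}

// An effect may return a cleanup function, or nothing at all.
type Effect = () => void | (() => void);

// Mimics the development-only sequence: mount, unmount, mount again.
// Returns the function that performs the final unmount.
function strictModeMount(effect: Effect): () => void {
  const first = effect();
  if (typeof first === "function") first();
  const second = effect();
  return () => {
    if (typeof second === "function") second();
  };
}

function simulate(makeEffect: (registry: ListenerRegistry) => Effect) {
  const registry = new ListenerRegistry();
  const unmount = strictModeMount(makeEffect(registry));
  const whileMounted = registry.size;
  unmount();
  return { whileMounted, afterUnmount: registry.size };
}

// Leaky: subscribes but never returns a cleanup.
const leakyResult = simulate((registry) => () => {
  registry.add(() => {});
});

// Correct: returns the remover so every mount is undone.
const cleanResult = simulate((registry) => () => registry.add(() => {}));

console.log(leakyResult); // { whileMounted: 2, afterUnmount: 2 }
console.log(cleanResult); // { whileMounted: 1, afterUnmount: 0 }
```

With a single mount the leaky version would show one listener and look healthy. The second mount is what exposes the teardown path, which is the point of turning StrictMode on.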

[Figure: What AI accelerated versus what only senior judgment could do on the DRH modernization. AI-accelerated work: component API conversion, dependency cleanup, TypeScript codegen, styling migration, form library migration. Senior judgment work: Baobab to modern state, GraphQL boundary line, StrictMode diagnosis, architecture restructure, multi-step form pattern.]

None of this work was a rewrite. The application ran in production throughout. Every change shipped on its own. The modernization is happening continuously, not as a discrete event.

What made it work

Three conditions distinguish modernization that produces results from modernization that produces fast, confident chaos.

The first is senior direction at the architecture layer. AI executed inside an envelope that someone with deep judgment had defined. Without that envelope, the same tools produce code that compiles, passes tests, and is structurally wrong. The Thoughtworks team reported a similar conclusion in their Angular 2.2 modernization case, where they used Claude Code to compress a six-month estimate into a six-week delivery. Their framing was that AI tools are powerful assets provided skilled humans guide them. The pattern is the same.

The second is guardrails. AI produces better output as the surrounding constraints get tighter. Implicit guardrails are the patterns and documentation that already exist in the codebase. Explicit guardrails are quality gates the output must satisfy to be accepted: tests, linters, type checks, formatting. We added Prettier and ESLint gates on every commit and introduced TypeScript progressively. AI did better work as each gate came online.
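One way to picture an explicit guardrail is as a list of commands that must all succeed before a change is accepted. The sketch below assumes the three gates named above map to `prettier --check .`, `eslint .`, and `tsc --noEmit`. Those exact invocations, the `Gate` type, and the injected `run` function are assumptions for illustration, not the project's actual configuration.

```typescript
interface Gate {
  name: string;
  command: string;
}

// Assumed invocations for the formatting, lint, and type-check gates.
const gates: Gate[] = [
  { name: "format", command: "prettier --check ." },
  { name: "lint", command: "eslint ." },
  { name: "types", command: "tsc --noEmit" },
];

// Runs one command and reports whether it succeeded. Injected so the
// sketch stays runnable without spawning real processes.
type CommandRunner = (command: string) => boolean;

interface GateReport {
  accepted: boolean;
  failed: string[];
}

// A change is accepted only if every gate passes. All gates are run so the
// report lists every failure, not just the first one.
function evaluateGates(gatesToRun: Gate[], run: CommandRunner): GateReport {
  const failed = gatesToRun
    .filter((gate) => !run(gate.command))
    .map((gate) => gate.name);
  return { accepted: failed.length === 0, failed };
}

// Example: a fake runner in which only the lint command fails.
const report = evaluateGates(gates, (command) => !command.startsWith("eslint"));
console.log(report); // { accepted: false, failed: [ "lint" ] }
```

Because the gate list is data, introducing TypeScript progressively can be as simple as adding the type-check entry once enough of the codebase is ready for it.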

The third is continuity-first sequencing. The modernization is incremental. Reads moved to Hasura before writes. Forms migrated one feature at a time. Baobab was removed feature by feature. There was never a point at which the system was half-broken. This is not a constraint AI imposes. It is a discipline senior engineers bring to the work, and AI accelerates it.

The productivity gain is real, but it lives in the gap between the AI's output and the architecture it serves. Senior engineers fill that gap.

Why this favours specialists

Recent research suggests AI coding tools require on the order of 30 to 100 hours of hands-on use before they consistently help the developer using them. Below that threshold, they often hurt productivity. Above it, the gains compound.

That has an implication most teams have not thought through. An internal engineering team taking on AI-accelerated modernization for the first time will absorb the cost of the learning curve on the client's own codebase. A specialist team that has already invested those hours arrives ready, with patterns refined across multiple production systems. Olena made a related argument about lean, senior teams in Fewer People, Better Results: What Research Reveals About Team Size and AI.

The amplifier

There is a simpler way to put it, borrowed from Dave Farley, one of the Echoes of AI researchers: AI is an amplifier.

If a team is already doing the right things, AI amplifies the impact of those things. If a team is doing the wrong things, AI helps them dig a deeper hole faster. The DRH project went well because the practices being amplified were already good ones.

Two failure modes

Two failure modes are worth naming.

Code bloat: AI makes generating code nearly free, and volume drives complexity. A team that lets AI generate as much code as it can will end up modernizing the same legacy patterns again in two years.

Cognitive debt: if developers stop thinking about the code they ship, understanding erodes, skills atrophy, and the team becomes dependent on the tool rather than capable with it.

What to take away

Three points worth carrying into the next leadership conversation.

When evaluating AI's effect on engineering, ask what kind of work and what kind of developer. The aggregate numbers obscure more than they reveal. A team doing mostly accidental-complexity work will see headline gains. A team doing mostly inherent-complexity work, like modernization, will see the gain appear in different places: fewer dead-ends, faster diagnoses, more options explored before a path is chosen.

A parallel point applies to how AI output is evaluated: less experienced developers tend to accept it as authoritative, while experienced engineers spot the gaps.

Modernization is where AI has produced the most measurable economic value of any application in software engineering, but only when used inside a disciplined process by senior engineers. The DRH project demonstrates this concretely. Ad-hoc adoption fails. Systematic integration succeeds.

The question is no longer whether to use AI tools in modernization. It is who is directing them. The wrong hands turn AI into a multiplier on existing mistakes. The right hands turn AI into a way to take on work that previously was not economical to attempt.

Andrew Manshin

Pieoneers CTO