John Epperson 7/3/26

The U-Curve of AI Amplification

AI amplifies juniors and seniors most; mid-level engineers gain the least. A counterintuitive shape with real implications for how you hire.

Here's an observation I keep coming back to, held as a strong opinion loosely: the value AI provides to a developer at work isn't distributed evenly across experience levels. It's not proportional to skill. It's shaped more like a U.

Junior developers get enormous value. Mid-level developers get modest value. Senior developers get enormous value again. If you plot the value against career stage, the curve dips in the middle.

That's counterintuitive. Most people assume value scales linearly with skill, or that AI democratizes expertise so it flattens the curve. I don't think either is right, and this piece is my attempt to lay out why. The stakes are practical: if the shape of the curve is real, it changes how you should hire, how you should structure teams, and where you should invest training budget.

One caveat up front, and I want to keep saying it as the piece goes on: everything below is grounded in observation, not empirical research, though some research corroborates it and some I'm not yet sure how to read. I'm speaking from the standpoint of someone who has watched a lot of engineers work with AI over the last two years (especially the last 6 months) across multiple codebases, not with the authority of a controlled study. When I say "juniors get more value" or "mid-level engineers plateau," I'm generalizing from cases I've watched play out. More evidence one way or another is needed. Treat this as an operating theory to test, not a claim to plan around.

What juniors get

For someone new to the field or new to a technology, AI closes knowledge gaps that used to be closed by mentorship, by Stack Overflow, by six months of pattern-matching on someone else's code. That closure used to be expensive. It's now cheap.

A junior developer with a good AI setup can ask "what's the idiomatic Rails way to do this" and get a decent answer in seconds. They can ask "why is this test failing" and get an explanation that would have taken hours to find. They can ask "explain this codebase to me" and get a map. The AI is functionally a patient guide, available 24/7, without the social cost of asking someone senior "the same question again."

The failure mode for juniors is uncritical acceptance. The AI is often wrong, and the junior lacks the intuition to catch the wrong parts. Codebases can drift into plausible-but-wrong architectures if a junior takes AI output as gospel. This is a real risk, and it's why "let juniors use AI unsupervised" is a mistake in most organizations.

But the ceiling for value at this tier is high. If the failure mode gets managed (through code review, through paired seniors, through cultural norms that treat AI as a first-draft tool), juniors go from "learning slowly" to "learning at a pace that would have taken years." That's a real transformation, and it's what most of the "AI democratizes coding" claims are pointing at when they're not overselling.

What mid-level engineers get

Here's the part that surprises people: mid-level engineers get the least from AI.

Not zero. But less than either end of the curve.

The reason is that mid-level engineers are in an awkward spot. They have enough experience that they don't need the "here's how Rails works" guidance a junior gets. But they don't yet have the deep intuition that lets a senior spot a smell and articulate it in a way the AI can amplify. Mid-level engineers use AI mostly as a rubber duck, a colleague to talk problems through. That's useful. But the leverage is capped at their own judgment level.

Concretely: a mid-level engineer will accept the AI's first-pass output more often than a senior will. Not because they're lazy, but because they don't yet have the calibrated dissatisfaction sense that says "this passes tests but the shape is still wrong." They ship the passable version. The output looks like solo work, just faster.

I watched this play out recently on a project where a mid-level engineer was working on a complex feature. They'd hand a problem to the AI, get a solution, apply it, test it, and move on. On paper, they were productive. In practice, they were spending a lot of cycles going in circles. The AI would produce a solution that had a subtle design flaw. The engineer would apply it. A day later, the flaw would surface as an edge-case bug. They'd hand the bug to the AI, get a patch, apply the patch, move on. A day later, the patch would create a different subtle problem. The engineer was coding themselves into corners they didn't realize they were building, because the AI wasn't telling them "the shape of what you're doing is off." It was just answering the immediate question they asked.

A senior looking at the same starting problem would have caught the design flaw before the first solution shipped. Or would have pushed back on the AI's first pass and asked for alternatives. The mid-level engineer didn't know to do either, and the AI didn't do it for them. So the same mid-level engineer who used to write reasonable code slowly was now writing reasonable-looking code faster, and the underlying design was drifting further from correct every day.

Faster is not nothing. But it's less than the transformation the tier above and the tier below get. And in the pathological case it's actually negative. The engineer is producing more code, more of it is subtly wrong, and the total time-to-a-correct-solution is longer than it would have been without the AI.

There's also a separate risk at this tier that's worth naming. Mid-level engineers who use AI heavily can plateau in a way they wouldn't have without it. The parts of their career where they'd naturally develop deeper intuition (the frustrating, slow, deeply-thought moments) are the moments AI is most tempting to short-circuit. There are two paths: sit with the frustrating problem, work through it, and feel the pattern become internalized, or hand it to AI and get an answer. The second path is faster today. It also skips the exact experience that would have moved the engineer from mid-level to senior. That's speculative. I don't have data. But it's the failure mode I'd watch for.

What seniors get

The senior mechanism is different in kind, not just in degree.

A senior developer working on a problem has trained pattern-matching that says "something is off here" often before they can say why. That pre-verbal signal is the thing that separates senior work from mid work. The signal is right most of the time. Articulating what it's actually detecting is what takes the time.

AI compresses that articulation cost. The senior says to Claude, "this class seems to be merging arrays upward, which feels wrong." The act of naming it forces the intuition into words. Claude then proposes options, sketches alternatives, scans the codebase for supporting evidence. The senior evaluates, picks, iterates. What used to take hours (the sketching, the evidence-gathering, the tradeoff analysis) collapses into minutes.

The intuition stays in the senior's head. That's what the AI can't provide. But the throughput of investigating and acting on the intuition goes up dramatically. A senior with AI can do in a day what would have taken a week solo, at the same or higher quality.

Even seniors have failure modes with AI, though, and I want to be honest about mine. I catch large code shapes going wrong pretty reliably. I'll notice when a service class is turning into a god-object, when a controller is doing more than it should, when a domain layer is missing. That's the calibrated intuition doing its job.

But I'm less reliable at catching small smells. A method that's slightly too long. A variable name that's carrying too much semantic weight. A test that's curve-fitted to the current implementation rather than testing the behavior. Those come at me from the AI's output constantly, and my pattern is often the same: I read the code, notice the smell, feel a small friction, immediately generate a justification for why it's fine in this specific case, and move on. Some of those justifications are legitimate. The pattern really is fine in context. But some are motivated reasoning. I'm choosing not to fix the smell because fixing it would slow me down.

The seniors who get the most out of AI are the ones who've learned to notice this pattern in themselves and push through it. Fix the small smell. Refactor the slightly-too-long method. Don't accept the curve-fitted test. The productivity win from AI isn't "more code faster." It's "more code faster AND at higher quality than solo," and the higher-quality half only holds if the human keeps refusing the small smells.

Even careful code review lets things through. It always has, even before AI. What's changed with AI is the volume. Twenty AI-assisted PRs a week, each with a plausible-but-slightly-off pattern, add up faster than the handful of PRs a week the same people used to write from scratch, even though those hand-written PRs had mistakes too. The percentage of subtle problems per PR might not be higher with AI; the volume of PRs is higher, so the absolute number of subtle problems reaching the codebase is higher. And once a subtle problem is in, friction increases, churn increases, and bad modeling and poor architecture compound. The surface area of things that could go wrong increases, and the future busy work to fix them accumulates.

Being senior doesn't eliminate this. It reduces it. And the reduction, over months and years, is the whole ballgame. If a senior with AI reduces the rate of new subtle problems by even a moderate percentage compared to a mid-level engineer with AI, the difference in accumulated codebase health after a year is enormous. That translates directly into productivity: the team not fighting subtle problems from six months ago is the team shipping new features today.

The GitClear signal

There's some empirical data pointing in the same direction. GitClear ran a study analyzing about 211 million lines of structured code change data from 2020 through 2024, comparing the maintainability patterns of AI-generated code against pre-AI baselines. The findings are directionally what the U-Curve theory would predict.

The proportion of new code that got revised within two weeks of its initial commit grew from 3.1% in 2020 to 5.7% in 2024. In AI-heavy projects specifically, GitClear observed roughly a 39% higher churn rate. More code was being written; more of it needed immediate revision.

Duplicated code blocks rose about eightfold in 2024 compared to previous years. Copy-pasted lines climbed from 8.3% to 12.3% of new code, while the share of code that got refactored (rather than newly written or copied) collapsed from about 25% down to under 10%. The AI's habit of regenerating similar shapes rather than reusing existing ones is measurable in the commit data.

GitClear also found that AI-authored pull requests carried roughly 1.7x more issues per PR (about 10.8 versus 6.4) than non-AI PRs in the same codebases, and that estimated technical debt increased somewhere in the 30-41% range post-adoption on the projects they analyzed.

The full report ("AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones") is publicly available if you want the methodology and the raw numbers.

These aren't "AI is bad" numbers. I read them as "AI amplifies whatever pattern the person driving already has" numbers. The seniority split is my inference, not something these numbers show directly: a senior engineer using AI produces different output than a mid-level engineer using the same AI, and the compounding difference over months and years is what the U-Curve theory tries to capture. If AI-heavy projects see 39% higher churn on average, I'd expect the projects driven by seniors to be pulling that average down, and the projects driven by mid-level engineers without senior oversight to be pushing it up.

For contrast, Stack Overflow's 2025 developer survey found 51.6% of respondents reported positive productivity impact from AI tools. That's a self-report number, though. The gap between "developers feel more productive" and "the codebase shows more churn and duplication" is exactly the tension the U-Curve tries to explain. Both can be true simultaneously if the productivity gain is uneven, concentrated at the seniors who are producing better code faster, offset by mid-level engineers producing more code that needs more revision.

That churn shows up in codebases too, not just in study aggregates. When a mid-level engineer uses AI heavily and doesn't have the senior instinct to catch the wrong shape early, their commit history looks busier than a senior's using the same AI. More small commits fixing small problems. More reverts. More "actually, do it this way" corrections. Each individual correction is small; collectively they represent time the engineer spent because the first pass wasn't quite right.

Common flaws AI has when coding

The mechanism of the mid-level failure mode is worth being concrete about. When I watch what AI defaults to, the patterns show up over and over:

Early assumption of correctness. The AI states its solution with confidence. Even when it's wrong. Even when the code it produces has an obvious bug. This confidence is contagious. An engineer without strong pattern-matching absorbs it and stops questioning the solution before verifying it.

Curve-fitted tests. Ask AI to write tests for the code it just wrote, and it often produces tests that match the specific behavior of that implementation rather than tests that describe the intended behavior. The tests pass. They also don't catch the case where the implementation is wrong, because the tests were written to match the implementation, not the specification.

Bloating third-party libraries. AI reaches for well-known libraries by default. Sometimes that's right. Sometimes it introduces a large dependency to do something the standard library could handle in ten lines. Left unchecked, the dependency count grows every session.

Poor readability by senior standards. The AI's default code is often mid-level-quality: it works, it's correct, it's neither offensively verbose nor cleverly compressed. Which means it lacks the specific stylistic sharpness that experienced engineers reach for. Function names that are slightly too generic. Variable names that mix abstraction levels. Comments that explain what rather than why.

Duplication. The AI generates similar shapes in different places without noticing that they're similar. Two rules that follow the same three-step structure get written as two independent methods with three steps each, instead of as a base class with the shared structure and two variants.

Each of these is small. Individually, each is fixable in a code review. Collectively, they add up to a codebase that requires more maintenance than the same codebase without them.

Keeping them in check is a chore. It requires the reviewer or the author to actively look for each pattern and push back. And this is where the senior/mid-level split matters again: the senior's calibrated intuition catches these more reliably than the mid-level engineer's. Not because the mid-level engineer is careless. Because the senior has seen the accumulated cost of each of these smells in past codebases and pattern-matches against them fast, whereas the mid-level engineer is still learning what the accumulated cost looks like.
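To make the curve-fitted test pattern concrete, here's a minimal sketch in plain Ruby. Everything in it is hypothetical (the discounted_total helper, the 10% rule, the threshold); it's an illustration, not code from a real project. The implementation has a boundary bug. The first check was written by reading the implementation, so it passes. The second was written from the stated rule, so it catches the bug.

```ruby
# Hypothetical rule: orders of 100.00 or more get 10% off.
# Amounts are integer cents to avoid float rounding.
THRESHOLD_CENTS = 10_000

# The implementation has a boundary bug: it uses > where the rule says >=.
def discounted_total(total_cents)
  total_cents > THRESHOLD_CENTS ? total_cents * 90 / 100 : total_cents
end

# Curve-fitted check: the expected value was copied from what the code
# currently returns at the boundary, so it passes and locks the bug in.
curve_fitted_passes = discounted_total(10_000) == 10_000

# Behavior check: the expected value comes from the rule itself
# (100.00 qualifies, so the total should be 90.00). It fails here,
# which is exactly what a useful test should do.
behavior_passes = discounted_total(10_000) == 9_000

puts "curve-fitted check passes: #{curve_fitted_passes}"
puts "behavior check passes:     #{behavior_passes}"
```

Both checks exercise the same line of code. The difference is where the expected value came from, which is the thing a reviewer has to ask about.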

Why the shape is a U

The reason juniors and seniors both gain is that both have a specific mechanism the AI amplifies well.

For juniors: knowledge gaps. AI fills them.

For seniors: articulation cost on intuition. AI compresses it. And the senior's judgment on what to accept and what to fix is what keeps the AI's amplified output from becoming amplified debt.

The mid-level tier is between mechanisms. They have enough knowledge that knowledge-filling matters less. They don't yet have the deep intuition that articulation-compression amplifies. Neither mechanism dominates, so the value they extract sits in the middle: real, but modest, and potentially negative in the specific case where the mid-level engineer amplifies subtle wrongness faster than they'd have produced it solo.

What this implies for organizations

Take this section with the same "operating theory" hedge as the rest. I'm reasoning from my own experience and from the patterns I've watched at client organizations. The prescriptions below are what I'd bet on, not what I've proven.

Don't hire mid-level engineers as your default. The classic hiring pipeline optimizes for mid-level because they're "safe." That was correct advice when the value each tier extracted from tooling was roughly proportional to its experience. It's less correct now. If AI amplifies juniors and seniors more than mid-level, then a team weighted heavily toward mid-level is systematically underinvesting in the tiers where AI is doing the most work.

Invest in senior mentorship for juniors. Juniors with AI get transformed. Juniors with AI and a senior watching their work get a career on rails. The risk that AI produces plausible-but-wrong code lands hardest on juniors, and the intervention that reduces the risk is a real human catching it. This is a place where mentorship pays off dramatically, and where losing it costs a lot.

Take seniors' AI concerns seriously, but push back on refusal to use it. The pattern I've seen is that seniors who won't use AI often frame it as principle ("I want to think for myself") when the real objection is discomfort with delegating any part of the work. That's fixable, but only if you name it. A senior who never uses AI in 2026 is systematically slower than one who does, and the gap grows every quarter.

Watch for mid-level plateaus. If your mid-level engineers are using AI heavily but not showing the growth trajectory you'd expect, ask whether they're being asked to do the kinds of problems that build senior intuition. Those problems are frustrating, slow, and easy to short-circuit with AI. They're also the ones that build the skill. Consider deliberately assigning some problems to mid-level engineers with an explicit "solve this without AI first, then compare your solution to what AI would give you" instruction. The comparison teaches them what to notice, which is what they need to develop the senior instinct.

Track code churn as a signal. If your codebase's churn rate is climbing as AI adoption climbs, that's a signal that the AI amplification is happening in the wrong direction: more code being written faster, but more of it being revised. If churn is stable or dropping while shipped features climb, that's the healthy pattern. This is a leading indicator you can watch monthly.
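One way to make that signal concrete, as a rough sketch: treat churn the way the GitClear numbers above do, as the share of newly added lines that get revised within two weeks. The LineChange struct, the inclusive 14-day window, and the sample data below are all made up for illustration. Getting real per-line history out of version control is the harder part, and it isn't shown here.

```ruby
require "date"

# Hypothetical record: when a line was added and, if it was later
# changed or deleted, when that first happened (nil if never).
LineChange = Struct.new(:added_on, :revised_on, keyword_init: true)

# Assumption: "within two weeks" means 14 days or fewer, inclusive.
CHURN_WINDOW_DAYS = 14

# Share of added lines revised within the window, as a percentage.
def churn_rate(changes, window_days: CHURN_WINDOW_DAYS)
  return 0.0 if changes.empty?

  churned = changes.count do |change|
    change.revised_on && (change.revised_on - change.added_on) <= window_days
  end
  (100.0 * churned / changes.size).round(1)
end

SAMPLE = [
  LineChange.new(added_on: Date.new(2026, 1, 5), revised_on: Date.new(2026, 1, 9)),  # revised after 4 days
  LineChange.new(added_on: Date.new(2026, 1, 5), revised_on: Date.new(2026, 3, 1)),  # revised after 55 days
  LineChange.new(added_on: Date.new(2026, 1, 6), revised_on: nil),                   # never revised
  LineChange.new(added_on: Date.new(2026, 1, 7), revised_on: Date.new(2026, 1, 21))  # revised after 14 days
].freeze

puts churn_rate(SAMPLE)
```

Computed monthly and plotted next to AI adoption, a number like this is the trend line to watch. The absolute value matters less than the direction.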

What I'm still unsure about

The domain generalization. This claim comes out of my Ruby/Rails and multi-language client work. I'm not confident it holds identically for greenfield mobile development, or for infrastructure work, or for research-adjacent domains. The shape might be different when the "intuition" the senior has is different in kind.

The mid-level definition. "Mid-level" is a fuzzy category. Some mid-level engineers have deep specialization that gives them senior-tier intuition in one subfield. They probably get senior-tier amplification in that subfield. The tier boundaries are messier than the piece implies.

The long-term cognitive question. Does heavy AI use weaken a developer's independent thinking over time? I don't know. I don't have enough evidence either way. I'm suspicious of both the "AI will atrophy our brains" and the "AI is pure upside" answers. If I had to guess, I'd say it depends heavily on how it's used, and the same tool can build or erode skill depending on the discipline the user brings. But that's a guess.

The whole shape. I keep saying "operating theory" because I mean it. This is a hypothesis to test, drawn from observation, not from data. Somebody with actual metrics (churn rates by developer, PR-review-comment counts, feature-completion times) could probably test whether the U-curve holds in their organization. I'd love to see that data. I don't have it.

If your organization is figuring out how to think about AI investment across your team, I'd rather you sit with the U-Curve as an operating hypothesis to test than as a claim to plan around. But I'd bet on the shape.

If you're navigating AI adoption at your company and want a second opinion from someone who's been using it heavily in production for the last two years, that's the kind of conversation Rock Agile likes to have. Get in touch.

John Epperson 7/3/26

The Refactor That Wasn't Broken

The most important refactors are the ones where nothing's broken yet. A walk-through of a real Ruby refactor that started with dissatisfaction, not a bug.

The most important refactors are the ones where nothing's broken yet.

I'd just landed a feature. A cross-app "What's Next" widget that surfaces things the user should do next. Incomplete onboarding. Missing documents. Ungenerated content sections. The first draft worked. Tests passed. Lint was clean. All the mount points rendered. Then I read the code again and didn't like it.

The trigger was internal dissatisfaction. Not a bug. Not a performance problem. Just a strong sense that the shape was wrong. That's a legitimate signal, and refusing to act on it because "nothing's broken" is how codebases quietly rot.

Refactors that come from dissatisfaction are the ones I trust most. Bug-driven refactors usually have a narrower scope: you're fixing the specific thing that broke. Refactors that come from a dissatisfied read of code that works are usually restructuring something that would have caused pain later, and doing it before the pain arrives. That's cheaper.

Here's how I walked the refactor from that vague sense of wrong to a shape I could defend.

Stage 0: The starting code

A 200-line service with eight item-builder methods. Two of them looked roughly like this:

def content_items
  nudges = []
  nudges << content_ungenerated_item if ungenerated_sections.any?
  nudges << content_over_cap_item    if over_cap_count.positive?
  nudges.compact
end

def content_ungenerated_item
  missing = ungenerated_sections
  labels  = missing.map { |s| I18n.t("app.content.sections.#{s}.label") }
  item(kind: :nudge,
       step: :content,
       label_key: "user.content_ungenerated",
       label_params: { count: missing.size, sections: labels.to_sentence },
       cta_url: @url_helpers.content_path,
       cta_method: :get)
end

And a couple methods later:

def document_items
  nudges = []
  nudges << document_missing_item  if missing_document_types.any?
  nudges << document_expired_item  if expired_document_count.positive?
  nudges.compact
end

def document_missing_item
  missing = missing_document_types
  labels  = missing.map { |t| I18n.t("app.documents.types.#{t}.label") }
  item(kind: :nudge,
       step: :documents,
       label_key: "user.documents_missing",
       label_params: { count: missing.size, types: labels.to_sentence },
       cta_url: @url_helpers.documents_path,
       cta_method: :get)
end

Look at those two together. They're not the same method: the domain concept differs, the I18n namespace differs, the CTA URL differs. But the shape is identical. The list-and-labels processing is the same. The item-construction is the same. The nudges << x if cond guard is the same. And the smell of copy-paste is unmistakable.

Extend that to the eight rules that were in the file, and you had eight variants of the same shape doing eight different domain jobs. That's fine when there are two of them. It's a code smell when there are eight.

And a compute method that imperatively concatenated each builder's output:

def compute
  items = []
  items.concat(onboarding_items)
  items.concat(profile_items)
  items.concat(document_items)
  items.concat(content_items)
  items.concat(export_items)
  items.sort_by { |item| WhatsNext::PRIORITY.fetch(item.kind) }
end

It worked. It passed tests. It rendered correctly on every page. But four smells:

Imperative concat. items = []; items.concat(...) is "build a thing through mutation," which fights the declarative style the rest of the codebase uses. Most of this app leans on map, flat_map, and array literals with conditionals. This compute method is an island of mutation in a sea of declaration.
Builder methods accepting parameters that were all derivable from self. Every keyword argument passed to item(...) was instance state. That's the data-clump smell in one of its most recognizable forms: passing the object's own state back to itself as arguments.
Duplicated label processing. Two of the eight callers mapped a list through I18n.t and .to_sentence. Same shape, different namespace. This kind of near-duplication accumulates in a service class over time. Each new rule copies the pattern of the previous one, and by rule five you have four almost-identical helpers.
A class doing eight things in one file. Each "rule" was a private method related to the others only by living in the same Ruby file. As long as this stayed at eight rules, the file was manageable. The moment it grew to twelve or fifteen (and it would), the file would be unreadable.

None of these were bugs. Every one of them was a design smell that would compound over time.

Stage 1: Lay out the options before touching code

The temptation when you spot a smell is to fix it inline. I resisted. I put three options on the table with code sketches for each.

Option A: Surface fix only. Convert the imperative shapes to declarative ones without changing the class structure. The compute method becomes:

def compute
  [
    *onboarding_items, *profile_items, *document_items,
    *content_items, *export_items
  ].sort_by { |item| WhatsNext::PRIORITY.fetch(item.kind) }
end

And each builder method flattens into the declarative form:

def content_items
  [
    (content_ungenerated_item if ungenerated_sections.any?),
    (content_over_cap_item    if over_cap_count.positive?)
  ].compact
end

Minimum churn. The imperative smell is gone. The other three smells remain. Deploy risk: near zero. But it doesn't address the deeper problem: this class is still doing eight things.

Option B: Rule classes with a light factory. Each builder becomes its own class with a uniform #items interface. The main service collapses to a registry:

class WhatsNext
  RULES = [
    Rules::Onboarding, Rules::Profile, Rules::Documents,
    Rules::Content, Rules::Export
    # ... and so on
  ]

  def compute
    RULES.flat_map { |klass| klass.new(@user, @url_helpers).items }
         .sort_by { |item| PRIORITY.fetch(item.kind) }
  end
end

Each rule class is tiny. Something like:

class Rules::Documents
  def initialize(user, url_helpers)
    @user, @url_helpers = user, url_helpers
  end

  def items
    [
      (missing_item if missing_types.any?),
      (expired_item if expired_count.positive?)
    ].compact
  end

  private
  # ... the domain-specific helpers ...
end

Eight tiny files, each independently testable. Pattern matches an existing precedent in this codebase (another service was already using the flat-map-a-list-of-classes pattern). Adding a ninth rule is one new file with one new line in the RULES list.
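To show the registry shape end to end, here's a runnable sketch in plain Ruby with the Rails pieces stubbed out. The Item and User structs, the two rule bodies, the label keys, and the PRIORITY values are all stand-ins I made up for illustration. Only the RULES list and the flat_map/sort_by pipeline mirror the sketch above.

```ruby
# Hypothetical stand-ins for the real item and user objects.
Item = Struct.new(:kind, :step, :label_key, keyword_init: true)
User = Struct.new(:onboarded, :missing_document_types, keyword_init: true)

module Rules
  class Onboarding
    def initialize(user, url_helpers)
      @user, @url_helpers = user, url_helpers
    end

    def items
      [(incomplete_item unless @user.onboarded)].compact
    end

    private

    def incomplete_item
      Item.new(kind: :blocker, step: :onboarding, label_key: "user.onboarding_incomplete")
    end
  end

  class Documents
    def initialize(user, url_helpers)
      @user, @url_helpers = user, url_helpers
    end

    def items
      [(missing_item if @user.missing_document_types.any?)].compact
    end

    private

    def missing_item
      Item.new(kind: :nudge, step: :documents, label_key: "user.documents_missing")
    end
  end
end

class WhatsNext
  # Assumed priority table: lower numbers sort first.
  PRIORITY = { blocker: 0, nudge: 1 }.freeze
  # Adding a rule means adding one class and one entry here.
  RULES = [Rules::Documents, Rules::Onboarding].freeze

  def initialize(user, url_helpers = nil)
    @user = user
    @url_helpers = url_helpers
  end

  def compute
    RULES.flat_map { |klass| klass.new(@user, @url_helpers).items }
         .sort_by { |item| PRIORITY.fetch(item.kind) }
  end
end

user = User.new(onboarded: false, missing_document_types: [:license])
p WhatsNext.new(user).compute.map(&:step)
```

Registry order and display order are decoupled here: Documents is listed first, but the onboarding blocker sorts ahead of it because of PRIORITY.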

Option C: Real domain models plus presenters. Pull the "what state is this domain in" question into actual models:

class ProfileState
  def initialize(user)
    @user = user
  end

  def missing_fields
    [
      (:primary_specialty if @user.primary_specialty.blank?),
      (:bio if @user.bio.blank?),
      (:location if @user.location.blank?)
    ].compact
  end

  def complete?
    missing_fields.empty?
  end
end

Then the WhatsNext service becomes a thin presenter over the domain layer:

class WhatsNext
  def compute
    assessments = [
      ProfileState.new(@user),
      DocumentCoverage.new(@user),
      # ... and so on
    ]
    assessments.flat_map { |a| items_from(a) }
               .sort_by { |item| PRIORITY.fetch(item.kind) }
  end
end
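
The sketch above leaves items_from undefined. Here's one hedged guess at how that presenter step could look, in plain Ruby. The Item and User structs, the blank? stand-in (Rails provides the real one), the label key, and the PRIORITY table are all assumptions for illustration, not the actual implementation. It handles only ProfileState; a second assessment type would need its own mapping.

```ruby
# Hypothetical stand-ins for the real item and user objects.
Item = Struct.new(:kind, :step, :label_key, :label_params, keyword_init: true)
User = Struct.new(:primary_specialty, :bio, :location, keyword_init: true)

class ProfileState
  FIELDS = %i[primary_specialty bio location].freeze

  def initialize(user)
    @user = user
  end

  # Domain question only: which fields are not filled in?
  def missing_fields
    FIELDS.select { |field| blank?(@user.public_send(field)) }
  end

  def complete?
    missing_fields.empty?
  end

  private

  # Plain-Ruby stand-in for ActiveSupport's blank?.
  def blank?(value)
    value.nil? || value.to_s.strip.empty?
  end
end

class WhatsNext
  PRIORITY = { nudge: 1 }.freeze

  def initialize(user)
    @user = user
  end

  def compute
    assessments = [ProfileState.new(@user)]
    assessments.flat_map { |a| items_from(a) }
               .sort_by { |item| PRIORITY.fetch(item.kind) }
  end

  private

  # Presentation question only: turn a domain state into widget items.
  def items_from(state)
    return [] if state.complete?

    [Item.new(kind: :nudge, step: :profile,
              label_key: "user.profile_incomplete",
              label_params: { fields: state.missing_fields })]
  end
end

user = User.new(primary_specialty: "Backend", bio: "", location: nil)
p WhatsNext.new(user).compute.first.label_params
```

ProfileState knows nothing about items, labels, or URLs, so any other caller that needs the completeness question can use it without pulling in the widget.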

Just describing three alternatives surfaced tradeoffs I'd have missed if I'd started refactoring right away. Option A had almost no upside beyond removing the imperative concat. Option C had significant upside, but only if we had enough evidence to justify the domain layer today. Option B was the sensible middle. Even knowing B was right, sketching A and C made me confident in it.

The principle: before refactoring, force yourself to write down 2-3 alternatives. Just describing them surfaces tradeoffs you'd otherwise miss.

Stage 2: Pick by evidence, not by aesthetic

I leaned toward B. But a sharper question came up: is C worth the cost? "Do the domain models read nicely" isn't the right frame. The right frame is: is there existing duplication or imminent reuse to justify pulling out the extra layer?

I went searching for real duplication of the ProfileState-style query across the codebase. The actual grep commands I ran looked something like:

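# Find every place we're checking "is this profile field filled in?"
# -F treats the pattern as a literal string, so "." and "?" aren't regex operators.
rg -F 'primary_specialty.presence' --type=ruby
rg -F 'primary_specialty.present?' --type=ruby
rg -F 'primary_specialty.blank?' --type=ruby
rg '@user\.(bio|location|primary_specialty)' --type=ruby -A 2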

# For the document coverage query:
rg 'user\.documents\.where\(type:' --type=ruby -B 1 -A 3

# For the content readiness query:
rg 'ungenerated_sections' --type=ruby

The output shape for the ProfileState query looked like this:

app/services/generate/resume_base.rb:23:  fallback = user.primary_specialty.presence || "(not declared)"
app/services/generate/opportunity_resume.rb:47:  fallback = user.primary_specialty.presence || "(not declared)"
app/services/generate/linkedin_section.rb:31:  fallback = user.primary_specialty.presence || "(not declared)"
app/services/score_specialty.rb:12:  return :low if user.primary_specialty.blank?
app/services/whats_next.rb:88:  missing << :primary_specialty if user.primary_specialty.blank?

Five callers. Three of them in generation services. One in a scoring module. One in the new What's Next rule.

Concrete evidence per candidate:

ProfileState. Five places already. Plus imminent reuse in three upcoming features. Verdict: extract today. High-confidence reuse and immediate duplication.

DocumentCoverage. Only one caller today. But the roadmap included a "document tailoring" feature that would need this exact query as a direct lookup rather than as an AI-inferred flag. Verdict: defer until that feature forces the second caller. Extracting it now would be architectural speculation; the correct instinct is to wait for evidence.

ContentReadiness and ExportRoster. One caller each, no clear second. Verdict: keep inline.

Along the way I made an honest correction. I'd initially claimed the three generation services were duplicating the ProfileState query. Looking again, they were doing presence || "(not declared)" as a prompt-interpolation idiom that was adjacent to but different from the completeness question. The state question is "is this field filled in?"; the interpolation idiom is "how do I display this field in prompt text?"

Look at the specific pattern used in the generation services:

# In app/services/generate/resume_base.rb:
specialty = @user.primary_specialty.presence || "(not declared)"
prompt = "You are writing a resume for a #{specialty}..."

versus what ProfileState would want to do:

# In ProfileState:
def missing_fields
  [
    (:primary_specialty if @user.primary_specialty.blank?),
    # ...
  ].compact
end

Both touch @user.primary_specialty. Both involve a blank/present check. But they're doing different jobs. The generation service is producing a display string. ProfileState is producing a domain classification. Conflating them would make ProfileState a god-object serving two concerns.

So: extract ProfileState for the state question. Leave the generation services alone. The three sites where presence || "(not declared)" appears will keep doing that inline, because it's a display concern in that context, not a domain concern.

The principle: "extract when a second caller appears" is a useful brake on domain modeling. It prevents speculative architecture. Domain models without callers are debt. Wait for evidence. The exception is when you can find duplication already in the code AND know the next caller is imminent. That's high-confidence reuse.

The corollary is that being honest about what "duplication" means matters. Two lines of code that look similar aren't necessarily the same concern. If you extract them together, you'll end up with a helper class that has to fork behavior based on which caller is calling it, and now you've made two separate concerns into one entangled one.
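To make the entanglement concrete, here is a hypothetical version of the helper you would end up with if both concerns were extracted together. The SpecialtyHelper name and the for_prompt: flag are invented for illustration; nothing like this exists in the codebase.

```ruby
# Anti-pattern sketch: one helper serving a display concern and a domain
# concern, forking on a flag that really means "which caller are you?"
class SpecialtyHelper
  def initialize(user)
    @user = user
  end

  def specialty(for_prompt:)
    value = @user.primary_specialty.to_s.strip
    if for_prompt
      value.empty? ? "(not declared)" : value # display string
    else
      value.empty? ? :missing : :present      # domain classification
    end
  end
end

HelperUser = Struct.new(:primary_specialty)
helper = SpecialtyHelper.new(HelperUser.new(nil))

helper.specialty(for_prompt: true)  # => "(not declared)"
helper.specialty(for_prompt: false) # => :missing
```

The two branches share one line of setup and nothing else. Every new caller now has to know which flag value it is, and a change to either concern risks the other.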

Stage 3: Naming matters more than the structure

With Option B and the ProfileState extraction agreed on, the next push came from an unexpected place: the name.

WhatsNext was a UI label leaking into the domain layer. The service's actual job was measuring the state of the workspace across multiple dimensions. Three honest framings on the table:

Framing	Fit	Notes
Readiness	All 8 rules	"Is the workspace ready in this dimension?" Covers gaps, queues, and broken artifacts uniformly.
Completeness	5 of 8	Strong for gaps, awkward for review queues and defensive items.
Audit	All 8	Honest about what the service does. Compliance connotations.

Picked Readiness. It unified the meaning across all eight rules, the namespace nested cleanly, and the user-facing "What's Next" widget could stay as a thin presenter over the domain output. In code, that layering wires up like this:

# Domain layer, the reusable, business-facing thing:
module Reviewer
  RULES = [
    Reviewer::Onboarding,
    Reviewer::Profile,
    Reviewer::Documents,
    Reviewer::Content,
    Reviewer::Export
    # ...
  ]

  def self.compute(user, url_helpers)
    RULES.flat_map { |klass| klass.new(user, url_helpers).items }
         .sort_by { |item| WhatsNext::PRIORITY.fetch(item.kind) }
  end
end

# Presentation layer, the specific widget that surfaces this:
class WhatsNext
  def self.compute(user, url_helpers)
    Reviewer.compute(user, url_helpers)
  end
end

For now the presenter is a one-line delegate. But if we ever need a second presenter (an email digest of pending items, a CLI status command, a health-check endpoint that gates the workspace as "not ready"), the domain layer is the reusable thing. The UI layer is what changes per surface.
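As a sketch of what a second presenter could look like, here is a hypothetical plain-text digest built over the same kind of domain output. The DigestPresenter name, the DigestItem struct, and the sample labels are assumptions; in the real system the items would come from Reviewer.compute.

```ruby
# Stand-in for WhatsNext::Item with only the fields this sketch needs.
DigestItem = Struct.new(:kind, :step, :label, keyword_init: true)

# Hypothetical second presenter: renders domain items as text for an
# email digest instead of as a widget.
class DigestPresenter
  def initialize(items)
    @items = items
  end

  # A health check could gate on the same predicate.
  def ready?
    @items.none? { |item| item.kind == :blocking }
  end

  def to_text
    return "Your workspace is ready." if @items.empty?

    lines = @items.map { |item| "- [#{item.kind}] #{item.label}" }
    (["Pending items:"] + lines).join("\n")
  end
end

items = [
  DigestItem.new(kind: :blocking, step: :onboarding, label: "Finish onboarding"),
  DigestItem.new(kind: :nudge, step: :profile, label: "Complete your profile")
]

digest = DigestPresenter.new(items)
puts digest.to_text
digest.ready? # => false
```

The presenter never touches rule logic. It only decides how already-computed items are worded for its surface.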

The principle: when the existing name is a widget label, push back. The domain layer should be named for what it does, not how it's rendered.

Naming discussions feel like bikeshedding when you're in them. They're not. The name you pick determines what future engineers think the code is for. Choosing "Readiness" over "WhatsNext" is choosing to communicate that this code will be reused across contexts. Choosing "WhatsNext" would have communicated that it's a UI widget and would have implicitly discouraged the future presenter.

Stage 4: First pass, then a second look

First implementation: eight rule classes, each with its own #items method:

class Reviewer::Onboarding < Reviewer::Base
  def items
    return [] if @user.onboarded?
    [item(kind: :blocking, step: :onboarding,
          label_key: "user.onboarding_incomplete",
          cta_url: @url_helpers.onboarding_path, cta_method: :get)]
  end
end

Tests passed. Lint clean. Pushed.

Then I re-read the code and found more smells the first pass had left behind. Every #items was rebuilding the WhatsNext::Item structure inline. Every keyword argument to item(...) was derivable from self. Two rules were duplicating the same list-to-sentence label processing.

This was the Template Method pattern asking to be born. The Base class should own the orchestration. Subclasses should describe their properties via hook methods:

class Reviewer::Base
  LABEL_NAMESPACE = "app.workflow.whats_next".freeze

  def initialize(user, url_helpers)
    @user = user
    @url_helpers = url_helpers
  end

  def items
    applies? ? [item] : []
  end

  def item
    WhatsNext::Item.new(
      kind: kind, step: step, label: label,
      cta_url: cta_url, cta_method: cta_method
    )
  end

  private

  attr_reader :user, :url_helpers

  # Hooks, subclass overrides
  def applies?     = raise NotImplementedError
  def kind         = raise NotImplementedError
  def step         = raise NotImplementedError
  def label_key    = raise NotImplementedError
  def cta_url      = raise NotImplementedError
  def label_params = {}
  def cta_method   = :get

  # Derived helpers
  def label
    I18n.t("#{LABEL_NAMESPACE}.#{kind}.#{label_key}", **label_params)
  end

  def localized_list(keys, namespace:)
    keys.map { |k| I18n.t("#{namespace}.#{k}") }.to_sentence
  end
end

And the subclasses collapse to property declarations:

class Reviewer::Onboarding < Reviewer::Base
  private

  def applies?  = !user.onboarded?
  def kind      = :blocking
  def step      = :onboarding
  def label_key = "user.onboarding_incomplete"
  def cta_url   = url_helpers.onboarding_path
end

Five hook overrides. No construction noise. No parameter passing. No return [] unless ... boilerplate. Each method does one thing. The grain is right.

For rules with computed label parameters, the same pattern holds:

class Reviewer::Profile < Reviewer::Base
  private

  def applies?     = user.onboarded? && !state.complete?
  def kind         = :nudge
  def step         = :profile
  def label_key    = "user.profile_incomplete"
  def cta_url      = url_helpers.profile_path

  def label_params
    { fields: localized_list(state.missing_fields,
                             namespace: "#{LABEL_NAMESPACE}.profile_fields") }
  end

  def state
    @state ||= ::ProfileState.new(user)
  end
end

And for a rule that has multiple ways to trigger:

class Reviewer::Documents < Reviewer::Base
  private

  def applies?  = missing_types.any? || expired_count.positive?
  def kind      = :nudge
  def step      = :documents
  def label_key = missing_types.any? ? "user.documents_missing" : "user.documents_expired"
  def cta_url   = url_helpers.documents_path

  def label_params
    return { count: missing_types.size } if missing_types.any?
    { count: expired_count }
  end

  def missing_types
    @missing_types ||= DocumentCoverage.new(user).missing_types
  end

  def expired_count
    @expired_count ||= user.documents.expired.count
  end
end

The label_key and label_params are conditional because the same rule handles two related states. That's fine: the rule is small enough to hold both conditions without becoming confusing. If the branching got more complex, that would be a signal to split into two rules.

Each method does one thing. The grain is right. And adding a ninth rule is one file with five hook overrides.
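To show what "one file with five hook overrides" means in practice, here is a sketch of a hypothetical ninth rule. The Billing rule, payment_method_on_file?, and billing_path are invented for illustration, and MiniBase is a cut-down stand-in for Reviewer::Base (no I18n, a plain Hash instead of WhatsNext::Item) so the example runs outside Rails.

```ruby
# Cut-down stand-in for Reviewer::Base: same hooks, but the label is the
# raw key and the item is a Hash, so no Rails or I18n is required.
class MiniBase
  def initialize(user, url_helpers)
    @user = user
    @url_helpers = url_helpers
  end

  def items
    applies? ? [item] : []
  end

  def item
    { kind: kind, step: step, label: label_key,
      cta_url: cta_url, cta_method: cta_method }
  end

  private

  attr_reader :user, :url_helpers

  # Hooks. The parenthesized raise keeps these valid on Ruby 3.0 too.
  def applies?   = raise(NotImplementedError)
  def kind       = raise(NotImplementedError)
  def step       = raise(NotImplementedError)
  def label_key  = raise(NotImplementedError)
  def cta_url    = raise(NotImplementedError)
  def cta_method = :get
end

# The hypothetical ninth rule: five hook overrides, nothing else.
class Billing < MiniBase
  private

  def applies?  = !user.payment_method_on_file?
  def kind      = :nudge
  def step      = :billing
  def label_key = "user.billing_missing"
  def cta_url   = url_helpers.billing_path
end

BillingUser = Struct.new(:payment_method_on_file) do
  def payment_method_on_file? = payment_method_on_file
end
BillingUrls = Struct.new(:billing_path)

rule = Billing.new(BillingUser.new(false), BillingUrls.new("/billing"))
rule.items.first[:step] # => :billing
```

A subclass that forgets a hook fails loudly with NotImplementedError the first time #items runs, which is the safety net that makes the five-override contract enforceable.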

Ruby techniques worth calling out

A few Ruby-specific details make this pattern read especially well:

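Endless method definitions (def applies? = !user.onboarded?). Ruby 3.0+; the paren-less def applies? = raise NotImplementedError hooks in the base class need 3.1+, where a command call became legal as an endless-method body. They make hook overrides read as data rather than as code. Compare: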

# Regular form, three lines per property
def applies?
  !user.onboarded?
end

def kind
  :blocking
end

To:

# Endless form, one line per property
def applies?  = !user.onboarded?
def kind      = :blocking

Same behavior. The endless form makes the "declaration" nature of the code visible. When every subclass's overrides are single-expression declarations, the endless form removes the def...end noise and lets the properties themselves become the visual content of the file. If you're glancing at a rule class, you should be able to see its full property definition on one screen. Regular def...end blocks make that hard for a class with five hooks; endless methods make it easy.

attr_reader :user, :url_helpers in the base. Small but meaningful. Compare:

# Without attr_reader, subclasses use instance variables
def applies?  = !@user.onboarded?
def cta_url   = @url_helpers.onboarding_path

To:

# With attr_reader on the base, subclasses use methods
def applies?  = !user.onboarded?
def cta_url   = url_helpers.onboarding_path

The second reads as a property declaration ("I need user to check whether they're onboarded"). The first reads as instance-variable access ("I need @user, but @user where? Set where? Any subclass can accidentally reassign this."). Because attr_reader defines a getter and no setter, subclasses that go through the reader have no method for reassigning the constructor arguments. Ruby won't stop a subclass from writing @user = ... directly, but once every read goes through user, a stray assignment to @user stands out in review.
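It helps to be precise about what the reader does and does not guarantee. A quick sketch in plain Ruby; the ReaderRule and SloppyRule classes are throwaways for illustration, not part of the codebase:

```ruby
class ReaderRule
  def initialize(user)
    @user = user
  end

  private

  attr_reader :user
end

# attr_reader defines a getter only; there is no user= writer to call.
ReaderRule.private_method_defined?(:user)  # => true
ReaderRule.method_defined?(:user=)         # => false
ReaderRule.private_method_defined?(:user=) # => false

# Direct ivar assignment is still possible from a subclass, so the
# protection is a convention backed by the missing writer, not a lock.
class SloppyRule < ReaderRule
  def clobber!
    @user = nil
  end
end

sloppy = SloppyRule.new("alice")
sloppy.clobber!
sloppy.send(:user) # => nil
```

The practical benefit is that any @user = in a rule file becomes the odd line out, which is easy to catch in review or with a lint rule.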

The declarative array-with-conditionals form. Compare:

# Imperative, build through mutation
def items
  arr = []
  arr << missing_item if missing_types.any?
  arr << expired_item if expired_count.positive?
  arr
end

To:

# Declarative, describe what the array can contain
def items
  [
    (missing_item if missing_types.any?),
    (expired_item if expired_count.positive?)
  ].compact
end

Same output, different reading order. The first describes an algorithm ("start empty, conditionally push, return"). The second describes a specification ("here are the two items this rule can produce"). In a service class whose job is to describe eligibility, the second form reads as the answer to the question. The first reads as the process by which the answer gets computed. When you're describing rules, the specification form is almost always right.

flat_map and sort_by as a one-liner pipeline vs. imperative concat-and-sort. Compare:

# Imperative
def compute
  items = []
  RULES.each do |klass|
    items.concat(klass.new(user, url_helpers).items)
  end
  items.sort_by { |i| PRIORITY.fetch(i.kind) }
end

To:

# Declarative pipeline
def compute
  RULES.flat_map { |klass| klass.new(user, url_helpers).items }
       .sort_by { |i| PRIORITY.fetch(i.kind) }
end

The chain is two operations that compose. The imperative version is five lines that don't compose. In Ruby, whenever you can express a transformation as a pipeline, it reads better than as a loop with mutation.

Testing the pattern

Each rule class is independently testable, which is one of the biggest reasons for extracting them. Before the refactor, testing the WhatsNext service required constructing a User in various complete/incomplete states and asserting on the aggregate output. After the refactor, testing a rule is a two-line stub:

RSpec.describe Reviewer::Onboarding do
  let(:url_helpers) { double(onboarding_path: "/onboarding") }

  it "returns a blocking item when user is not onboarded" do
    user = build_stubbed(:user, onboarded: false)
    rule = described_class.new(user, url_helpers)

    expect(rule.items.size).to eq(1)
    expect(rule.items.first.kind).to eq(:blocking)
    expect(rule.items.first.step).to eq(:onboarding)
  end

  it "returns no items when user is already onboarded" do
    user = build_stubbed(:user, onboarded: true)
    rule = described_class.new(user, url_helpers)

    expect(rule.items).to be_empty
  end
end

The rule's tests describe the rule's contract: "here's when it fires, here's what it produces." No user state gets tangled up with other rules. The WhatsNext service, meanwhile, gets a much shorter test suite that just checks the composition: "when three rules fire, we get three items in the right order." Each layer's tests describe that layer's job.

Before the refactor, the service class had eighteen tests, all of which had to construct a User in specific combined states to exercise the various branches. After the refactor, the same behavioral coverage lives in twenty-five tests split across the rule classes and the WhatsNext composition. Each test is smaller. Each is easier to write. And when a rule changes, only that rule's tests need updating.
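The composition check can be sketched without Rails by handing the pipeline a few fake rules. The priority values, the :info kind, the FakeRule class, and a compute_items helper that takes the rule list as an argument are all assumptions for illustration; in the real suite this would be an RSpec example against the registry.

```ruby
# Assumed priority table: lower sorts first. Only :blocking and :nudge
# appear in the rules shown earlier; :info is hypothetical.
SKETCH_PRIORITY = { blocking: 0, nudge: 1, info: 2 }.freeze

QueueItem = Struct.new(:kind, :step)

# Fake rule: fires with one item of the given kind, or not at all.
class FakeRule
  def initialize(kind, step, fires: true)
    @kind = kind
    @step = step
    @fires = fires
  end

  def items
    @fires ? [QueueItem.new(@kind, @step)] : []
  end
end

# Same flat_map + sort_by pipeline as the registry, with rules injected.
def compute_items(rules)
  rules.flat_map(&:items)
       .sort_by { |item| SKETCH_PRIORITY.fetch(item.kind) }
end

rules = [
  FakeRule.new(:info, :export),
  FakeRule.new(:blocking, :onboarding),
  FakeRule.new(:nudge, :profile),
  FakeRule.new(:nudge, :documents, fires: false)
]

compute_items(rules).map(&:step) # => [:onboarding, :profile, :export]
```

Each fake fires with a distinct kind on purpose: sort_by is not guaranteed stable, so a composition test should not assert on the relative order of items that share a priority.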

The dependency tree that made this possible

One thing I glossed over: this refactor was cheap because we already had a complete dependency map for the What's Next pattern. Every rule, every downstream consumer, every follow-up item that would need to change if a rule's shape changed, all mapped out and documented. Before I typed the first character of the refactor, I knew exactly what depended on what.

That's not typical. In most codebases, identifying the dependency tree for a feature like this is something you do late in the game, usually after a refactor has started to go sideways, and you're scrambling to figure out what else you're breaking. We do it upstream. The dependency map is a first-class artifact, not something that lives in developers' heads and gets discovered as needed.

The immediate value of a complete dependency map is that it makes evidence-based refactoring easy. When I said "grep the codebase for this pattern," the grep was the confirmation, not the discovery. The dependency map already told me ProfileState had five callers in specific files. The grep just verified the exact line numbers. That inverts the usual investigation loop: instead of guessing what might depend on something and then searching to verify, you already know, and searching is bookkeeping.
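One way to keep the map honest is to make that bookkeeping mechanical. The sketch below assumes the map is stored as a Hash of constant name to documented caller paths; the format, the paths, and the file contents are hypothetical. It reports any drift between the document and the code.

```ruby
# Hypothetical dependency map: constant => files documented as callers.
DEPENDENCY_MAP = {
  "ProfileState" => ["app/services/scoring.rb", "app/reviewer/profile.rb"]
}.freeze

# sources is { path => file contents }. In a real repo it could be built
# with Dir.glob("app/**/*.rb").to_h { |path| [path, File.read(path)] },
# minus the file that defines the constant itself.
def verify_callers(map, sources)
  map.to_h do |const, documented|
    pattern = /\b#{Regexp.escape(const)}\b/
    actual = sources.select { |_path, code| code.match?(pattern) }.keys
    [const, { undocumented: actual - documented, stale: documented - actual }]
  end
end

sources = {
  "app/services/scoring.rb" => "score += 1 if ProfileState.new(user).complete?",
  "app/reviewer/profile.rb" => "@state ||= ::ProfileState.new(user)",
  "app/mailers/digest.rb"   => "ProfileState.new(user).missing_fields"
}

report = verify_callers(DEPENDENCY_MAP, sources)
report["ProfileState"][:undocumented] # => ["app/mailers/digest.rb"]
report["ProfileState"][:stale]        # => []
```

An empty report means the grep really is just confirmation. A non-empty one means the map needs updating before the refactor starts, not after.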

The deeper value is that a complete dependency map makes it easy to identify follow-up items, the places where a related architecture should look the same but doesn't yet. Once we agreed to extract ProfileState as a domain model, the dependency map told us exactly which other reviewer rules would benefit from the same treatment when their evidence arrived. Not "someday when we notice." Right now, in the docs, with the specific criterion (a second caller) that would trigger each extraction. When DocumentCoverage becomes justified by the second caller landing, the extraction won't be a fresh design exercise. It will be running the pattern we already documented.

This is a discipline most codebases skip. Their dependency maps live in the developers' heads, discovered as needed. The result: refactors are expensive because you don't know what you're going to break, so you refactor conservatively: small changes, minimum churn, leave the rest of the codebase alone. Which means the rest of the codebase stays in the pre-refactor shape indefinitely. It never gets the same treatment. The codebase drifts into inconsistency because each part gets refactored on a different schedule based on when it starts causing pain.

Having the dependency map up front inverts this. We can refactor systematically because we know the whole picture. When we improve the shape of one rule, we can immediately improve the shape of related rules that share the same architecture pattern. The refactor doesn't leak into surprise breakage because there aren't surprises. And when we sweep the codebase months later to check that all rules follow the same pattern, we're checking against a documented spec, not against a shifting collective memory.

For teams that haven't done this: it's worth doing. The upfront cost of mapping the dependency tree is real, but it pays off the first time you need to refactor anything non-trivial in that part of the codebase. And it pays off continuously: every subsequent change is done with the full picture in view rather than with the local view of just the code you're touching.

Principles distilled

Six principles I'd carry from this refactor to the next one.

The most important refactors are the ones where nothing's broken. Dissatisfaction is a legitimate input. The smell instinct is faster than the bug instinct, and codebases that only get refactored when something breaks accumulate quiet debt.

Write down 2-3 alternatives before changing code. Even when you know option B is right, sketching A and C makes you confident in B. Sometimes the sketching surfaces a better hybrid you hadn't seen.

"Extract when a second caller appears" is a useful brake on domain modeling. Domain models without callers are debt. Wait for evidence. The exception is when you can find duplication already in the code and know the next caller is imminent. That's high-confidence reuse.

Push back on names that leak presentation into the domain layer. WhatsNext was the widget label. The service was doing Readiness assessment. The rename made the layering obvious.

Be honest mid-refactor about your earlier claims. I claimed five callers were duplicating the ProfileState query. On re-read, four were doing an adjacent-but-different pattern. Saying that out loud is the difference between honest design and motivated reasoning.

Hook methods over parameters when the data is on self. Parameter lists in Ruby OO are often a sign the data should be moved onto the receiver. Subclasses that declare def kind = :blocking read like property declarations. That's the grain the language wants.

The final shape

Eight rule classes, each 12-29 lines. A Reviewer::Base at 72 lines doing all the orchestration. A Reviewer registry at ~20 lines. A ProfileState domain model at ~30 lines. Twenty-five test examples covering the rules and the new model.

Adding the ninth rule is one file with five hook overrides.

None of that was necessary. Nothing was broken. The dissatisfaction was the whole trigger. That's the point.

If you're navigating a Rails codebase where the design has quietly drifted and you're not sure where to start, that's exactly the kind of work Rock Agile does. Get in touch.

John Epperson 7/3/26

Inheriting a Dev Project: How to Not Blow It in the First Two Weeks

The shape of the work when you inherit somebody else's codebase: how to earn the right to change it before you change anything.

At Rock Agile, we do this a lot. Someone hands us a codebase they didn't write. The original developer is gone. The documentation is stale. The tests may or may not run. The business owners want progress soon, ideally last week. This is one of the specific things we're hired for, and after years of doing it, I've learned there's a shape to the work that keeps you out of trouble.

The shape has one non-negotiable at the front: you have to check yourself before you touch anything.

Start with your mindset, not the code

Developers are naturally judgmental. We look at somebody else's code and our first instinct is to see what's wrong with it. Sometimes the code deserves the judgment. More often, we're just missing context the original developer had, and our judgment is going to blind us to what the code is actually doing.

There's a Henry Ford story I keep coming back to. Ford would take job candidates to dinner. If they seasoned their food before tasting it, he wouldn't hire them. He wasn't afraid of change (he changed things constantly), but he wanted leaders who would understand what was on the plate before they started rearranging it.

The principle transfers to software. Understand what the code is doing, and why, before you decide it's wrong. Prejudgment on an inherited codebase is the single most common mistake I see, and it usually looks like premature refactoring or optimization on parts of the code we haven't earned the right to change yet.

Curiosity works better than judgment. Come in expecting to be surprised by what you find.

Understand what the software is supposed to do

Before you dig into how it works, understand what it's for.

The best case is a product owner who can walk you through the business logic. If you have them, use them. Ask why the software exists, what problem it was originally built to solve, and how that problem has evolved. Understand the constraints that shaped the original architecture, even the ones that no longer apply.

If the product owner isn't available, the next best source is any developer who worked on it before. Ask what was tried, what worked, what didn't, and what problems keep coming back. Repeated bugs and repeated attempts to fix the same thing usually point to something structural that the previous team knew about but couldn't get to.

If nobody is available, the code has to teach you. That's slower, but not impossible.

Get a working environment before anything else

Before you form opinions about the code, get it running. Bootstrap it. Run the tests. Click through the features. Take notes on what works and what doesn't. Test in multiple environments if you can. Differences between platforms are often where the most interesting bugs live.

The specific move I make early on an inherited Rails project: find the biggest files. That's usually where the core domain lives, or where things went wrong. Read the dependency graph next. The Gemfile tells you what problems the original team decided to outsource, and outsourcing decisions age faster than internal code.
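A minimal sketch of the "find the biggest files" move in plain Ruby. The largest_files helper, the app/**/*.rb glob, and the default limit of 10 are assumptions; adjust them to the project's layout.

```ruby
# Returns [path, line_count] pairs for the largest Ruby files under root,
# biggest first.
def largest_files(root, glob: "app/**/*.rb", limit: 10)
  Dir.glob(File.join(root, glob))
     .map { |path| [path, File.foreach(path).count] }
     .max_by(limit) { |_path, lines| lines }
end

# Run from the project root:
largest_files(".").each do |path, lines|
  puts format("%6d  %s", lines, path)
end
```

Line count is a crude proxy, but on a first pass it is usually enough to point at the handful of models and services worth reading first.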

Set a timer on rabbit holes. Early in my career I gave myself five minutes on any tangent while getting a new project running. That habit still runs in the background now. I'm faster at recognizing which threads are worth pulling and which aren't. It's a discipline worth building.

Work with the existing decisions before you change them

Now you know what the software is for and how it runs. Only now should you start considering changes.

Software development is trial and error, and it's a lot more of an art than most engineers admit. My rule for inherited projects is to prioritize working with the previous team's choices before replacing them. Sometimes the choices are wrong (plenty of them will be), but you'll spot the actually-wrong ones faster if you're not fighting the whole codebase at once. And sometimes what looked wrong at first glance turns out to have been the right call for constraints you didn't know about.

When you do start making changes, keep them small and reversible. The goal in the first weeks is to build enough understanding of the codebase that you can commit to bigger changes with confidence.

Take notes obsessively

I cannot overstate this. Write down what you tried, what worked, what didn't, and what confused you. This is partly for your future self: a month from now, when something breaks, you'll want to know what you were thinking today. But it's also for whoever comes after you. Somebody will inherit this project from you eventually. The notes you take now become the documentation they'll wish existed.

Share your findings

Software isn't a solo sport. Share what you've learned with the client, with your team, with anyone else who touches the system. Other people will see things you missed. That's what makes the process work.

The through-line

The pattern is: discovery before decisions. Every step above is really about earning the right to change the code. The developers who blow up inherited projects skip the discovery. They come in confident, refactor too much too soon, and then can't tell whether their changes broke something existing or exposed a bug that was always there. The developers who inherit projects well are the ones who move slower at the start and faster later, because by the time they're touching things aggressively, they know what they're touching.

If you're looking at a legacy codebase you inherited, or one your team inherited, that's the kind of work we do. We come alongside your team, do the discovery work, and either help you build on what's there or rescue what needs rescuing. If that sounds relevant, get in touch.