Why layered codebases punish humans and AI coding agents (and what vertical slice fixes)

Why layered MVC codebases hurt both your devs and your AI coding agents, and how one folder per feature (vertical slice) fixes it, with the SPL migration story.

id 0x05 · cluster: Engineering Practice · read ~14 min
Tags: architecture, vertical-slice, coding-agents, dotnet, context-engineering

I spent a long time writing layered MVC. Controllers in one folder, services in another, repositories under that, view models off to the side, validators in a fourth place. It was tidy on a slide. It was painful in practice. Adding one feature meant editing five files scattered across those folders, and the only thing holding the feature together was the developer’s memory.

At SPL, our team moved a customer-facing .NET service from that shape to FastEndpoints and vertical slices. Each endpoint became one folder. Request, validator, handler, response, all sitting next to each other. Onboarding got faster. Code reviews got smaller. The number of accidental cross-feature changes dropped sharply. I had read Jimmy Bogard’s 2018 essay earlier and thought it was a niche taste. After a couple of months on the migrated service I stopped thinking of it as a taste and started defaulting to it on every new project.

Two years on, the same property has paid out a second time. The codebases I now write with Claude Code and Codex read better when they’re sliced by feature than when they’re sliced by layer. The agents need less context to make a useful change, and they make fewer confident mistakes. The reason is the same reason humans got faster: the unit of work and the unit of code have the same shape.

The starting point: layered MVC, the default that nobody chose on purpose

The service I inherited at SPL was a routine .NET MVC project. The folders matched the framework defaults. Controllers/. Services/. Repositories/. Models/. DTOs/. A Validators/ folder once we added FluentValidation. A Mappers/ folder once we added AutoMapper. A Helpers/ folder, because every long-lived MVC codebase has a Helpers/ folder.

A new endpoint touched all of them. The controller signature, the request DTO, the service method, the repository method, the entity, the validator, the mapper. Seven files across seven folders for an action that, in business terms, was one thing: create a service request for a customer. The seven files were not bound together by anything stronger than “the developer who wrote this remembered they all needed to change.”

The cost was not visible in the small. A senior engineer who already knew the codebase moved through those seven files fast. The cost showed up in onboarding. New contractors took a week to ship a small endpoint, not because the endpoint was hard, but because the search-and-stitch step across all the folders was hard. It also showed up in code review. A pull request for one feature looked like changes to seven unrelated parts of the system. Reviewers had to mentally re-assemble the slice from the layered diff every time.

None of this was a new problem. In 1972, Parnas argued for decomposing systems around the design decisions most likely to change. Constantine called the well-decomposed shape functional cohesion. Robert C. Martin later called it screaming architecture. The names differ. The rule is the same: organise code around the thing that changes. Layered MVC has remained the popular default anyway, because the framework points the other way.

The migration: FastEndpoints, one folder per feature

The move was not a rewrite. It was incremental.

The new pattern was a folder per feature, modelled on Bogard’s essay and the REPR pattern (Request, Endpoint, Response) that FastEndpoints encourages. Each endpoint became its own class in its own folder, with the request type, the validator, the response type, and the handler logic sitting alongside it. The folder names matched the use case: CreateServiceRequest, AssignTechnician, CompleteJob, GetServiceHistory. The structure read like the business, not like the framework.

In the source tree, the shift looks like this. Production project on top, a parallel test project of the same shape underneath.

Before
    Controllers/CreateServiceRequestController.cs
    Services/CustomerService.cs
    Repositories/ServiceRequestRepository.cs
    DTOs/CreateServiceRequestDto.cs
    Validators/CreateServiceRequestValidator.cs
    Mappers/ServiceRequestMapper.cs

After
    src/MyService/
      Features/
        CreateServiceRequest/
          Endpoint.cs
          Request.cs
          Response.cs
          Validator.cs
          Handler.cs

    tests/MyService.Tests/
      Features/
        CreateServiceRequest/
          HandlerTests.cs
          EndpointTests.cs

One folder per feature in production, with a mirrored folder of the same name in a separate test project. The slice convention shows up twice: once in src/, once in tests/. A change to one feature is still one folder open in each project, not seven folders across the codebase.

The migration discipline was simple. Any new feature went into a vertical slice. Any feature already touched by a bug fix or a small change moved to a slice on the way out. The MVC controllers stayed alive next to the slices for as long as anything depended on them. At no point was there a flag day. After about three months, the slices outnumbered the controllers, and the team’s mental model had flipped.

Tim Deschryver’s argument that the minimal-API endpoint is the application layer was the conceptual move I had to make. The endpoint is not a thin shell that calls down into the “real” code in a service. The endpoint owns the feature’s flow. Validation, orchestration, the calls into the domain, the response shaping. All of it. The shared domain still exists, but it is small, deliberate, and reached into from a slice rather than wrapped around the slice.
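A minimal sketch of "the endpoint owns the feature's flow", written in Python rather than C# only to keep it dependency-free. It is not the SPL code or the FastEndpoints API. The field names, the `InMemoryStore` stand-in for shared persistence, and the single-module layout are all assumptions made for illustration; in the real layout each piece would sit in its own file inside the slice folder.

```python
"""One slice, sketched as a single module: request, validator, handler, response."""
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Request:
    customer_id: str   # hypothetical field
    description: str   # hypothetical field


@dataclass(frozen=True)
class Response:
    request_id: int
    status: str


class ValidationError(Exception):
    """Raised when the request fails the slice's own validation rules."""


def validate(request: Request) -> None:
    # Validation lives in the slice, next to the request type it checks.
    errors = []
    if not request.customer_id.strip():
        errors.append("customer_id is required")
    if not request.description.strip():
        errors.append("description is required")
    if errors:
        raise ValidationError("; ".join(errors))


class InMemoryStore:
    """Stand-in for a shared persistence primitive (hypothetical)."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.rows = {}  # request_id -> Request

    def add(self, request: Request) -> int:
        new_id = next(self._ids)
        self.rows[new_id] = request
        return new_id


def handle(request: Request, store: InMemoryStore) -> Response:
    # The handler owns the whole flow: validate, persist, shape the response.
    validate(request)
    request_id = store.add(request)
    return Response(request_id=request_id, status="created")


if __name__ == "__main__":
    store = InMemoryStore()
    print(handle(Request("cust-42", "Boiler not heating"), store))
    # Response(request_id=1, status='created')
```

The only thing the slice reaches out to is the store, which is passed in. Everything else a reader needs to understand the feature is in the one unit.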

The improvements showed up fast. Code reviews stopped feeling like archaeology, since a pull request was usually one slice folder plus its mirror in the test project. New engineers shipped a small endpoint in days rather than the week or so it used to take: the next feature lived in a new folder next to an existing one, rather than spread across seven layers they had to learn first. Tests got easier because each slice had obvious inputs and outputs, and the parallel test project did not need to mock half the application to exercise one handler.

Cross-feature accidents dropped. Layered architectures push you toward shared services that quietly accumulate responsibilities. A CustomerService that started as a thin wrapper around a repository becomes the place where five features stash their logic, because the next feature is always one method away. Vertical slices make that move visible. If two slices need the same thing, you extract it deliberately, into a clearly named utility or a small domain object. The default is no sharing. The exception is named.

The trade-off was real and worth naming honestly. Some duplication appeared, mainly in request and response DTOs, and in small pieces of orchestration that two slices both needed. The discipline Milan Jovanović writes about in “Structuring Vertical Slices” is what keeps that duplication from turning into incoherence: shared concepts get extracted as patterns become visible, not pre-emptively. We pulled out a handful of small shared types over the first six months. None of them were layers.

The principle: code shape should match the unit of change

What I had quietly absorbed by the end of the migration was not “FastEndpoints is good”. It was the older cohesion principle, restated for the framework I happened to be working in.

The unit of change in a typical application is not “all controllers” or “all services”. It is a feature. The codebase should be organised so that the surface area of one change is small, local, and obvious from the folder structure.

The reason vertical slice is not “just renamed feature folders” is that the principle includes how shared code is treated. Vertical slice does not say “no abstractions”. It says no shared abstractions by default. When a pattern recurs across slices in a way that earns extraction, you extract it, name it, and put it somewhere reusable. The asymmetry is what protects the codebase from drifting back into a layered shape under the weight of shared utilities. Milan Jovanović covers this well in “Vertical Slice Architecture Is Easier Than You Think”.
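The "no shared abstractions by default" rule can be checked mechanically. The sketch below is one possible approach, not something the SPL team ran: it assumes a Python codebase with a top-level `features/` package, one directory per slice, and absolute imports. It flags any slice that imports directly from another slice. Relative imports are ignored, and shared modules outside `features/` are allowed by construction.

```python
import ast
from pathlib import Path


def _imported_modules(node: ast.AST) -> list[str]:
    """Dotted module names referenced by one import statement."""
    if isinstance(node, ast.Import):
        return [alias.name for alias in node.names]
    if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
        # "from features import x" may name a slice as x, so expand both forms.
        return [node.module] + [f"{node.module}.{a.name}" for a in node.names]
    return []


def cross_slice_imports(
    features_root: Path, package: str = "features"
) -> list[tuple[str, str]]:
    """Return sorted (importing_slice, imported_slice) pairs that cross a boundary."""
    slices = {p.name for p in features_root.iterdir() if p.is_dir()}
    violations = set()
    for slice_name in sorted(slices):
        for source_file in (features_root / slice_name).rglob("*.py"):
            tree = ast.parse(source_file.read_text(encoding="utf-8"))
            for node in ast.walk(tree):
                for module in _imported_modules(node):
                    parts = module.split(".")
                    if len(parts) < 2 or parts[0] != package:
                        continue  # stdlib, third-party, or shared code
                    if parts[1] in slices and parts[1] != slice_name:
                        violations.add((slice_name, parts[1]))
    return sorted(violations)
```

Run in CI as `cross_slice_imports(Path("features"))`, an empty list means every piece of sharing goes through a named shared module rather than a quiet reach into a neighbouring slice.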

This is not a .NET-only problem

The same mistake shows up in other stacks. Different folder names, same smell.

A React app can be layered badly too. components/, hooks/, api/, types/, stores/, and utils/ looks tidy until one product change touches all six. The shape I prefer for product code is feature-first: features/create-service-request/ with the form, schema, mutation, API client, and state sitting together, and a parallel test folder for the same feature. Shared UI primitives still belong in a shared place. Feature-specific UI stays with the feature.

Python has the same failure mode. A FastAPI app with top-level routers/, schemas/, services/, and repositories/ is layered MVC under different names. For product code with many distinct use cases I prefer features/create_service_request/, with the router, request schema, handler, and response schema together and the tests in a parallel tests/features/create_service_request/ tree. Shared database session, auth, logging, and domain primitives stay shared.

Java got here years ago through its own door. The “package by feature, not by layer” argument has been circulating in Spring communities for a long time, and DZone’s “Package by Layer is Obsolete” makes the same point bluntly.

The rule survives the stack change.

Why this compounded when AI coding agents arrived

This is the part I did not see coming.

When I started using Claude Code and Codex on side projects, the codebases that read the cleanest for me as a human read the cleanest for the agents too. The agents appeared to make smaller, more reliable changes on vertical slices than they did on layered ones.

It is not a benchmark. It is the kind of feel-the-difference observation that anyone who has paired with an agent on two different codebases of the same age will recognise. It also maps onto a structural property of how these tools work.

Coding agents do not load a whole codebase into a single context window. They retrieve. Cursor builds a semantic index of the codebase from embeddings and uses a Merkle tree of file hashes so that only modified files get reprocessed, then pulls the relevant chunks into context per task. Sourcegraph’s Cody pipeline combines a BM25-style code search across selected repositories with local IDE context and a ranking step that picks the most relevant snippets for the prompt. Anthropic’s account of how Claude Code works in large codebases says the agent navigates a codebase the way a software engineer would: traversing the file system, reading files, running grep, and following references. None of these are exhaustive readers. They are guided ones.

In every one of these systems, the agent’s view of the world for a given change is a small subset of the codebase. The retrieval system has to guess which files are relevant. The agent then has to reason over what it gets.

Fig. 1: the retrieval surface of one feature change across two codebase shapes.

Add to this the research evidence on long-context retrieval. Liu et al.’s 2023 paper “Lost in the Middle: How Language Models Use Long Contexts” showed that models perform best when relevant information sits at the start or the end of context, and significantly worse when the relevant information sits in the middle, even for explicitly long-context models. Hsieh et al.’s 2024 RULER benchmark, published at COLM 2024, showed that models can ace simple needle-in-a-haystack retrieval and still fall apart on harder retrieval, multi-hop tracing, and aggregation tasks at the same length. Of the seventeen long-context models the authors tested, only about half could maintain satisfactory performance at 32K tokens, despite every one of them advertising context windows that size or larger. Anthropic’s own context engineering writeup frames the discipline as a step beyond prompt engineering, asking which configuration of context most reliably produces the behaviour you want rather than how many tokens you can fit.

Two facts compose.

First, agents always see a subset.

Second, the chance of mistakes rises as the relevant context gets larger and more scattered.

A vertical slice is a small, local subset by construction. A layered MVC change requires a subset that spans seven folders. The same agent, asked to change “the create service request endpoint”, retrieves one folder on a sliced codebase and seven on a layered one. On the layered codebase, even if every relevant file is technically in context, the model has to track the cross-references between them, the cohesion is lower, and the surface area for a confident-but-wrong refactor is larger.
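As a rough proxy for that retrieval surface, the sketch below counts the distinct folders one change spans, using the file paths from the Before and After listings earlier in the post. The Before listing shows six files, so the layered count here is six; the entity under Models/ that the text mentions would make it seven. The function name `folders_touched` is mine, and folder count is only a stand-in for what a real retrieval system has to find.

```python
from pathlib import PurePosixPath

# File lists copied from the Before and After trees shown earlier.
LAYERED = [
    "Controllers/CreateServiceRequestController.cs",
    "Services/CustomerService.cs",
    "Repositories/ServiceRequestRepository.cs",
    "DTOs/CreateServiceRequestDto.cs",
    "Validators/CreateServiceRequestValidator.cs",
    "Mappers/ServiceRequestMapper.cs",
]

SLICED = [
    "src/MyService/Features/CreateServiceRequest/Endpoint.cs",
    "src/MyService/Features/CreateServiceRequest/Request.cs",
    "src/MyService/Features/CreateServiceRequest/Response.cs",
    "src/MyService/Features/CreateServiceRequest/Validator.cs",
    "src/MyService/Features/CreateServiceRequest/Handler.cs",
]


def folders_touched(paths: list[str]) -> list[str]:
    """Distinct parent directories that a set of changed files spans."""
    return sorted({str(PurePosixPath(p).parent) for p in paths})


print(len(folders_touched(LAYERED)))  # 6
print(len(folders_touched(SLICED)))   # 1
```

The same function can be pointed at the file list of any real pull request to see how scattered a typical change is before and after a migration.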

The failure modes I have watched are predictable in retrospect. The agent rewrites a controller and forgets the validator that lives in a separate folder. It modifies a service method and misses that a different controller depends on the old shape. It introduces a new DTO that almost matches an existing one because it did not retrieve the existing one. None of these are agent bugs. They are codebase-shape bugs that the agent surfaces faster than a human would.

Code structure has quietly become a context-engineering decision as much as a developer-ergonomics one. The shape of the source tree is what the retrieval system has to work with. A codebase organised by feature gives that retrieval system a unit that matches the work. A codebase organised by layer asks it to reconstruct the work from fragments, every time.

What I do now

For new services I default to vertical slice. In .NET that means FastEndpoints or Minimal APIs with the endpoint as the application layer. In Node and TypeScript projects it means a folder per feature with the request validation, handler, and response co-located. In Python it means the same shape: a directory per use case, with the FastAPI router or worker entry point sitting next to the schemas and the handler.

A few practical rules I lean on.

One folder per feature, named after the business operation, not after the HTTP shape. CreateServiceRequest rather than ServiceRequestPostController.

All the production artefacts for the feature live in the slice folder: request, validator, response, handler, any feature-specific types. The folder is the unit you load into your head and the unit the agent retrieves.

Shared infrastructure is shared, deliberately. Auth, logging, persistence primitives, the database context, the common error-handling middleware. These live in shared modules with clear names. They are not slices.

Shared domain is small and extracted from patterns. The first version of a feature is allowed to duplicate a little. The second similar feature is allowed to duplicate a little more. By the third, you extract the shared concept and name it. You do not extract pre-emptively, because pre-emptive extraction is what pulls the codebase back toward the layered shape.

Tests live in a separate test project that mirrors the slice structure. The production project owns src/MyService/Features/CreateServiceRequest/. The test project owns tests/MyService.Tests/Features/CreateServiceRequest/. Same folder name, separate compilation unit. Test-only dependencies stay out of production, and the same feature name shows up in both projects.

CLAUDE.md and AGENTS.md files describe the slice convention explicitly. The agent’s effectiveness depends on knowing the convention without having to infer it from the source tree. A short note that says “this codebase organises one folder per feature; add new features as new folders; do not introduce shared service layers without justification” is the cheapest context-engineering move available.

A pull request reviewer can load one folder and have the whole feature in context. The agent can retrieve one folder and have the whole feature in context. The same property serves both readers.
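The mirrored test project rule above is easy to let drift, so it is worth checking mechanically. This is a small sketch under the assumption that both trees use one directory per feature with identical names, as in the src/ and tests/ layout described earlier. The function name and the shape of the returned dictionary are illustrative choices.

```python
from pathlib import Path


def unmirrored_features(src_features: Path, test_features: Path) -> dict[str, list[str]]:
    """Compare slice folder names between the production and test trees."""
    src = {p.name for p in src_features.iterdir() if p.is_dir()}
    tests = {p.name for p in test_features.iterdir() if p.is_dir()}
    return {
        # Slices that exist in production but have no test folder.
        "missing_tests": sorted(src - tests),
        # Test folders whose slice was renamed or deleted.
        "orphaned_tests": sorted(tests - src),
    }
```

Called as `unmirrored_features(Path("src/MyService/Features"), Path("tests/MyService.Tests/Features"))`, two empty lists mean the convention holds. Anything else is a slice a reviewer or an agent will not find where the convention says it should be.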

Where this bends

Vertical slice is not a free architecture. It has failure modes worth naming honestly.

It can degenerate into copy-pasted near-duplicates if the discipline on extraction lapses. A codebase with sixty handlers that all repeat the same validation idiom is not a vertically sliced codebase. It is a manual macro.

It fits some shapes of system better than others. A web service with many independent use cases is the canonical fit. A library, an SDK, a long-running pipeline with a small surface area, or a CRUD admin tool with mostly homogeneous endpoints are cases where layered or hexagonal architectures often read more cleanly, because the feature axis is degenerate. Rico Fritzsche makes a related point in “Why Vertical Slices Won’t Evolve from Clean Architecture”: the two architectures answer different questions, and they do not converge as the codebase ages.

The framework matters less than the discipline. FastEndpoints is the cleanest .NET expression I have used, but vertical slice predates it and works in plain Minimal APIs, in classic ASP.NET controllers if you structure the folders right, in Spring with @RestController packages, in FastAPI routers, in Express handler files. The principle is the shape of the source tree and the rule about where shared abstractions live. The library is a productivity multiplier on top.

The principle didn’t change

Organise the codebase around what changes, which for most applications is a feature, not a layer. That is what helped my team at SPL onboard faster, review more honestly, and break fewer unrelated things, and it is what helps coding agents make smaller, more reliable changes: the context they load is the size of a feature, not the size of an architecture.

Bogard named this in 2018. Parnas and the cohesion researchers named it in the 1970s. The agent tooling vendors are naming it again now, indirectly, by building retrieval pipelines around the assumption that source-tree shape matters. The principle did not change. We just keep rediscovering reasons to take it seriously.

What I’m not claiming

  • The SPL migration is described from memory, not from benchmarked figures. Onboarding speed, review size, and the drop in accidental cross-feature changes are honest observations from a working team. No internal SPL revenue, customer, or specific delivery numbers appear in this post.

  • The agent claim is observation plus inference. The retrieval-degradation literature (Liu et al. 2023, Hsieh et al. 2024) explains why a smaller, more local subset is easier to act on correctly, and the agent tooling vendors describe their pipelines in ways that line up with that. I have not run a controlled A/B benchmark on identical features rendered both ways. Treat the claim as a strong inference, not a measured number.

  • The cohesion advice applies most cleanly to application services with many distinct use cases. Pick the architecture the work asks for.
