November 07, 20257 min readdesignstrategy

Why Big Software Redesigns Almost Always Fail

Why Big Software Redesigns Almost Always Fail. software redesign failure Insights from The Proper Motion Company.

Every few years, a software team reaches the same painful conclusion: “We need to rewrite this from scratch.” The codebase has become a tangled mess, the original developers have long since left, and every new feature takes three times longer than it should. A full redesign seems like the obvious answer. And yet, history tells us that big-bang redesigns fail far more often than they succeed. Understanding why is the first step toward finding a better path forward.

The Second System Effect and Its Modern Cousins

Fred Brooks identified the “second system effect” in 1975, and it remains one of the most reliable patterns in software engineering. When a team gets the chance to rebuild something from scratch, they inevitably try to fix every perceived shortcoming of the original. Features that were consciously deferred get added to the requirements. Architectural decisions that evolved organically get replaced with elaborate abstractions. The scope balloons because the team conflates “rebuild” with “reimagine.”

Netscape’s decision to rewrite its browser from scratch in 1998 is the canonical cautionary tale. The rewrite took three years, during which Internet Explorer captured the market. But you don’t need to look at ancient history for examples. Countless SaaS companies have attempted multi-year platform rewrites only to find that the new system, upon launch, is missing features that customers relied on daily. Those features weren’t in the spec because nobody realized they existed – they’d been added as one-off fixes over years of production use.

The modern variation of this effect is what you might call “architecture astronautics.” Teams adopt microservices, event sourcing, CQRS, and domain-driven design all at once in the new system, when the old monolith’s real problem was just poor module boundaries. Over-engineering the replacement introduces new categories of complexity that didn’t exist before.

Related: Designing Data-Heavy Applications: Tables, Charts, and Dashboards

Why the Old System Knows More Than You Think

A production codebase that has been running for years contains an enormous amount of encoded knowledge. Every odd conditional, every seemingly redundant check, every piece of business logic that looks wrong – these often represent hard-won lessons from production incidents, customer edge cases, and regulatory requirements.

When you start a redesign, you typically begin with the documented requirements. But documentation is always incomplete. The real specification is the running code itself. Researchers at Microsoft studied large-scale rewrites and found that teams consistently underestimated the number of business rules embedded in legacy systems by 60 to 80 percent. Those rules surface only when customers start filing bug reports against the new system, months after launch.

This is why incremental refactoring, even when it feels painfully slow, preserves institutional knowledge that a rewrite destroys. Martin Fowler’s “Strangler Fig” pattern – where you gradually replace components of the old system while keeping it running – exists precisely because the old system serves as a living specification. Each piece you replace gets validated against real production behavior before you move on.

The Organizational Trap: Running Two Systems

One of the most underestimated costs of a big redesign is the organizational overhead of maintaining two systems simultaneously. During a rewrite, the old system doesn’t stop needing bug fixes, security patches, and urgent feature requests. You end up splitting your team: some people work on the new system, while others keep the old one running. Neither group has enough people.

This creates a vicious cycle. Bugs fixed in the old system need to be ported to the new one, but the architectures are different, so “porting” becomes “reimplementing.” New features requested by customers get built in the old system because the new one isn’t ready, which means the target the new system needs to hit keeps moving. Teams report spending 30 to 40 percent of their rewrite effort just keeping the new system at feature parity with the ongoing changes to the old one.

The morale effects are equally damaging. The team working on the old system feels like they’re on a sinking ship. The team working on the new system faces constant pressure to ship faster while the company runs on aging infrastructure. When the rewrite inevitably takes longer than planned, executive patience runs thin, and the project gets cancelled or rushed to a premature launch.

See also: Web Typography Guide for Business Applications

When Redesigns Actually Succeed (and Why)

Not every redesign fails. The ones that succeed share a few common characteristics worth studying.

First, successful redesigns have a narrow scope. Rather than replacing the entire system, they target a specific bounded context – a billing engine, a search subsystem, a permissions layer. This limits the blast radius and lets the team ship value incrementally.

Second, successful redesigns run the old and new systems in parallel with automated comparison. Stripe’s approach to rebuilding their payments infrastructure involved running new code paths alongside old ones and comparing outputs for millions of transactions before switching over. GitHub did something similar when rebuilding their merge algorithm. This “dark launching” approach catches discrepancies before they affect users.

Third, successful redesigns are driven by a specific, measurable problem rather than general dissatisfaction. “Our search latency is 3 seconds and needs to be under 200 milliseconds” is a redesign-worthy problem. “The code is messy” is not – that’s a refactoring problem.

Fourth, they maintain a single team ownership model. Instead of splitting the organization, one team owns both the old and new implementations and can make trade-off decisions in real time.

The Strangler Fig in Practice

The Strangler Fig pattern deserves more than a passing mention because it’s the most reliable alternative to big-bang redesigns. The approach works like this: you place a routing layer (an API gateway, a reverse proxy, or even a feature flag system) in front of your old application. New requests get routed to new code; everything else continues hitting the old system. Over time, you migrate functionality piece by piece until the old system handles nothing and can be decommissioned.

In practice, this means identifying the seams in your existing system – the natural boundaries where one module talks to another. These seams become your migration points. You build a new implementation behind one seam, validate it thoroughly, switch traffic over, and move to the next seam.

Tools like NGINX, Envoy, or even application-level middleware make the routing straightforward. The key discipline is resisting the urge to “improve” unrelated parts of the system while you’re migrating. Each migration step should be as boring as possible: same behavior, new implementation, better structure.

A practical pattern for database-backed applications is the “Branch by Abstraction” technique. You introduce an abstraction layer over the data access code, implement the new version behind that abstraction, and toggle between old and new at runtime. This lets you run both implementations simultaneously and compare results before committing to the switch.

The Political Economy of Rewrites

Big redesigns don’t just fail for technical reasons. They fail because of the organizational dynamics they create. Understanding these dynamics helps explain why even well-intentioned rewrite projects go off the rails.

A rewrite gives every stakeholder a chance to re-litigate old product decisions. The sales team wants their feature requests included. The compliance team wants a new audit trail system. The CEO wants the new design to match their vision of what the company “should” look like. Each request is reasonable in isolation. Collectively, they transform a technical modernization into a complete product overhaul – a fundamentally different project with a fundamentally different risk profile.

Sunk cost psychology compounds the problem. Once a team is six months into a rewrite, admitting it’s not working becomes emotionally and politically impossible. Careers are attached to the project’s success. Budgets have been committed. So the team pushes forward, adding headcount, extending timelines, and reducing scope in ways that undermine the original justification. By the time the rewrite ships (if it ships), it’s often a lateral move at best – different technical debt instead of less technical debt.

The information asymmetry between technical and business leadership makes this cycle hard to break. The engineering team knows the rewrite is in trouble, but reporting that reality risks the project being cancelled and the team being blamed. So progress gets reported optimistically, problems get described as temporary setbacks, and leadership doesn’t get the clear signal they need to intervene until it’s too late.

Making the Case for Incremental Change

If you’re currently staring at a legacy codebase and feeling the pull of a full rewrite, here’s a more productive framework. Start by cataloging the specific pain points: what takes too long, what breaks frequently, what blocks the features customers are asking for. Rank these by business impact, not by how much they annoy the development team.

For each pain point, ask whether the problem is structural (the architecture genuinely cannot support what you need) or accidental (the code is messy but the architecture is sound). Structural problems sometimes do require rebuilding a component. Accidental complexity responds well to focused refactoring campaigns – dedicated sprints where the team improves test coverage, extracts modules, and cleans up interfaces.

Set a budget for technical debt reduction – typically 15 to 20 percent of engineering capacity – and apply it consistently rather than in occasional heroic efforts. Track improvement through concrete metrics: deployment frequency, change failure rate, time to onboard a new developer, mean time to resolve production incidents.

This incremental approach lacks the drama of a big redesign announcement. Nobody gets to stand up at an all-hands and unveil a grand architectural vision. But it works. The system gets better month by month, customers never experience a disruptive migration, and the team builds confidence through a steady stream of visible improvements.

The best software organizations treat their systems as living things that evolve continuously, not as buildings that periodically need to be demolished and rebuilt. The unsexy truth is that disciplined, incremental improvement almost always outperforms the bold rewrite – not because rewrites are technically impossible, but because the organizational, political, and knowledge-management challenges they create are almost always underestimated.

If your team is wrestling with whether to redesign or refactor a legacy system, we can help you make the right call. Get in touch to discuss a pragmatic modernization strategy that reduces risk and delivers results incrementally.

// Related posts

Data Dashboard Design: Principles for Complex Applications

How to design dashboards that enable faster decisions for data-heavy operations.

Designing Onboarding Flows That Reduce Churn

Designing Onboarding Flows That Reduce Churn. onboarding design

Why Great Software Feels Invisible to Users

Why Great Software Feels Invisible to Users. invisible software UX