Design Principles Scale: From Refactoring Code to Migrating Systems
What if the same principles you use to refactor code could guide you through large-scale system migrations?
How do you migrate a live, heavily used system — without users noticing, without downtime, and without losing sleep?
In this issue, I’ll share a real-world story where we migrated production services from legacy infrastructure to Kubernetes, using a simple but powerful strategy: small, reversible steps.
Along the way, you’ll see how a concept we often use in code refactoring — Branch By Abstraction — can guide even large-scale system migrations.
In this issue:
Background: Two Different Deployment Pipelines
The Migration Strategy: Gradual Traffic Shifting
Zero-Downtime Deployments with Nginx
Branch By Abstraction — Beyond Code
Key Takeaways
Background: Two Different Deployment Pipelines
Several years ago, I worked on a project where we needed to migrate some critical supporting services — specifically reporting and exporting — from an AMI (Amazon Machine Image) setup to a Kubernetes-managed cluster.
The goal was clear: no downtime and no change to observable behavior.
Complicating matters, the system was deployed across multiple regions, and at any given time, users were actively generating reports and exporting PDFs.
Here's a quick overview of the two approaches:
Old approach:
Using Chef, Puppet, and some Ansible, we set up a Linux environment, configured Ruby, copied application files, and ran them with a process manager like PM2.
New approach:
We containerized everything into Docker images, deployed to Kubernetes, and configured autoscaling (e.g., spin up a new instance when CPU utilization exceeds a threshold).
Although the services’ behavior remained the same, the underlying infrastructure was completely different.
The Migration Strategy: Gradual Traffic Shifting
To migrate safely, we used a simple yet powerful strategy: shift traffic gradually.
Start with only 5% of the traffic routed to the new Kubernetes cluster.
Closely monitor key metrics — error rates, response times, resource usage.
If everything looks good, slowly increase the share: 10%, 20%, and so on.
If any problems appear, roll back immediately.
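The loop above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling we ran: `run_rollout`, `set_traffic_share`, and `is_healthy` are hypothetical names, and a real rollout would wait for a soak period between changing the share and checking the metrics.

```python
from typing import Callable, Sequence


def run_rollout(
    steps: Sequence[int],
    set_traffic_share: Callable[[int], None],
    is_healthy: Callable[[], bool],
) -> int:
    """Walk through the traffic steps and roll back on the first failed check.

    Returns the share (in percent) the new cluster ends up with.
    """
    for share in steps:
        set_traffic_share(share)  # e.g. update a DNS weight or a proxy weight
        # Hypothetical health check: error rates, response times, resource usage.
        if not is_healthy():
            set_traffic_share(0)  # roll back immediately
            return 0
    return steps[-1] if steps else 0


# Usage: record every share we apply, and pretend metrics degrade at 50%.
applied = []
final_share = run_rollout(
    steps=[5, 10, 20, 50, 100],
    set_traffic_share=applied.append,
    is_healthy=lambda: applied[-1] < 50,
)
print(applied, final_share)  # [5, 10, 20, 50, 0] 0
```

The point of the design is that every step is the same cheap operation, and rolling back is just one more call to that operation.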
Before we touched production traffic, we:
Ran extensive internal tests
Configured real-time monitoring via New Relic
Performed manual validations
Then we updated Route 53 to split the traffic:
Added a new DNS record for the Kubernetes cluster, initially at 5%.
After four hours, increased it to 10%.
Left it overnight, then bumped it to 50% the next day.
Finally, we moved 100% of the traffic to the new cluster.
The result?
The migration was so smooth that it almost felt unreal — no downtime, no user impact, and no late-night emergency calls.
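For readers who want to picture the Route 53 side, here is a hedged sketch. It assumes the split was done with weighted records, where each record's share is its weight divided by the sum of all weights for that name. The function name, record names, set identifiers, and TTL are made up for illustration. The dictionary follows the `ChangeBatch` shape that boto3's `change_resource_record_sets` accepts, but the sketch only builds the payload and never calls AWS.

```python
def weighted_change_batch(record_name, old_target, new_target, new_share_percent):
    """Build a ChangeBatch giving `new_share_percent` of DNS answers to the new target.

    Assumption: two weighted CNAME records share one name; Route 53 weights
    must be integers between 0 and 255, so percentages fit directly.
    """
    if not 0 <= new_share_percent <= 100:
        raise ValueError("new_share_percent must be between 0 and 100")

    def upsert(set_identifier, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_identifier,  # distinguishes the weighted records
                "Weight": weight,
                "TTL": 60,  # hypothetical; a short TTL makes weight changes take effect sooner
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Changes": [
            upsert("legacy-ami", old_target, 100 - new_share_percent),
            upsert("kubernetes", new_target, new_share_percent),
        ]
    }


# Usage: the first step of the schedule, 5% to the new cluster.
batch = weighted_change_batch(
    "reports.example.com.", "legacy.example.com", "k8s.example.com", 5
)
for change in batch["Changes"]:
    record = change["ResourceRecordSet"]
    print(record["SetIdentifier"], record["Weight"])
```

Because resolvers cache DNS answers, the observed split is approximate and a weight change only takes full effect as cached answers expire.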
Zero-Downtime Deployments with Nginx
This technique might sound complex, but you may have already used a similar idea at a smaller scale during zero-downtime deployments — for example, with Nginx.
Nginx’s upstream module lets you control how traffic is distributed between servers:
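    http {
        upstream backend {
            # Weights are relative: 70 vs. 30 sends about 70% of requests to backend1
            server backend1.example.com weight=70;
            server backend2.example.com weight=30;
        }

        # The server block must live inside the http block
        server {
            location / {
                proxy_pass http://backend;
            }
        }
    }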
In this example, backend1 handles 70% of the traffic and backend2 handles 30%.
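To see why the weights translate into that split, here is a simplified Python model of smooth weighted round-robin, the style of selection Nginx's default balancing uses. It is a sketch for intuition only: the function name is made up, and it ignores failed peers, connection limits, and everything else a real proxy handles.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, requests):
    """Return the sequence of servers picked for `requests` requests.

    On every request each server's running score grows by its weight,
    the highest score wins, and the winner's score drops by the total weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(requests):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)  # ties go to the first server listed
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted_round_robin({"backend1": 70, "backend2": 30}, 100)
print(Counter(picks))  # Counter({'backend1': 70, 'backend2': 30})
print(picks[:10])
```

The picks are interleaved rather than sent in long bursts to one server, which is what makes small weight changes safe to observe.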
You can:
Adjust weights to shift traffic.
Test the configuration:
sudo nginx -t
Apply changes with a hot reload:
sudo service nginx reload
Simple, gradual, reversible changes — that's the key.
Branch By Abstraction — Beyond Code
The strategy we used mirrors Branch By Abstraction, a common refactoring technique.
In code, Branch By Abstraction means:
Extracting an interface from an existing implementation.
Redirecting clients to the interface.
Gradually replacing the old implementation behind the scenes.
Removing the abstraction once the migration is complete (optional).
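The steps above might look like this in code. This is a minimal sketch with hypothetical names (`ReportExporter`, `LegacyExporter`, `NewExporter`, `SwitchingExporter`), not code from the project described in this issue.

```python
from abc import ABC, abstractmethod
from typing import Callable


class ReportExporter(ABC):
    """Step 1: the interface extracted from the existing implementation."""

    @abstractmethod
    def export(self, report_id: str) -> str: ...


class LegacyExporter(ReportExporter):
    def export(self, report_id: str) -> str:
        return f"legacy:{report_id}.pdf"


class NewExporter(ReportExporter):
    def export(self, report_id: str) -> str:
        return f"new:{report_id}.pdf"


class SwitchingExporter(ReportExporter):
    """Step 3: decides per call which implementation serves the request."""

    def __init__(
        self,
        old: ReportExporter,
        new: ReportExporter,
        use_new: Callable[[str], bool],
    ) -> None:
        self._old = old
        self._new = new
        self._use_new = use_new  # hypothetical toggle, e.g. a feature flag

    def export(self, report_id: str) -> str:
        implementation = self._new if self._use_new(report_id) else self._old
        return implementation.export(report_id)


def download_report(exporter: ReportExporter, report_id: str) -> str:
    """Step 2: client code depends only on the interface."""
    return exporter.export(report_id)


# Usage: route only reports whose id starts with "beta-" to the new implementation.
exporter = SwitchingExporter(
    LegacyExporter(), NewExporter(), use_new=lambda rid: rid.startswith("beta-")
)
print(download_report(exporter, "q3-sales"))    # legacy:q3-sales.pdf
print(download_report(exporter, "beta-churn"))  # new:beta-churn.pdf
```

Widening the `use_new` rule is the code-level equivalent of raising a traffic weight, and flipping it back is the rollback.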
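Our infrastructure migration followed the same principle at the system level rather than within code: the DNS name played the role of the abstraction, clients kept calling it, and behind it we gradually replaced the AMI-based deployment with the Kubernetes cluster.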
Whether you’re migrating a service or refactoring code, Branch By Abstraction is about de-risking change by making it gradual and reversible.
Key Takeaways
Small steps beat big leaps: Gradual traffic shifting minimizes risk and builds confidence at every stage.
Monitor aggressively: Good monitoring is not optional. It's your early warning system.
Be ready to roll back: Plan for failure even if you hope it won’t happen.
Techniques scale: Principles like Branch By Abstraction aren’t limited to code — they work beautifully at the system level too.
Simplicity wins: Simple tools (like Route 53 traffic policies or Nginx upstream weights) can solve complex problems elegantly.
Final Thoughts
Whether you're refactoring code or migrating systems, the same principles apply — just at different scales.
Next time you apply a design principle, try thinking beyond code. You might find new ways to solve even bigger problems.