Scaling Your Website: Three Stages

So, you have A Big Idea that will Revolutionize The World. Naturally, people will flock to your site and engage with it. Naturally, you are concerned about whether your infrastructure can handle success. You wish to plan for success; it would be foolish not to, eh? But don’t let enthusiasm (or ignorance) lead you to premature over-optimization. After all, your Big Idea has not yet passed the test of time.

Here I will describe the three early stages commonly encountered when scaling a new website or startup application. Each stage builds on the previous one, starting with the lowest-hanging fruit and working up.

Reaching each stage requires a significant amount of work, testing, and benchmarking. Consider the economics, and do not scale more than necessary. If you are successful, your revenue will grow along with demand and can fund the next stage.

At each stage, benchmark to measure how much load your setup can handle, and monitor your actual load against that benchmark so that you can scale proactively, before your site breaks rather than after.
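As a minimal sketch of that kind of monitoring, assume a load test gave you a peak throughput (`benchmarked_rps`) and your monitoring reports a current request rate. Comparing the two gives a utilization figure you can alert on. The 70% threshold and the names below are hypothetical choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class CapacityReport:
    utilization: float  # observed load as a fraction of benchmarked capacity
    should_scale: bool  # True once utilization reaches the alert threshold


def check_capacity(current_rps: float, benchmarked_rps: float,
                   threshold: float = 0.7) -> CapacityReport:
    """Compare observed load against the capacity measured in a load test.

    `threshold` is a hypothetical safety margin: scaling work should start
    well before utilization reaches 1.0, because adding capacity takes time.
    """
    if benchmarked_rps <= 0:
        raise ValueError("benchmarked_rps must be positive")
    utilization = current_rps / benchmarked_rps
    return CapacityReport(utilization=utilization,
                          should_scale=utilization >= threshold)


if __name__ == "__main__":
    # Hypothetical numbers: a load test peaked at 400 req/s, and
    # monitoring currently reports 300 req/s.
    report = check_capacity(current_rps=300, benchmarked_rps=400)
    print(f"{report.utilization:.0%} of capacity, scale now: {report.should_scale}")
```

With those numbers the script reports 75% utilization and recommends starting on the next stage.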

Stage 1: (1) application server, (1) database server

Use an opcode cache such as APC to speed up PHP execution.
Move the database onto its own server.
Run Percona Server 5.5 instead of MySQL 5.5.
Tune your database server configuration (my.cnf on Linux, my.ini on Windows).
Replace Apache with Nginx; it typically uses far less memory when handling many concurrent connections.
Store your user session data in the database (but watch out for this problem).
Store static assets in the cloud and serve them using a CDN (I recommend Amazon S3 + Amazon CloudFront, but there are others). This will take a large portion of the load off your application server.
Store file uploads in the cloud (Amazon S3).
Audit the database queries and table structures for inefficiencies (missing indexes, poorly optimized joins, excessive queries).
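Keeping sessions in the database, rather than in files on the application server's local disk, is what later lets any application server handle any request. Below is a minimal sketch of a database-backed session store. It uses Python's standard-library `sqlite3` purely as a stand-in for MySQL or Percona Server so that it runs on its own; the `sessions` table layout, the 30-minute timeout, and the function names are hypothetical. In a PHP application you would wire equivalent logic into `session_set_save_handler()`.

```python
import json
import sqlite3
import time
import uuid

SESSION_TTL_SECONDS = 1800  # hypothetical 30-minute idle timeout


def init_schema(conn: sqlite3.Connection) -> None:
    # One row per session; the expires_at index makes purging stale rows cheap.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        " id TEXT PRIMARY KEY,"
        " data TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions (expires_at)"
    )


def save_session(conn, session_id, data, now=None):
    now = time.time() if now is None else now
    # SQLite's replace-on-conflict syntax; on MySQL use
    # INSERT ... ON DUPLICATE KEY UPDATE. Each save extends the expiry.
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
        (session_id, json.dumps(data), now + SESSION_TTL_SECONDS),
    )
    conn.commit()


def load_session(conn, session_id, now=None):
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ? AND expires_at > ?",
        (session_id, now),
    ).fetchone()
    return json.loads(row[0]) if row else None


def purge_expired(conn, now=None):
    # Run periodically (e.g. from cron) so the table does not grow forever.
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM sessions WHERE expires_at <= ?", (now,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_schema(conn)
    session_id = uuid.uuid4().hex  # random, hard-to-guess session ID
    save_session(conn, session_id, {"user_id": 42})
    print(load_session(conn, session_id))
```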

Both servers should have at least 512MB to 1GB of RAM. Properly configured, this simple setup will handle quite a beating. With fewer moving parts, you will have more time to develop and enhance your offering. As traffic grows, add RAM until it is more economical to move to Stage 2.

Stage 2: (1-2) load balancers, (2-5) application servers, (1) database server

Add 1 or 2 dedicated servers running load-balancing software.
Add 2 or more identical copies of your application server.
Scale up your database server to at least 2GB RAM and re-tune your database server configuration.
Change your domain DNS to point to the IP address of the load balancer.
Use a deployment method that supports server clusters, such as Capistrano.
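At its core, the load balancer spreads requests across the identical application servers and stops sending traffic to any that fail their health checks. The sketch below models that with round-robin selection in Python, just to show the idea; in practice HAProxy, Nginx, or a hosted service does this for you. The class, its `mark_down`/`mark_up` methods, and the private IP addresses are hypothetical.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hands out backend addresses in rotation, skipping unhealthy ones."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._cycle = itertools.cycle(self._backends)
        self._lock = threading.Lock()  # requests may arrive concurrently

    def mark_down(self, backend):
        # Called when a health check against `backend` fails.
        with self._lock:
            self._healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a failed backend passes its health check again.
        with self._lock:
            if backend in self._backends:
                self._healthy.add(backend)

    def next_backend(self):
        with self._lock:
            # One full rotation is enough to find a healthy server if any exist.
            for _ in range(len(self._backends)):
                candidate = next(self._cycle)
                if candidate in self._healthy:
                    return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
    lb.mark_down("10.0.0.12")
    # Traffic alternates between the two remaining healthy servers.
    print([lb.next_backend() for _ in range(4)])
```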

Cloud-hosted load balancing

Rackspace and Amazon provide load balancing as a service. You get redundancy, and you can use smaller web servers (256-512MB RAM) for cost savings. Amazon provides more datacenters and more flexibility. The cost of rolling your own load balancer is unlikely to be justified at this stage, and I highly recommend using one of these unless you have skilled system administrators on staff and have done a cost analysis that favors self-hosting.

Combined with Amazon’s Route 53 DNS service, this lets you configure multiple load balancers in multiple regions for an even greater level of redundancy.

Rolling your own load balancer

There are several open-source load-balancing applications available; evaluate them and choose the best one for your needs:

Squid Cache, configured as a reverse proxy. Note that Squid can terminate SSL (HTTPS) connections if needed, so your application servers can talk to the load balancer more quickly over plain HTTP.
HAProxy
Nginx, configured as a reverse proxy
Pound

If you are running more than one load balancer, use keepalived to handle failover.
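keepalived handles this with VRRP: the load balancers share a virtual IP address, and the highest-priority node that is still alive holds it. The Python sketch below only models that election rule to illustrate the idea; keepalived itself is configured through `keepalived.conf`, not code like this. The node names, priorities, and the 3-second timeout are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    priority: int  # higher wins, as with VRRP priorities in keepalived
    last_heartbeat: float = field(default_factory=time.time)


def elect_vip_owner(nodes, now=None, timeout=3.0):
    """Return the node that should hold the virtual IP, or None if all are down.

    A node counts as alive if its last heartbeat is at most `timeout`
    seconds old; among live nodes, the highest priority wins.
    """
    now = time.time() if now is None else now
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    if not alive:
        return None
    return max(alive, key=lambda n: n.priority)


if __name__ == "__main__":
    primary = Node("lb1", priority=150, last_heartbeat=100.0)
    backup = Node("lb2", priority=100, last_heartbeat=100.0)
    print(elect_vip_owner([primary, backup], now=101.0).name)  # lb1 holds the IP
    backup.last_heartbeat = 104.0  # lb1 has gone silent; lb2 is still sending
    print(elect_vip_owner([primary, backup], now=105.0).name)  # lb2 takes over
```

Because the owner is recomputed on every check, lb1 reclaims the address as soon as its heartbeats resume. This corresponds to VRRP preemption, which keepalived applies unless an instance is configured with `nopreempt`.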

Code deployments

When multiple application servers are involved, you will need an automated way to push out code deployments. There are several ways to do this:

Create a tool using the Capistrano Ruby gem
Roll your own tool using your preferred scripting language and rsync
Configure a protected git server with your code, and set up a cron job on each application server that periodically runs `git pull`
Use a hosted code repository service like Beanstalk, which supports code deployment via SFTP
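For the roll-your-own option, a deployment script mostly loops over the servers and runs the same rsync command against each one. A minimal Python sketch, assuming passwordless SSH to every host; the host names, paths, and excluded directory are hypothetical, while `-a`, `-z`, `--delete`, and `--exclude` are standard rsync options.

```python
import subprocess

APP_SERVERS = ["app1.example.com", "app2.example.com"]  # hypothetical hosts
LOCAL_BUILD_DIR = "build/"        # trailing slash: copy the directory's contents
REMOTE_APP_DIR = "/var/www/app/"  # hypothetical deploy path on each server


def rsync_command(host, src=LOCAL_BUILD_DIR, dest=REMOTE_APP_DIR):
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --delete removes remote files that no longer exist locally.
    return ["rsync", "-az", "--delete", "--exclude", ".git",
            src, f"{host}:{dest}"]


def deploy(hosts, run=subprocess.run):
    """Push the build to each host in turn; stop at the first failure."""
    for host in hosts:
        print(f"deploying to {host}")
        # check=True makes subprocess.run raise CalledProcessError on failure.
        run(rsync_command(host), check=True)
```

Calling `deploy(APP_SERVERS)` pushes to every server; passing `run=lambda cmd, check: print(" ".join(cmd))` gives a dry run that only prints the commands. Stopping at the first failure keeps a bad push from spreading, but servers already updated are left on the new code. Capistrano avoids this by copying each release into its own directory and then switching a `current` symlink.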

Stage 3: (2+) load balancers, (5-10) application servers, (1) master database server, (2+) slave database servers

Scale your primary (master) database server to a sufficiently large size. Use Amazon RDS for a cloud-based solution, or get a high-performance dedicated server with 4-8 CPU cores, at least 3 SSDs in RAID 5, and 8-32GB of RAM (ideally at least as much RAM as the size of your database). All database inserts, updates, and deletes will be done on this server.
Add two or more dedicated database servers to act as read replicas. Their specifications should match those of your master database server, even though they only serve read requests, because each replica must also replay every write the master performs. Configure MySQL master-slave replication to keep the read replicas in sync with the master. (DRBD mirrors a disk at the block level for failover, but its standby copy cannot serve reads, so it is not a substitute for replication here.)
Update your application code to connect to the master database for writes and to the replicas for reads. Choose a read replica using a random or round-robin scheme, or some other method suited to your needs.
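A minimal sketch of that read/write split, assuming you already hold one connection object per server (plain labels stand in for them here). Statements that modify data go to the master, and reads rotate across the replicas round-robin; `random.choice` would work just as well for the random scheme. Classifying statements by their first keyword is a simplification, and the class and method names are hypothetical. MySQL replication is asynchronous by default, so a read sent to a replica right after a write may not see it yet; the `force_master` flag is for reads that must be current.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across replicas."""

    # Simplification: classify a statement by its first keyword only.
    WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
                      "CREATE", "ALTER", "DROP", "TRUNCATE"}

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas = itertools.cycle(list(replicas) or [master])

    def connection_for(self, sql, force_master=False):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if force_master or first_word in self.WRITE_KEYWORDS:
            return self.master
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
    print(router.connection_for("UPDATE users SET name = 'x' WHERE id = 1"))  # master-db
    print(router.connection_for("SELECT * FROM users"))  # replica-1
    print(router.connection_for("SELECT * FROM posts"))  # replica-2
```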

Beyond Stage 3

Beyond this point, scaling your site and keeping it running smoothly will require a dedicated team of system administrators, architects, developers, and testers. This is the situation large-scale websites such as Facebook, Twitter, Pinterest, and Etsy are in, and it requires a great deal of innovation, time, and cash.

Specialized Tools/Techniques

These can be introduced selectively at any stage along the way, as needed.

Search – use a dedicated search server, such as Sphinx.
Use memcached on dedicated servers (or Amazon ElastiCache) to cache parts of your application, and/or your user session data.
Migrate part or all of your database to a non-relational database, such as MongoDB (or Amazon DynamoDB).
Partition or shard very large databases to get less-frequently-accessed data out of the way (and out of your database cache).
Use load-balanced work queues (Gearman or similar) to run background processing asynchronously.
Run reporting database queries on a separate database. Either run your reports on a dedicated read replica, or use one of the specialized data warehouse solutions that are on the market (such as Infobright).
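Caching with memcached usually follows the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result with an expiry. So that it runs on its own, the sketch below uses a small in-process dictionary as a stand-in for a memcached client; with a real client library you would call its get and set operations instead. The `TTLCache` class, the `user:<id>` key format, and the `fetch_user_from_db` callback are hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client: get/set with expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._store.pop(key, None)


def get_user(cache, user_id, fetch_user_from_db, ttl=300):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = fetch_user_from_db(user_id)  # the expensive query
        if user is not None:
            cache.set(key, user, ttl)
    return user
```

When a user record changes, delete its key so the next read repopulates the cache from the database instead of serving stale data until the TTL runs out.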

Conclusion

Evaluate how much you really need to scale. Base the decision on data: how much traffic you actually have now, how much you can afford to spend on scaling, and how much traffic you reasonably anticipate in the near future. Do not base it on technological fads or overly optimistic dreams of grandeur. Remember that premature optimization will get you in trouble.

Should you do this yourself, or hire consultants? That depends on how fast you anticipate growing and what you can afford to pay. In light of the cost of hiring and retaining really competent developers and system administrators, you may find using consultants in the early stages to be more cost effective.

Written on December 16, 2012