Creating scalable and highly available sites

Once you understand the issues of scalability and availability, the next step is to learn the techniques for building scalable and highly available websites.

This section describes clustering, hardware-based and software-based clustering solutions, and how to combine the two.

What is clustering?

Clustering is a technique in which two or more web servers supporting one or more domains (such as www.yourcompany.com) are grouped as a cluster of servers, to collectively accommodate increases in load and provide system redundancy.

The following figure shows an example of a server cluster for a website:

Cluster of servers for a website

Clustering for scalability works by distributing load among the servers in the cluster (load balancing). The distribution follows either a simple, regular sequence that ignores server load (round-robin DNS and routers), or a predefined threshold or algorithm that you specify and can adjust for each server in the cluster (specialized clustering software).
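The two distribution styles can be contrasted in a short Python sketch. It is illustrative only: the server names, load figures, and the function names round_robin and pick_by_threshold are assumptions made for this example, not part of any particular clustering product.

```python
from itertools import cycle

# Hypothetical server names, for illustration only.
SERVERS = ["web1", "web2", "web3"]


def round_robin(servers):
    """Return an iterator that hands out servers in a fixed repeating order,
    without looking at how busy each server is."""
    return cycle(servers)


def pick_by_threshold(loads, thresholds):
    """Return the least-loaded server that is still below its own threshold.

    loads and thresholds map a server name to a load figure (assumed here to
    be percent busy). Returns None when every server is at or over its limit.
    """
    eligible = [name for name in loads if loads[name] < thresholds[name]]
    if not eligible:
        return None
    return min(eligible, key=lambda name: loads[name])


# Round-robin: the sequence wraps around regardless of load.
rr = round_robin(SERVERS)
first_four = [next(rr) for _ in range(4)]

# Threshold-based: each server has its own adjustable limit.
loads = {"web1": 85, "web2": 40, "web3": 60}
thresholds = {"web1": 80, "web2": 70, "web3": 50}
chosen = pick_by_threshold(loads, thresholds)
```

With these sample figures, only web2 is under its threshold, so the threshold-based picker selects it, while the round-robin sequence would keep sending every third request to the overloaded web1.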

Clustering for failover relies on redundant servers to ensure that business-critical applications remain available if one of the servers in a cluster fails. Intelligent software-based failover solutions can detect when a server has failed and automatically redirect new incoming HTTP requests to available cluster members. Some hardware-based failover devices that have less built-in intelligence require an administrator's intervention when a failure is detected.
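As a rough sketch of the software-based failover behavior described above, the following Python class keeps a record of failed members and routes new requests only to the remaining ones. The class name Cluster, its methods, and the server names are hypothetical; how a real product detects a failure (for example, a timed-out health probe) is outside the scope of this sketch.

```python
class Cluster:
    """Tracks which members are believed to be down and routes new requests
    to the remaining members in rotation."""

    def __init__(self, members):
        self.members = list(members)
        self.failed = set()
        self._next = 0  # index of the next member to try

    def mark_failed(self, member):
        """Record that a member has stopped responding."""
        self.failed.add(member)

    def mark_recovered(self, member):
        """Return a member to service."""
        self.failed.discard(member)

    def route(self):
        """Return the next available member, skipping failed ones."""
        for _ in range(len(self.members)):
            candidate = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if candidate not in self.failed:
                return candidate
        raise RuntimeError("no cluster members available")


cluster = Cluster(["web1", "web2", "web3"])
cluster.mark_failed("web2")  # assumed: a failure detector reported web2 as down
routed = [cluster.route() for _ in range(4)]
```

After web2 is marked as failed, new requests alternate between web1 and web3 until mark_recovered("web2") is called.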

Clustering can be accomplished using a software-based solution, such as round-robin DNS alone or together with a third-party package; a hardware-based solution, such as a packet router; or a combination of the two.

Hardware-based clustering solutions

A common and reliable hardware-based clustering solution is a packet router. One of the most popular routers is Cisco Systems' LocalDirector. Placed in front of a cluster of web servers, a router directs incoming HTTP requests to the available web servers in the cluster. It works by assessing the rate and volume of IP packet flow to and from each web server and selecting the server best able to accommodate the traffic. This process is fast and efficient. Together, the router and the clustered web servers make up a virtual server.
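The idea of choosing a server from observed packet flow can be sketched in Python as follows. This is a simplified assumption about how such a selection might work, not a description of how LocalDirector or any other device is implemented; the class name FlowTracker and the byte counts are invented for the example.

```python
class FlowTracker:
    """Accumulates the traffic volume seen for each server and picks the
    server that has carried the least traffic so far."""

    def __init__(self, servers):
        # Bytes observed to and from each server in the current window.
        self.bytes_seen = {server: 0 for server in servers}

    def record(self, server, packet_bytes):
        """Add one observed packet to a server's running total."""
        self.bytes_seen[server] += packet_bytes

    def best_server(self):
        """Return the server with the lowest observed traffic volume.
        Ties go to the server listed first."""
        return min(self.bytes_seen, key=self.bytes_seen.get)


tracker = FlowTracker(["web1", "web2", "web3"])
tracker.record("web1", 1500)
tracker.record("web2", 400)
tracker.record("web3", 900)
tracker.record("web2", 300)
target = tracker.best_server()
```

Here web2 has carried 700 bytes in total, less than the other two servers, so it is selected for the next request.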

Routers are considered semi-intelligent devices because they can detect a server failure and redirect requests to other servers. If a web server fails or stops responding, the router stops sending packets to the unresponsive server. Routers are not considered fully intelligent because, while they can redirect requests upon discovering a failure, they do not let you configure redirection thresholds for individual servers. They also do not support application-aware load balancing.

The following figure shows a router distributing requests in round-robin fashion to the available servers in a web server cluster:

Router distributing requests to available servers

Advantages of hardware-based solutions

A hardware-based clustering solution, such as a router, is an attractive solution for the following reasons:

Note: Load-balancing devices offer different features and capabilities.

Considerations

Carefully evaluate the following issues against a router's attributes:

Software-based clustering solutions

There are several kinds of software-based clustering solutions on the market. As with hardware-based clustering solutions, there are strengths and weaknesses associated with each. These software solutions include:

ClusterCATS lets you easily create, optimize, and maintain "smart" clusters to support your web applications. ClusterCATS runs on Windows, Solaris, and Linux platforms and works with leading mission-critical web servers, including Microsoft IIS, Netscape Enterprise Server, and Apache. It is easily administered from remote locations and provides robust features, including:

Advantages

The following benefits make a software-based clustering solution attractive:

Considerations

Consider the following issues when evaluating software-based solutions for your environment:

Combining hardware and software clustering solutions

Instead of choosing between a hardware solution and a software solution, you can combine both types of clustering. A combined solution can provide the greatest scalability and availability for a site. It is an attractive option if your organization has already invested in one type of solution but is looking for more comprehensive coverage. The flexibility to integrate hardware with software means that your organization does not necessarily have to absorb a capital loss on a previous technology investment if you decide to purchase additional clustering technology.

However, as already discussed, not all hardware and software solutions are equal. They differ in features and capabilities, and not all hardware and software integrate well together. Investigate thoroughly before purchasing technology to augment your current solution.

For a visual representation of hardware and software clustering solutions working together, see "Hardware-based clustering solutions".
