What is website availability?

It is critical to design, develop, test, and deploy web applications so they can scale well under heavy and ever-increasing load. However, in spite of the best-laid plans and preparations, servers can fail for seemingly unknown reasons, causing your site to become unavailable. If and when a server fails or becomes overloaded, you want to ensure that the failure won't adversely affect your business by preventing your customers from accessing and using your web application. If it does, you risk jeopardizing your bottom line with lost sales and disgruntled customers who will look to your competitors for goods and services.

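This section defines and describes website availability and failover. It covers availability and reliability, common failures, a website availability scenario, and failover considerations.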

Availability and reliability

In simple terms, availability and reliability mean that you can access a website by entering the site's URL in your browser, and all of its features work as intended. Thus, availability and reliability refer to the uptime of a website, which is often directly related to the uptime of the web server and other dependent servers, such as a database server, an application server, or a file server. For a site to be considered available, all of the servers that provide its functionality must work.
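Because every one of those servers must be working, the site as a whole is less available than any single server behind it. The following minimal Python sketch shows the arithmetic. It assumes that servers fail independently, and the server names and 99.9% figures are hypothetical:

```python
from math import prod


def site_availability(component_availabilities):
    """Availability of a site that needs every component working at once.

    Assumes components fail independently, so the chance that all of them
    are up is the product of their individual availabilities.
    """
    return prod(component_availabilities.values())


# Hypothetical availabilities for the servers behind one site.
servers = {
    "web server": 0.999,
    "application server": 0.999,
    "database server": 0.999,
}

print(f"Site availability: {site_availability(servers):.4%}")
```

Under these assumptions, three servers at 99.9% each leave the site at roughly 99.7%.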

Web server and other dependent servers

For JRun and ColdFusion web applications, it is particularly important that the JRun and ColdFusion servers remain as available and responsive as the web server and other dependent servers. JRun and ColdFusion process requests that the web server passes to them. After processing the application logic, JRun and ColdFusion return the results to the web server, which in turn returns an HTML response to the browser. If JRun or ColdFusion stops responding, the web server cannot return the application's pages, even though it is still running itself.

Availability and reliability are concerned with keeping the servers that provide services to your web application available at all times. If a server on which your site depends does become unavailable, you need a sound redundancy scheme to keep your site available. As your organization moves into e-business, you must plan, design, and implement load-balancing and failover strategies that keep your site operational even when individual servers fail.

If servers employ a sound load-balancing and failover strategy, they provide high availability and reliability to their users. In fact, Internet Service Providers (ISPs) that host commercial websites and offer 24x7 technical support as a competitive differentiator typically guarantee, in a written service-level agreement (SLA), the percentage of time that a website will be available. If the ISP has a sound scalability and failover strategy in place, this figure is usually 99% or better.
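Converting such a percentage into a yearly downtime budget shows what it means in practice. The following minimal Python sketch assumes a 365-day year. The 99% figure comes from the text above, and the other figures are illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def allowed_downtime_hours(availability_pct):
    """Hours per year a site can be down and still meet the availability figure."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% available -> {allowed_downtime_hours(pct):.2f} hours of downtime per year")
```

At 99%, the budget is 87.6 hours (about 3.65 days) of downtime per year. Each additional 9 cuts that budget by a factor of ten.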

Common failures

Following are typical types of failures that can negatively impact your web application's availability and reliability:

Hardware failures - while less common than software failures, hardware failures do occur, and can include crashed hard drives, blown processors, and failed network cards. Diagnosing and fixing these issues can take a long time, because parts must be obtained and repairs performed. If your web application is mission-critical, put a sound hardware redundancy strategy in place to avoid costly downtime. A sound strategy includes a minimum of two, and preferably three, web servers.
Software failures - the software failures that most affect a web application involve the web server's operating system, the web server software itself, or the web application software. If the operating system crashes or becomes corrupt, the web server cannot function properly (or perhaps at all), compromising your web application's availability, reliability, and performance. Similarly, if the web server software crashes or behaves erratically, the web server will probably stop serving requests. Preparing for software failures is difficult, but mirrored secondary hardware systems that can take over when a system fails will minimize your web application's downtime.
Server failures - other servers on which your web application depends can also fail, causing downtime or diminished capabilities on your site. For example, for distributed applications, a proxy server might go down, causing requests for your web application's services to go unanswered. Or the database server might crash, making it impossible for users to submit or retrieve information from your database. Or a mail server might go down, making it impossible for your users to successfully send mail to you. Ensure that your organization's IT architecture includes network monitoring and notification software that can quickly report on the general health of your network and alert you about any failed servers.
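Redundant servers counter these failures because the site stays up as long as at least one server in the pool is up. The following rough Python sketch illustrates that effect. It assumes identical servers that fail independently, each with a hypothetical availability of 99%:

```python
def pool_availability(server_availability, server_count):
    """Availability of a pool that stays up while at least one server is up.

    Assumes identical servers that fail independently: the pool is down
    only when every server is down at the same time.
    """
    all_down = (1 - server_availability) ** server_count
    return 1 - all_down


for count in (1, 2, 3):
    print(f"{count} server(s): {pool_availability(0.99, count):.6f}")
```

This model ignores capacity. If the surviving server cannot carry the full load, it can still be overwhelmed, as the scenario below illustrates. It also ignores failures that hit every server at once, such as a shared network or power outage.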

Website availability scenario

Imagine that you have just built a robust, interactive e-commerce website on which you plan to sell the most sought-after books and music in the world. You used Java scriptlets to build the application, so you have taken advantage of Java's many built-in features, including secure database access, multithreading, and integrated session management.

Upon finishing the development work and quality assurance testing, you deploy the website onto one production web server that is hosted within your IT department. The IT department informs you that it can use its existing Internet connection to make your site live, avoiding the additional hosting support cost of using an outside vendor.

The site goes live, and it's an instant success. Orders start pouring in on the very first day, and huge numbers of people log on to browse and buy. Everything seems perfect. Then, on the second day of business, the load hitting the site is so high that the web server's performance slows to a crawl, and eventually the server becomes unavailable. Suddenly, your tech support lines are ringing off the hook with complaints that users cannot access your site, and you are losing significant business.

Although the application provided many useful features and capabilities, customers could not reach them, because the site's performance degraded to the point that the site became unavailable. The site was deployed on a single server with no redundant servers in place. As a result, there was no way to spread the increasing traffic across servers (no load balancing), and there was nowhere to redirect traffic when the server became unavailable (no failover).

This simple scenario illustrates how critical adequate scalability, performance, and failover planning are to any successful web development effort. Servers can become overloaded or fail at any time, so ensure that your design, development, testing, and deployment strategies are sound, promote good communication between necessary departments, and include adequate disaster recovery capabilities.

Failover considerations

The ability to fail over from unavailable servers to redundant servers is a cornerstone of any mission-critical application, because it keeps the application running continuously and reliably. Such disaster planning and recovery can be broken down into three topics: hardware planning, systems monitoring, and corrective actions.

Review the following considerations to ensure that you have a sound failover strategy in place - one that guarantees your website's availability.

Hardware planning

As the availability scenario above shows, you must acquire and configure all necessary hardware before you deploy an application. Every website has different requirements, feature sets, purposes, audiences, and budgets, and therefore different needs. However, if your site is a business-critical system that affects your company's bottom line, you must put an appropriate redundancy strategy in place, with two or more redundant systems. In fact, Macromedia recommends a minimum of three servers to support a critical website. With three servers, you can take one server offline for updates and maintenance while keeping at least two servers in production at all times. This scheme provides administrative flexibility and protects your site from hardware or software failures.
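The three-server recommendation can be generalized into a simple sizing rule. Provision enough servers to carry peak load with at least two in production, then add a spare for maintenance. The following minimal Python sketch implements that rule. The function name, the load units, and the numbers are hypothetical:

```python
import math


def servers_needed(peak_load, capacity_per_server,
                   min_in_production=2, maintenance_spares=1):
    """Smallest pool that carries peak load even with spares taken offline.

    peak_load and capacity_per_server use the same hypothetical unit,
    for example requests per second measured in load testing.
    """
    # Servers that must stay in production to carry peak load,
    # never fewer than the redundancy floor.
    in_production = max(min_in_production,
                        math.ceil(peak_load / capacity_per_server))
    # Extra servers that can be pulled for updates or maintenance.
    return in_production + maintenance_spares


print(servers_needed(peak_load=300, capacity_per_server=500))   # light load
print(servers_needed(peak_load=1200, capacity_per_server=500))  # heavier load
```

Under light load, the rule falls back to the three-server minimum. With 1,200 units of peak load against 500 per server, it asks for four servers.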

The two predominant redundancy models used today are:

Primary/backup servers - this model suits an important web application that receives relatively little traffic, such as an intranet. Typically, it uses an expensive, high-capacity server as the primary server and an inexpensive, lower-end server as the backup in case the primary server fails.
Parallel servers - this is the classic load-balancing/redundancy model and is used most often for business-critical applications. Unlike the servers in a primary/backup scheme, the multiple servers in a parallel scheme are peers and are grouped as a single entity to support one or more applications.
You can use identical, cloned hardware in your server clusters, or you can mix hardware sizes and models. Cloned, higher-capacity, higher-end hardware might have greater up-front costs, but it can help minimize long-term administration costs. Conversely, mixing hardware models and capacities might be less expensive in the short term, but it could add administrative costs later on.
If you plan to use a parallel model, many mid-range servers are recommended over either a few high-end servers or many inexpensive ones. Moderately priced servers with adequate capacity can generally meet your needs as well as expensive ones can, at a fraction of the cost.
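The two models differ in how a request is assigned to a server. The following Python sketch is illustrative only. The function and class names are hypothetical, and round-robin is just one of several ways a parallel pool can spread load among healthy peers:

```python
from itertools import count


def pick_primary_backup(servers, is_healthy):
    """Primary/backup: send everything to the first healthy server in priority order."""
    for server in servers:
        if is_healthy(server):
            return server
    return None  # nothing left to fail over to


class ParallelPool:
    """Parallel: healthy peers share the load in round-robin order."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._turn = count()  # ever-increasing turn counter

    def pick(self, is_healthy):
        healthy = [s for s in self.servers if is_healthy(s)]
        if not healthy:
            return None
        return healthy[next(self._turn) % len(healthy)]


# Simulate a failed primary ("web1").
down = {"web1"}


def healthy(server):
    return server not in down


print(pick_primary_backup(["web1", "web2"], healthy))  # web2 takes over
pool = ParallelPool(["web1", "web2", "web3"])
print([pool.pick(healthy) for _ in range(4)])  # web2 and web3 alternate
```

In the primary/backup case, the backup sits idle until the primary fails. In the parallel case, every healthy peer carries traffic all the time, and a failed peer simply drops out of the rotation.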

Systems monitoring

Ensure that your network and the mission-critical sites that reside on its servers are supported by systems-monitoring software. This type of software actively and continuously monitors an application's availability and service levels. Such monitoring programs must not only detect problems but also route alerts to administrators immediately.
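A monitor is most useful when it alerts on changes in server state rather than repeating the same alarm on every check. The following Python sketch assumes a hypothetical `probe` callable and a `notify` callable. In practice, `probe` might be an HTTP request to the server with a short timeout, and `notify` stands in for e-mail or paging:

```python
from typing import Callable, Dict, Iterable, List


class Monitor:
    """Polls servers and alerts only when a server's state changes."""

    def __init__(self, servers: Iterable[str],
                 probe: Callable[[str], bool],
                 notify: Callable[[str], None]) -> None:
        self.probe = probe    # hypothetical: True if the server responds correctly
        self.notify = notify  # hypothetical: for example e-mail or a pager
        self.up: Dict[str, bool] = {server: True for server in servers}

    def poll(self) -> List[str]:
        """Check every server once and return those that are currently down."""
        for server, was_up in list(self.up.items()):
            is_up = self.probe(server)
            if was_up and not is_up:
                self.notify(f"ALERT: {server} is not responding")
            elif is_up and not was_up:
                self.notify(f"RECOVERED: {server} is responding again")
            self.up[server] = is_up
        return [server for server, ok in self.up.items() if not ok]


# Simulated health states stand in for real probes.
status = {"web1": True, "web2": False, "db1": True}
alerts: List[str] = []
monitor = Monitor(status, probe=lambda s: status[s], notify=alerts.append)
print(monitor.poll())  # ['web2']
print(alerts)          # ['ALERT: web2 is not responding']
```

Polling again while `web2` is still down returns it in the list but sends no new alert. A second message is sent only when `web2` recovers.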

Corrective actions

The third major failover consideration is the corrective action that must occur if a failure causes a server to become unavailable. If a server goes down and takes your site with it, some level of human intervention is usually required to diagnose and correct the problem.

However, before the analysis and repair can occur, the administrator must be notified. Whatever failover system you put in place should include an automated notification system that routes alerts through your telecommunications infrastructure (e-mail, pagers, real-time Web-based alerts, and so on) to the appropriate administrator for prompt attention.

Besides notifying the administrator that a problem has occurred, you also want your failover solution to automatically redirect traffic intended for the unavailable server to other available servers until the unavailable server is fixed. This crucial corrective action is what keeps your website up and available to your users even if one of the servers supporting it is experiencing problems.
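To redirect traffic intended for an unavailable server, the balancer routes around that server only while it is down and sends traffic back once it recovers. The following minimal Python sketch shows that behavior under stated assumptions. The `AffinityRouter` class, the idea that each user has a preferred server, and the way users are assigned are all hypothetical. They do not describe how any particular product works:

```python
class AffinityRouter:
    """Routes each user to a preferred server, redirecting while it is down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self.preferred = {}  # user id -> preferred server (hypothetical scheme)

    def mark_down(self, server):
        """Called, for example, by a monitor when a server stops responding."""
        self.down.add(server)

    def mark_up(self, server):
        """Called once the server has been repaired and responds again."""
        self.down.discard(server)

    def route(self, user):
        # Assign new users to servers in turn; keep the assignment afterward.
        if user not in self.preferred:
            self.preferred[user] = self.servers[len(self.preferred) % len(self.servers)]
        home = self.preferred[user]
        if home not in self.down:
            return home
        available = [s for s in self.servers if s not in self.down]
        if not available:
            raise RuntimeError("no servers available")
        # Spread redirected users across the remaining servers.
        return available[self.servers.index(home) % len(available)]


router = AffinityRouter(["web1", "web2", "web3"])
print(router.route("alice"))  # web1
router.mark_down("web1")
print(router.route("alice"))  # web2 (redirected)
router.mark_up("web1")
print(router.route("alice"))  # web1 again
```

Because the user's preferred server is remembered rather than reassigned, traffic returns to the repaired server automatically once it is marked up.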
