What is High Availability?

Find out what high availability is, why it matters for all networks and infrastructures, and how to achieve it. You will read about essential techniques for ensuring high availability.

The average website user has become more demanding, which is why high availability has become the number one concern for web developers. Maintaining website performance requires a whole complex of measures: handling system growth, reducing downtime, eliminating the risk of failure, and so on. High availability is a quality of infrastructure that addresses all of these considerations.

Let’s find out what exactly high availability is and how to boost the reliability of your infrastructure.

What Is High Availability?

From a computing standpoint, ‘availability’ describes the period of time during which a service is available, as well as the time the system takes to respond to a request made by a visitor. High availability is a quality of a system or component that ensures a decent level of performance is maintained over a given period of time.

How Is Availability Measured?

Typically, availability is expressed as a percentage that shows how much uptime is expected from a system or a component during a given period. A value of 100% means that the system never fails. For example, 99% availability means that over one year there can be up to 3.65 days of downtime (1% of 365 days).

This value depends on a wide range of factors, including planned and unplanned maintenance periods, as well as the time the system needs to recover from failures.
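
The arithmetic behind such figures can be sketched in a few lines of Python. This is only an illustration: the function name allowed_downtime and the 365-day default period are assumptions made for this example, not part of any standard tool.

```python
from datetime import timedelta


def allowed_downtime(availability_percent, period=timedelta(days=365)):
    """Return the downtime budget implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    # The unavailable share of the period is (100 - availability) percent.
    return period * ((100 - availability_percent) / 100)


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime(target)} of downtime per year")
```

For 99% the result corresponds to the 3.65 days mentioned above. Each additional "nine" cuts the downtime budget by a factor of ten.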

How Does High Availability Work?

To understand it better, think of high availability as the system's failure response mechanism. The idea is simple, but implementing it requires specific configuration and software.

When Is It Mission-Critical?

High availability is important when you need to minimize downtime and eliminate the risk of service interruptions. No matter how reliable your software and systems are, issues can still occur and bring down your server and applications. Ensuring high availability is a useful strategy that helps to reduce the risk of such events. Systems with high availability can recover from component failure automatically.

How to Make a System Highly Available?

One of the main goals of high availability is to reduce the risk of infrastructure failure by protecting every component of the system, because even one broken component can make the entire server and all its services unavailable. Any element of the infrastructure that your applications need in order to function normally, and that has no redundancy, is a single point of failure.

To eliminate all points of failure, you should prepare every layer of infrastructure for redundancy. Imagine the following situation: your infrastructure consists of two similar redundant web servers and a load balancer. The traffic coming from visitors will be distributed between these two servers, but if one server fails, the load balancer will redirect all requests and traffic to the functioning server.

In this case, the web server layer is not a single point of failure, because:

  • components for the same task are redundant;
  • the load balancer (mechanism at the top of the layer) can detect failures in elements and adjust its behavior for fast recovery.
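
The two conditions above can be illustrated with a small simulation. It is a sketch under stated assumptions, not a real load balancer: the LoadBalancer class, the server names web1 and web2, and the is_healthy callback are all hypothetical, and the health check is a plain function instead of a network probe.

```python
class LoadBalancer:
    """Round-robin balancer that skips backends reported as unhealthy."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callback: backend name -> bool
        self._counter = 0

    def pick(self):
        """Return the backend that should receive the next request."""
        # Failure detection: only consider backends that pass the check.
        healthy = [b for b in self.backends if self.is_healthy(b)]
        if not healthy:
            raise RuntimeError("no healthy backends available")
        # Round robin over whatever is currently healthy.
        backend = healthy[self._counter % len(healthy)]
        self._counter += 1
        return backend


if __name__ == "__main__":
    status = {"web1": True, "web2": True}  # simulated health state
    balancer = LoadBalancer(["web1", "web2"], lambda name: status[name])
    print([balancer.pick() for _ in range(4)])  # traffic is shared
    status["web1"] = False                      # simulate a server failure
    print([balancer.pick() for _ in range(4)])  # all traffic goes to web2
```

The balancer never needs to be told that a server has failed. It re-evaluates health on every request, which is what makes the recovery automatic.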

But what if the load balancer breaks down?

If that happens, the load balancing layer itself becomes a single point of failure, and eliminating it can be difficult. Even if you can easily configure an extra load balancer for redundancy, you still need to implement failure detection and recovery for it.

Redundancy alone is not a 100% guarantee of high availability: there must be a mechanism that detects failures and takes action as soon as one component of the system becomes unavailable. Pro tip: you can implement failure detection and recovery for redundant systems with a top-to-bottom approach. This means that the top layer is responsible for monitoring the layer immediately beneath it for failures. In the scenario above, the load balancer is the top layer. If one of the servers (the bottom layer) becomes unavailable, the load balancer simply stops sending requests to the broken server.

This is a pretty simple approach, but it has limitations: at some point in your infrastructure the top layer is either out of reach or does not exist. This is the case with the load balancer layer. Running a failure detection service for the load balancers on an external server simply creates another single point of failure.

This is why a distributed approach is also necessary. You need to connect several redundant nodes into a cluster, where each node is able to detect failures and perform recovery.
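
One common way for cluster nodes to detect each other's failures is to exchange periodic heartbeats. The sketch below shows only the detection side and is an assumption made for illustration: the ClusterNode class, the 3-second timeout, and the explicit "now" timestamps are hypothetical and do not describe how any particular cluster manager works.

```python
class ClusterNode:
    """Tracks heartbeats from peers and reports peers that went silent."""

    def __init__(self, name, peers, timeout=3.0):
        self.name = name
        self.timeout = timeout  # seconds of silence before a peer is suspect
        self.last_seen = {peer: 0.0 for peer in peers}

    def receive_heartbeat(self, peer, now):
        """Record that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def failed_peers(self, now):
        """Return peers whose last heartbeat is older than the timeout."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    node = ClusterNode("lb1", peers=["lb2", "lb3"])
    node.receive_heartbeat("lb2", now=1.0)
    node.receive_heartbeat("lb3", now=1.0)
    node.receive_heartbeat("lb2", now=3.0)  # lb3 stays silent from here on
    print(node.failed_peers(now=5.0))       # lb3 exceeded the timeout
```

Because every node runs the same check, no external monitor is required, and no new single point of failure is introduced. A real cluster also has to agree on which node takes over, which this sketch leaves out.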

However, load balancers bring one extra complication because of the way nameservers work. Recovering from a load balancer failure means failing over to a functioning load balancer, so a DNS change must be made to point the domain name to the IP address of the redundant load balancer. Such a change can take a considerable amount of time to propagate across the Internet, which would cause serious downtime for the system.

How can this problem be solved? DNS round-robin load balancing is one option, but it has a downside: it is not the most reliable approach, because failover is left to the client-side application.
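
To see what "failover is left to the client" means in practice, consider the sketch below. It assumes the client has already received several addresses for one domain name and has to try them itself. The function connect_with_failover and the injected connect callable are hypothetical, and the addresses come from a range reserved for documentation.

```python
def connect_with_failover(addresses, connect):
    """Try each address in order and return the first that connects."""
    errors = []
    for address in addresses:
        try:
            # `connect` is supplied by the caller; it should raise OSError
            # when the address is unreachable.
            return address, connect(address)
        except OSError as exc:
            errors.append((address, exc))
    raise ConnectionError(f"all addresses failed: {errors}")


if __name__ == "__main__":
    def fake_connect(address):
        # Simulated network: the first load balancer is down.
        if address == "192.0.2.10":
            raise OSError("connection refused")
        return f"connection to {address}"

    records = ["192.0.2.10", "192.0.2.11"]  # addresses returned by DNS
    print(connect_with_failover(records, fake_connect))
```

The weakness is that nothing forces a client to behave this way. A client that tries only the first address it receives sees an outage until the DNS records change.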

Another, more reliable solution is to use systems that allow IP address remapping, for instance, floating IPs. On-demand IP address remapping avoids the caching and propagation problems of DNS changes by providing a static IP address that can be remapped when necessary. The domain name stays associated with the same IP address, while the IP address itself is moved between servers.
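
The remapping logic can be modelled as follows. This is a simplified simulation, not a provider API: the FloatingIP class, the server names, and the is_healthy callback are hypothetical, and in a real setup the reassignment is carried out by the hosting platform or the cluster software.

```python
class FloatingIP:
    """A fixed public address that can be reassigned between servers."""

    def __init__(self, address, servers):
        self.address = address            # never changes, so DNS stays valid
        self.servers = list(servers)
        self.assigned_to = self.servers[0]

    def remap_if_needed(self, is_healthy):
        """Keep the address on a healthy server, moving it only on failure."""
        if is_healthy(self.assigned_to):
            return self.assigned_to
        for server in self.servers:
            if is_healthy(server):
                self.assigned_to = server  # failover: remap the address
                return server
        raise RuntimeError("no healthy server can hold the floating IP")


if __name__ == "__main__":
    healthy = {"lb1", "lb2"}  # simulated set of healthy load balancers
    ip = FloatingIP("203.0.113.10", ["lb1", "lb2"])
    print(ip.address, "->", ip.remap_if_needed(lambda s: s in healthy))
    healthy.discard("lb1")    # the primary load balancer fails
    print(ip.address, "->", ip.remap_if_needed(lambda s: s in healthy))
```

Clients keep using the same address before and after the failover, so no DNS record has to change.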

System Components Necessary for High Availability

To ensure high availability, you need to consider several components together. Besides the software implementation, high availability also depends on:

  • Environment. If all of your servers are located in one geographical area, a natural disaster such as a flood or fire can take the whole system down. Using redundant servers in multiple datacenters will boost reliability.
  • Hardware. Highly available servers should be resilient to hardware failures and power outages, including failures of hard disks and network interfaces.
  • Software. The whole software stack, including the OS and the applications, must be prepared to handle sudden failures that may require a system restart.
  • Data. Data loss and inconsistency can be caused by various factors, not only hard disk failures. Highly available systems must keep data safe in case of a failure.
  • Network. Network outages are another possible point of failure for highly available systems. It’s crucial to have a redundant network strategy in place for such cases.

Is There Any Special Software to Configure High Availability?

Every layer of a highly available system has different needs when it comes to configuration and software. At the application level, a load balancer is a vital piece of software for creating a redundant setup.

HAProxy (High Availability Proxy) is a popular option for load balancing, because it can balance load at multiple layers and for various types of servers, including database servers. A load balancer is the entry point of your application, so to keep it from being a single point of failure you should implement a cluster of load balancers behind a floating IP. On both CentOS and Ubuntu servers, you can use Corosync and Pacemaker to build such a cluster.

Bottom Line

High availability is a quality of infrastructure that relies on redundancy and automatic failure recovery: it ensures that a system or a component maintains a high level of performance over a given period of time. Implementing it can seem challenging, but it brings great benefits to systems that need exceptional reliability.
