Automatic Failovers

Automatic Host Failovers in Cloudmin

Cloudmin's automatic failover feature protects against host system failures by moving virtual systems off a down host to a new host with shared storage. The running state (memory contents and network connections) of the virtual systems will be lost in this case, but they will be restarted on the new host with minimal interruption.

For failovers to work, your host systems must share the storage location for virtual system disk images or filesystems. This can be done via NFS, Cluster LVM, iSCSI, or other network filesystem technologies. Cloudmin's failover feature does not yet include automatic replication of disk images between host systems, and cannot set up shared storage for you.

Currently, automatic failovers are supported only for Xen, OpenVZ and KVM virtual systems.

Failover Groups

A failover group is a collection of host systems that support the same virtualization type and share the storage used for virtual systems. If one host in the group goes down, some or all of the virtual systems running on it will be failed over to another host in the group, provided that enough free RAM is available there. A group can contain an explicit list of hosts, the members of a location group, or the hosts in a Cloudmin system group.

The steps to create a new failover group are:

  1. Log in to Cloudmin as root, and go to Host Systems -> Host Failover Groups.
  2. Click the Add a new failover group link.
  3. Select a virtualization type from the Virtual system type menu.
  4. Enter a name for the group in the Group description field.
  5. If you want Cloudmin to perform failovers automatically, set the Failover group enabled? field to automatic mode. Otherwise, select manual mode to trigger failovers yourself when you detect that a host system has gone down.
  6. In the Host systems in failover group section, select the hosts that will be part of this group.
  7. To limit automatic failovers to certain virtual systems, select them in the Virtual systems to failover section. You might want to restrict failover to important systems, or to those on shared storage.
  8. Click the Create button.

Once a group has been created, it can be edited or deleted at any time by clicking on its description on the Host Failover Groups page.

Failover Notification

When automatic failovers are enabled, they will be triggered by Cloudmin's regular status collection process. When it detects that a virtual system's host is down, it will wait for the Host downtime timeout set on the failover group page, and then attempt to move the virtual system to a new host.
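
The timeout check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Cloudmin's actual implementation: the HostStatus record, the should_fail_over function, and the use of seconds as the time unit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostStatus:
    """Hypothetical record of what status collection knows about one host."""
    name: str
    # Time (epoch seconds) at which the host was first seen down, or None if up.
    down_since: Optional[float]


def should_fail_over(host: HostStatus, now: float, downtime_timeout: float) -> bool:
    """Return True once the host has been down for at least the timeout.

    downtime_timeout stands in for the Host downtime timeout setting;
    expressing it in seconds is an assumption of this sketch.
    """
    if host.down_since is None:
        return False  # host is up, nothing to do
    return now - host.down_since >= downtime_timeout
```

Waiting for the timeout before acting means that a brief network glitch or a quick reboot of the host does not trigger a failover.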

Upon success or failure, email will be sent to the master administrator and possibly to the owner(s) of the virtual system. If the virtual system was running before its old host failed, it will be restarted on the new host after the failover completes.

The move can fail for several reasons:

  • No hosts in the failover group share the storage for this virtual system.
  • No hosts have enough free RAM.
  • All other hosts in the group are down.
  • An error occurred while copying configuration files to the new host.
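
The first three failure conditions amount to a filter over the other hosts in the group. The Python sketch below shows one way such a selection could work. It is a hypothetical model, not Cloudmin's code: the Host and VirtualSystem classes, the order of the checks, and the choice of the host with the most free RAM are all assumptions. Errors while copying configuration files happen after a host is chosen and are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Host:
    """Hypothetical view of one host in a failover group."""
    name: str
    up: bool
    free_ram_mb: int
    storage: Set[str] = field(default_factory=set)  # shared storage IDs it can reach


@dataclass
class VirtualSystem:
    """Hypothetical view of a virtual system to be moved."""
    name: str
    ram_mb: int
    storage: str  # ID of the shared storage holding its disks


def pick_failover_host(
    vs: VirtualSystem, failed_host: str, group: List[Host]
) -> Tuple[Optional[Host], Optional[str]]:
    """Return (host, None) on success, or (None, reason) if the move cannot happen."""
    alive = [h for h in group if h.name != failed_host and h.up]
    if not alive:
        return None, "all other hosts in the group are down"
    shared = [h for h in alive if vs.storage in h.storage]
    if not shared:
        return None, "no host shares the storage for this virtual system"
    fits = [h for h in shared if h.free_ram_mb >= vs.ram_mb]
    if not fits:
        return None, "no host has enough free RAM"
    # Assumption: prefer the candidate with the most free RAM.
    return max(fits, key=lambda h: h.free_ram_mb), None
```

Returning the reason alongside the result makes it straightforward to include the cause of a failed move in the notification email.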

Triggering a Manual Failover

If you have set a failover group to manual mode, Cloudmin will not automatically move virtual systems off down hosts in the group. However, you can force a failover at System Operations -> Force Failover. On this page you can optionally select a specific new host system, and decide if the virtual system should be started up on the new host or not.

A manual failover can also be done using the failover-system API command.