ION Accelerator High Availability


Introduction

The ION Accelerator High Availability (HA) feature enables deployment of Fusion ION Accelerator units in environments that require the highest levels of data protection and availability.

Properly implemented high availability architectures must meet the following basic objectives:

  • Protect data from failure of system components
  • Provide data availability even if a system component fails
  • Provide data availability while failed components are replaced
  • Provide non-disruptive software/firmware upgrades
  • Provide non-disruptive Field Replaceable Unit (FRU) servicing
  • Ensure that the mechanism implementing HA is itself fault-tolerant

The ION Accelerator High Availability storage service meets these objectives with a solution that has low complexity and high performance.

Basics of ION Accelerator High Availability

With the HA feature, every write to an ION Accelerator volume is synchronously replicated across a pair of clustered ION Accelerator nodes. A write is not acknowledged to the initiator until it has been written to power-cut-safe ioMemory on both nodes. As a result, data is protected from component failures and from catastrophic failure of a single node.

If an application cannot reach an HA volume through one node, it automatically uses an alternate path to the data on the other node, so data remains available even if access to one ION Accelerator node is lost. While a node is being repaired or replaced, the remaining node stays online to keep data available. When the repaired node is brought back online, it is automatically resynchronized with the online node before resuming service.

Software and firmware upgrades are non-disruptive because they are applied as rolling cluster updates: one node is updated and restarted while the other node stays online and continues to serve data. The HA feature also employs redundancy to maintain performance and availability if the data fabric or the cluster interconnect fails.


Figure 1: ION High Availability

ALUA Multipathing with ION Accelerator HA

In the dark ages of storage networking, storage array developers had to write custom drivers for server multipathing because no standard existed for managing multipath I/O in dual-controller configurations. The SCSI-3 standards initiated in 2005 brought ALUA (Asymmetric Logical Unit Access), which provides a way to implement multipathing without custom drivers. ALUA offers two key mechanisms, implicit ALUA and explicit ALUA, for controlling access to LUNs and for managing failover and failback between asymmetric active/active controllers.

Implicit and explicit ALUA refer to methods for managing which paths to an array are active. With implicit ALUA, the storage cluster tells the initiator which storage path to use for each LUN. With explicit ALUA, the initiator can switch over to another path if the storage path between the initiator and a cluster node fails.
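The path selection described above can be sketched in a few lines of Python. This is only an illustration, not ION Accelerator or operating system multipath code: the Path class, the node names, and the select_path function are hypothetical, and only the numeric values for the three access states follow the SCSI ALUA definitions.

```python
from dataclasses import dataclass
from enum import Enum


class AluaState(Enum):
    """Subset of the ALUA target port group access states."""
    ACTIVE_OPTIMIZED = 0
    ACTIVE_NON_OPTIMIZED = 1
    UNAVAILABLE = 3


@dataclass
class Path:
    node: str              # hypothetical node name
    state: AluaState       # state reported by the storage cluster
    healthy: bool = True   # set to False once the initiator sees the path fail


def select_path(paths):
    """Pick the best usable path: optimized first, then non-optimized."""
    usable = [p for p in paths
              if p.healthy and p.state is not AluaState.UNAVAILABLE]
    if not usable:
        raise RuntimeError("no usable path to the LUN")
    return min(usable, key=lambda p: p.state.value)


paths = [
    Path("node-a", AluaState.ACTIVE_OPTIMIZED),
    Path("node-b", AluaState.ACTIVE_NON_OPTIMIZED),
]
first = select_path(paths)    # the path the cluster reports as preferred
paths[0].healthy = False      # the path to node-a fails
second = select_path(paths)   # the initiator switches to the other node
print(first.node, second.node)
```

In this sketch the preferred path is used until it fails, after which I/O moves to the surviving node. Failback is simply the same selection run again once the first path is healthy.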

End-to-End Availability

ION Accelerator HA works hand-in-hand with server, application, and storage architecture redundancies. For example, clustered servers, whether clustered by the operating system (such as Microsoft Cluster) or the application itself (such as Oracle RAC), provide high availability in the event an application server fails.

Redundant interface paths from servers directly to ION appliances or through redundant fabric switches create fault-tolerant I/O paths. When combined with the ION Accelerator HA feature, the entire solution provides end-to-end data availability.


Figure 2: ION Accelerator and Oracle RAC


ION Accelerator Management

ION Accelerator HA is administered through the ioSphere graphical user interface (GUI) and command-line interface (CLI). Both the GUI and CLI provide a single-system image, an ease-of-use simplification that allows multiple ION nodes to be managed as one. ION Accelerator HA configuration, event monitoring, and performance are all managed through this integrated, cluster-aware GUI and CLI.

Non-Disruptive Update and FRU Servicing

ION Accelerator HA provides Non-Disruptive Update (NDU) and Field Replaceable Unit (FRU) servicing. Software update packages are applied one node at a time. Each node automatically enters a maintenance mode, performs the update, and then rejoins the HA cluster. A rolling HA update is automated by ioSphere.

Similarly, if a FRU such as an ioDrive, HBA, or Cluster Interconnect Card requires servicing, the node can be placed in maintenance mode, serviced, and, when appropriate, brought back online.
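The one-node-at-a-time sequence can be illustrated with a short Python sketch. It is not the ioSphere implementation: the rolling_update function, the apply_update callback, and the node names are hypothetical and exist only to show the invariant that a peer stays online while each node is in maintenance mode.

```python
def rolling_update(nodes, apply_update):
    """Update nodes one at a time so that a peer is always online."""
    online = set(nodes)
    log = []
    for node in nodes:
        if len(online) < 2:
            raise RuntimeError("no peer online to serve I/O; update refused")
        online.discard(node)                      # node enters maintenance mode
        log.append((node, "maintenance", sorted(online)))
        apply_update(node)                        # update or service the node
        online.add(node)                          # node rejoins the HA cluster
        log.append((node, "rejoined", sorted(online)))
    return log


applied = []
steps = rolling_update(["node-a", "node-b"], applied.append)
for node, event, still_online in steps:
    print(node, event, still_online)
```

Each log entry records which nodes were online at that step, so it is easy to check that the list is never empty. The same sequence applies to FRU servicing, with the service action taking the place of the update.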

Cluster Management in ION Accelerator HA

ION Accelerator HA relies on industry-leading tools (Corosync and Pacemaker) for cluster resource management and messaging, and for detecting and recovering from node- and resource-level failures. These tools are fully integrated into ION Accelerator HA. They require no additional user configuration and are maintained as part of the overall ION Accelerator software package. ION’s integrated software stack works seamlessly to provide a robust high availability solution.


Volume Replication in ION Accelerator HA

ION Accelerator HA includes a volume replication capability that is based on Linbit’s DRBD. According to Linbit, “Using DRBD in conjunction with Pacemaker is arguably DRBD’s most frequently found use case. Pacemaker is also one of the applications that make DRBD extremely powerful in a wide range of usage scenarios.”[1]

With ION Accelerator HA, every volume has a role, which may be primary or secondary.

The choice of “primary” and “secondary” as terms here is not arbitrary. These roles were deliberately not named “Active” and “Passive”.  Primary vs. secondary refers to a concept related to availability of storage, whereas active vs. passive refers to the availability of an application. It is usually the case in a high-availability environment that the primary node is also the active one, but this is by no means necessary.

  • An ION Accelerator volume in the primary role can be used for read and write operations.
  • An ION Accelerator volume in the secondary role receives all updates from the peer node’s device, but otherwise disallows access completely. Applications cannot use it for either read or write access. Even read-only access is disallowed because consistency could not be maintained if a secondary resource were made accessible in any way.

At any given time, a volume is in the primary role on only one cluster member.

ION Accelerator HA implements a synchronous replication protocol.   Local write operations on the primary volume are considered durable only after both the local and the remote write have been completed.  As a result, loss of a single node is guaranteed not to cause data inconsistency. Data loss is, of course, inevitable even with this replication protocol if both nodes are irreversibly destroyed at the same time.
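The acknowledgement rule can be sketched as follows. This is a simplified illustration rather than DRBD or ION Accelerator code: the Node class and the replicated_write function are hypothetical, and the sketch covers only the healthy case in which both nodes are reachable. What the primary does while its peer is down is deliberately left out.

```python
class Node:
    """Hypothetical stand-in for one node's power-cut-safe storage."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}        # block address -> data
        self.reachable = True

    def persist(self, lba, data):
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[lba] = data


def replicated_write(primary, secondary, lba, data):
    """Return an acknowledgement only after both copies are written."""
    primary.persist(lba, data)      # local write
    secondary.persist(lba, data)    # remote write; raises if the peer is down
    return "ack"


node_a, node_b = Node("node-a"), Node("node-b")
status = replicated_write(node_a, node_b, lba=42, data=b"payload")
print(status, node_a.blocks == node_b.blocks)
```

Because the acknowledgement is returned only after the second persist call succeeds, any write the initiator believes is complete exists on both nodes.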

The ION Accelerator’s HA replication and synchronization framework leverages commodity high performance Ethernet networking.

Efficient Synchronization

(Re-)synchronization is distinct from replication. While replication occurs on any write event to a resource in the primary role, synchronization is decoupled from incoming writes. Rather, it affects the device as a whole.

Synchronization is necessary whenever replication has been interrupted, whether by failure of the primary node, failure of the secondary node, or interruption of the replication link. Synchronization is efficient because ION Accelerator HA synchronizes modified blocks in linear order rather than in the order in which they were originally written. This has the following consequences:

  • Synchronization is fast, since blocks in which several successive write operations occurred are only synchronized once.
  • Synchronization is efficient; activity is logged and only hot extents need be synchronized.
  • During synchronization, the data set on the standby node is partly obsolete and partly already updated. This state of data is called inconsistent.
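The first point in the list above can be made concrete with a small Python sketch. It assumes, for illustration only, that modified blocks are tracked as a set of block addresses; the function name and the sample addresses are hypothetical.

```python
def blocks_to_sync(writes_while_disconnected):
    """Return each modified block address once, in ascending (linear) order."""
    dirty = set()
    for lba in writes_while_disconnected:
        dirty.add(lba)    # repeated writes to a block collapse to one entry
    return sorted(dirty)


# Six writes arrived while the peer was disconnected, touching three blocks.
writes = [7, 2, 7, 7, 5, 2]
print(blocks_to_sync(writes))   # prints [2, 5, 7]
```

Six writes result in only three block transfers, and the transfers run in address order regardless of the order in which the writes arrived.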

Initiator access to storage remains available on the active node while background synchronization is in progress.

Note: A node with inconsistent data cannot be put into operation, so it is desirable to keep the period during which a node is inconsistent as short as possible. ION Accelerator accomplishes this more effectively than typical replication approaches because it uses high-speed Fusion ioMemory as the storage media.

Variable-rate Synchronization

In variable-rate synchronization, ION Accelerator HA detects the available bandwidth on the synchronization links, compares it to incoming foreground application I/O, and selects an appropriate synchronization rate based on a fully automatic control loop. The administrator can, of course, schedule a time of day for synchronization when performing planned maintenance.
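One step of such a control loop might look like the sketch below. This is an assumption-laden illustration, not the actual controller: the function name, the MB/s units, and the min_rate and max_rate parameters are all hypothetical. It shows only the basic idea of giving synchronization the bandwidth that foreground I/O is not using.

```python
def next_sync_rate(link_bandwidth, foreground_io, min_rate=10, max_rate=None):
    """Give resync what the link has left after application I/O (MB/s)."""
    spare = link_bandwidth - foreground_io
    ceiling = link_bandwidth if max_rate is None else max_rate
    # Never drop below the floor, so synchronization always makes progress.
    return max(min_rate, min(spare, ceiling))


print(next_sync_rate(1000, 300))               # moderate application load
print(next_sync_rate(1000, 995))               # link nearly saturated
print(next_sync_rate(1000, 0, max_rate=500))   # idle link, capped rate
```

A real control loop would re-evaluate this continuously and smooth the result over time. The floor reflects the goal stated in the preceding section of keeping the inconsistent period short even under heavy application load.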


Split Brain Notification and Automatic Recovery

Split brain is a situation where, due to temporary failure of all network links between cluster nodes, or possibly due to human error, both nodes switch to the primary role while disconnected. This is a potentially harmful state, as it implies that modifications to the data might have been made on either node without having been replicated to the peer. Thus, it is possible in this situation that two diverging sets of data have been created, which cannot be trivially merged.

ION Accelerator HA resolves detected split brain situations in the following manner:

  • Graceful recovery if one node has had no intermediate changes. If only one node made modifications during split brain, DRBD simply recovers gracefully and declares the split brain resolved. Automatic split brain resolution is a fairly likely scenario because initiators have no immediate reason to fail over an I/O path; they remain connected to their ALUA-preferred nodes.
  • Disconnect. If a divergence was detected, ION Accelerator HA will protect data by refusing to reconnect the replica.
  • Maintenance Mode. ION Accelerator HA will automatically enter maintenance mode and refuse to accept initiator I/O if a split brain condition is detected during boot.
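The three outcomes listed above can be summarized as a small decision function. This is a sketch for illustration only: the function name, its parameters, and the returned labels are hypothetical, and the order in which the conditions are checked is an assumption, not documented behavior.

```python
def resolve_split_brain(changed_on_a, changed_on_b, detected_at_boot=False):
    """Map a detected split brain to one of the three outcomes."""
    if detected_at_boot:
        return "maintenance-mode"     # refuse initiator I/O
    if changed_on_a and changed_on_b:
        return "disconnect"           # data diverged; do not reconnect replica
    return "graceful-recovery"        # at most one node changed data


print(resolve_split_brain(changed_on_a=True, changed_on_b=False))
print(resolve_split_brain(changed_on_a=True, changed_on_b=True))
print(resolve_split_brain(True, False, detected_at_boot=True))
```

Only the case in which both nodes accepted writes is left unresolved, since that is the case in which the two data sets cannot be trivially merged.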

ioDrive Error Handling Strategies

Customers can create redundant (mirrored) or non-redundant storage pools, and in doing so make an explicit choice that trades capacity for availability. Because ION Accelerator replicates data between nodes, local redundancy is not strictly necessary to achieve high availability. If a FRU within a non-redundant storage pool goes offline, ION Accelerator HA fails over quickly, and the initiator’s multipath I/O fails over to the replicated storage on the other node.


Application Acceleration using ION Accelerator

The Fusion ION Accelerator appliance maximizes business-critical application performance for data-intensive workloads, including databases, virtualization, VDI, and Big Data solutions across Microsoft SQL Server, Oracle, MySQL, SAP HANA, VMware vSphere and View, and many others.

For More Information

www.fusionio.com | 801 424 5500 | facebook.com/fusionio | twitter.com/fusionio

International sales: www.fusionio.com/contact