
Protecting Data In Clusters

DQI Bureau

With the increasing deployment of Microsoft Windows NT Server Enterprise Edition software, server clustering is becoming popular as a means to achieve high availability without the expense of deploying fault-tolerant redundant systems.

The clustered approach differs from redundancy in that, with clustering, secondary servers (known as "nodes" in a clustered environment) are in use performing their primary tasks, such as providing email or web access services, whereas redundant systems simply mirror the functions of the primary server and perform no additional tasks. Clusters are classified as "highly available", with an average uptime of about 99%. Redundant systems are classified as "fault tolerant", with 100% uptime under all but the most cataclysmic circumstances. With greater backup needs and shrinking backup windows, and given the high costs of deploying fully redundant systems, many network architects today are opting for Microsoft's clustering system and its promise of near-continuous availability.

Clustered servers share a common client group and common data storage media, including local disk drives on the servers and a shared disk subsystem, predominantly a RAID array for additional fault resilience. Clustering software gives two or more servers the capability to manage applications and files for the clustered group. Each node in the cluster hosts a particular set of applications or services and broadcasts its operational state by sending out packets, called "heartbeats", to all other nodes. If the other nodes in the cluster fail to detect the heartbeat for a specified interval, responsibility for running an application or operation passes to a second server, which also continues to run its own designated set of operations.
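As a rough illustration of that detection logic, here is a minimal Python sketch; the node names, interval, and threshold values are hypothetical, and real cluster internals are considerably more involved.

    import time

    HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeat packets (assumed)
    FAILURE_THRESHOLD = 3.0    # silence longer than this marks a node as down

    class HeartbeatMonitor:
        """Tracks the last heartbeat time observed from each peer node."""

        def __init__(self, peers):
            now = time.monotonic()
            self.last_seen = {peer: now for peer in peers}

        def record_heartbeat(self, peer):
            # Called whenever a heartbeat packet arrives from a peer.
            self.last_seen[peer] = time.monotonic()

        def failed_nodes(self):
            # Any peer silent beyond the threshold is presumed failed.
            now = time.monotonic()
            return [p for p, t in self.last_seen.items()
                    if now - t > FAILURE_THRESHOLD]

    monitor = HeartbeatMonitor(["server-a", "server-b"])
    monitor.record_heartbeat("server-b")
    print(monitor.failed_nodes())   # [] until a peer stays silent too long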






Microsoft's Clustering Service provides the framework for determining how the cluster will handle the failure of any of its components. For example, assume that server A is running file and print functions while server B is handling database services. If server A fails, Clustering Service directs server B to assume control over file and print functions for the cluster while maintaining its primary functions as a database server.
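At its core, that reassignment amounts to updating an ownership table; the short Python sketch below illustrates the idea with hypothetical group and server names.

    # Map each application group to the node currently hosting it.
    groups = {
        "file-and-print": "server-a",
        "database":       "server-b",
    }

    def fail_over(failed_node, surviving_node):
        # Reassign every group owned by the failed node to a survivor,
        # which keeps running its own groups as well.
        for group, owner in list(groups.items()):
            if owner == failed_node:
                groups[group] = surviving_node
                print(f"restarting {group} on {surviving_node}")

    fail_over("server-a", "server-b")
    print(groups)   # server-b now owns both groups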






To the client, the server cluster appears as a single group of available services. Users interact with the cluster as a unified entity. Therefore, if a clustered server fails, users simply experience a brief interruption in network response, usually 30 seconds or less, while another server in the cluster restarts the application. Clustering automatically maintains the IT infrastructure for 24x7 environments, restarting applications and services without human intervention.
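On the client side, that brief interruption can be absorbed by retrying until the service answers again; the Python sketch below assumes a hypothetical send_request function and uses the article's 30-second figure as a ceiling.

    import time

    def request_with_retry(send_request, timeout=30.0, delay=2.0):
        """Retry a request while another node restarts the service."""
        deadline = time.monotonic() + timeout
        while True:
            try:
                return send_request()
            except ConnectionError:
                if time.monotonic() >= deadline:
                    raise   # the failover took longer than expected
                time.sleep(delay)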






As Clustering Service technology matures, additional benefits accrue to network administrators. In addition to high availability at reduced cost, server clustering will promote scalability, enabling applications to grow beyond a single server. Because applications often comprise many processes that can be set up to execute on multiple servers, clustering will enable network designers to simply add a new server when an application outgrows its original host. Clustering Service will handle the process coordination.






Implications for data protection


While Clustering Service greatly reduces the cost of building highly available systems, clustering magnifies the complexity of backup and recovery. In a non-clustered environment, network architects can either deploy a dedicated tape backup system, such as an automated DLT system or library, for each server, or back up over the network to a backup server, depending on the backup window and the amount of data.






In a clustered environment, however, data protection issues become considerably more complicated. The two standard backup methods, backing up over the network or to a directly attached device, are either impractical or inadequate when safeguarding all information in a cluster. While backup of smaller amounts of data can be conducted over the network, standard networks are inadequate for larger backups. To achieve acceptable performance on large data sets, a network administrator needs to back up the cluster's physical nodes to a direct-attached tape system. But this approach presents its own set of problems.






Clustered applications are packaged as "groups", which include all the resources required to run the applications. Included in a group is the virtual server name for the application, which lets a given application run on different systems at any given moment. To get a complete system image containing all the information needed to rebuild the cluster, each physical node must be backed up, not just the virtual nodes that represent the application groups.
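The distinction can be sketched in a few lines of Python: clients see virtual servers, but a complete image must enumerate the physical nodes. Names here are hypothetical.

    # Each group carries a virtual server name, yet at any moment it is
    # hosted by exactly one physical node.
    hosting = {
        "node-1": ["sql-group"],
        "node-2": ["file-print-group"],
    }

    def backup_targets(hosting):
        # A complete, rebuildable image covers every physical node,
        # regardless of which virtual servers happen to live where.
        return sorted(hosting)

    print(backup_targets(hosting))   # ['node-1', 'node-2']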



Because clusters are inherently dynamic, responsibility for managing data and applications moves from server to server as needed to keep the system running. Such shifting of responsibility is called "failover", and failovers cause uncertainty in backup. A physical node's configuration can change depending on where the virtual node resides. Therefore, an application could execute on one system at the time of backup and on another at the time of restoration. And unless the backed-up image of the cluster is identical to the restored image, inconsistencies arise. In the event of a failover, there's no guarantee that all critical data is backed up and recoverable.
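One way to picture the problem is to compare where each group lived when the backup ran against where it lives at restore time; the Python sketch below uses hypothetical placements.

    def placement_changes(at_backup, at_restore):
        # Report groups whose owning node differs between the backup
        # image and the cluster as it stands at restore time.
        return {g: (at_backup[g], at_restore.get(g))
                for g in at_backup
                if at_backup[g] != at_restore.get(g)}

    at_backup  = {"sql-group": "node-1", "file-print-group": "node-2"}
    at_restore = {"sql-group": "node-2", "file-print-group": "node-2"}

    print(placement_changes(at_backup, at_restore))
    # {'sql-group': ('node-1', 'node-2')} -- a failover occurred in between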





THE PRESENT: Use what's out there now. For networks with smaller storage requirements, over-the-network backup is probably the best option. Enterprise environments with large storage requirements, however, should consider directly attached tape backup devices for each node, since over-the-network backup in these sites consumes an unacceptable percentage of bandwidth.






In this scenario, each server in the cluster has its own tape backup system. Administrators can manually back up each node in the cluster, but the backup applications don't know the node is part of a cluster. Directly attached tape backup devices are only capable of taking a snapshot of the cluster at the time of backup; they don't automatically adapt to changes in cluster configuration caused by events such as failovers.






Clustering Service employs a virtual volume, called the "Quorum Disk", residing on the shared storage device. This volume contains the mechanism servers use to communicate cluster information with each other, and it must be backed up along with the contents of server drives and shared storage devices. The Quorum Disk is actively owned by one of the clustered nodes. Backing up the physical nodes in the cluster doesn't necessarily guarantee that the Quorum Disk is also backed up.
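A backup plan therefore has to treat the Quorum Disk as an explicit item tied to its current owner. The sketch below, with hypothetical node names, makes that visible.

    def backup_plan(nodes, quorum_owner):
        # Per-node backup lists: every node contributes its local disks,
        # and the shared Quorum Disk rides with whichever node owns it now.
        plan = {node: ["local disks"] for node in nodes}
        plan[quorum_owner].append("quorum disk (shared volume)")
        return plan

    print(backup_plan(["node-1", "node-2"], quorum_owner="node-2"))
    # {'node-1': ['local disks'],
    #  'node-2': ['local disks', 'quorum disk (shared volume)']}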






THE IMMEDIATE FUTURE: Within the next six months, the software that runs tape backup systems will become "cluster-aware." This means that the backup software understands the dynamics of the clustered environment and makes API calls into the cluster to determine where data resides and how to back it up. Cluster-aware backup software understands the relationships of all components in the cluster and can automatically back up all those components correctly.
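In outline, a cluster-aware backup pass interrogates the cluster for the current group-to-node mapping and backs each group up from its present owner. The Python sketch below fakes the query with canned data; in a real product these would be API calls into the Cluster Service.

    def query_cluster():
        # Stand-in for API calls into the cluster; returns, for each
        # group, its current owner node and the data to be backed up.
        return {
            "sql-group":        {"owner": "node-1", "data": ["D:\\sql"]},
            "file-print-group": {"owner": "node-2", "data": ["E:\\shares"]},
        }

    def cluster_aware_backup(backup_fn):
        # Back up each group's data from the node that owns it right now.
        for group, info in query_cluster().items():
            for path in info["data"]:
                backup_fn(node=info["owner"], path=path, label=group)

    cluster_aware_backup(lambda **kw: print("backing up", kw))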



Inconsistent images of a cluster's Quorum Disk and physical nodes can lead to trouble at restoration time. If restoration is needed, cluster-aware backup applications understand how to interrogate the cluster to determine storage configuration. They also know the proper order in which to restore the various elements to maintain consistency. Cluster-aware utilities will be able to interrogate the cluster to get a consistently clean backup and use that information to properly restore the system with little or no manual intervention.






THE NEXT STEP: Intelligent use of direct-attached backup devices. Choosing among the pool of tape devices attached to the physical nodes in a cluster, the application specifies the proper tape device dynamically at the time of backup, and simply tells the device to back up designated files, directories or applications. The tape backup application, because it has kept track of cluster activity, knows exactly where the relevant data is located. This scenario provides automatically optimized backup, doesn't require backup over the network, and frees IS staff from manual intervention in cluster backup operations.
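Picking the device boils down to routing each group's backup to the drive attached to the node that currently hosts the data, so nothing crosses the production network. The names in this Python sketch are hypothetical.

    # Hypothetical inventory of direct-attached tape devices, keyed by node.
    tape_devices = {"node-1": "DLT-drive-1", "node-2": "DLT-drive-2"}

    def pick_device(group_owner):
        # Use the drive directly attached to the node hosting the data,
        # keeping backup traffic off the production network.
        return tape_devices[group_owner]

    print(pick_device("node-2"))   # DLT-drive-2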






WITHIN THE NEXT YEAR: The most cost-effective and efficient cluster backup solution will be achieved by building on prior developments, cluster-aware backup software and intelligent direct-attached backup devices, to allow the cluster to share a single backup device, such as a tape library or tape autoloader. And while such a connection could be achieved over an existing technology like SCSI, future clustered servers are likely to be attached to each other and to a shared storage area network, or SAN, using new Fibre Channel technology.
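Sharing one library among all nodes adds an arbitration requirement: only one node may write to the device at a time. The Python sketch below stands in for that arbitration with a simple lock; a real SAN-attached library would rely on the backup software's own reservation scheme.

    import threading

    library_lock = threading.Lock()   # stand-in for device arbitration

    def backup_node(node, write_fn):
        # Serialize each node's access to the single shared tape library.
        with library_lock:
            write_fn(node)

    for n in ["node-1", "node-2"]:
        backup_node(n, lambda node: print(node, "-> shared tape library"))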






Chief advantages of a SAN include providing shared access to storage; providing a dedicated high-bandwidth network separate from the production network; and providing the basis for building additional intelligence into storage.






With the advent of efficient, affordable SAN technologies and cluster-optimized backup control software, network planners will finally be able to take advantage of the best of both worlds: clustered servers backing up to a single, network-attached DLT or tape library system.
Courtesy: Hewlett-Packard
