Indian Airlines: Working Toward Disaster Recovery



Airline disasters are frightening. Even natural causes like the infamous
‘Delhi fog’, or human acts like flash strikes announced by the ATC (Air Traffic
Control), can prove harrowing for passengers. One of the largest airline systems
in Asia, with a fleet of 62 aircraft, Indian Airlines (IA) is no exception to
these ‘disasters’.

IA was prudent enough to implement a disaster recovery programme as far back
as March 1998, and has managed it since. IA could perhaps be the first
commercial organization in India to have implemented disaster recovery on
mainframe systems for mission-critical applications. AK Rastogi, Director (IT),
Indian Airlines, found it important to set up a good secondary site to address
concerns that included isolation due to power, telecommunication cable or
hardware failures, loss of access to the premises, bombings and terrorist
attacks.

He says, “IA had decided a decade back to look for an operative plan to
make its data secure and make sure work goes on, no matter what the contingency.
A very rigorous planning led to the development of another backup site, which
was made scalable for IA’s future needs.”

The configuration of the systems was planned with IBM after a rigorous set
of investigations and walk-throughs. Since the DR site was already part of the
production setup, running non-critical applications, IA avoided many
complications that could have arisen from an outsourced implementation.

DR on board
At IA, there are two sites: the production site (Site A) at the main airport
building and a backup site (Site B), housed at the IT headquarters half a km
from the production site. There are 2,500 workstations connected to Site A,
along with others that connect virtually (bookings through the internet and
CRS).

Site A handles the Passenger Services applications critical to IA’s business
(the Passenger Services System: reservation, ticketing, departure control etc),
while the systems at Site B are used for various batch and online applications
(personnel information system, online aircraft spares information
system/inventory system, frequent flier system, management information system,
payroll etc) and, most importantly, as a reliable backup if production Site A
were to fail.

Site A uses an ALCS/TPFDF (Airline Control System/Transaction Processing
Facility Database Facility) environment, while Site B uses IBM CICS/DB2 (CICS,
the Customer Information Control System, is IBM’s transaction processing
interface).

Two IBM ES9000, 9672 R21, mainframe systems with dual CPUs are installed, one
at each location. Both locations are connected to all IA offices in India and
abroad through dedicated communication lines, and the two sites are
interconnected through high-speed ESCON (IBM Enterprise Systems Connection)
fiber optic channel-to-channel links.

Not a pie in the sky
The IA environment today is robust enough to handle a disaster by shifting
operations from one site to the other within 25 minutes, with alternate paths in
place in case of network failures. This ensures that passengers and the
18,500-strong IA staff do not suffer the consequences of a system failure.
Twice, in 1999 and 2003, weather-related disturbances caused operational
problems at the production site, and the production systems were promptly
shifted to the DR site with less than 30 minutes of switchover time and without
any loss of data.

IA’s DR center has been built to be safe from most calamities, leaving
earthquakes, unpredictable by nature, as the one major vulnerability. IA has its
reasons for the choice of site. A location in a non-seismic zone would
inevitably be a faraway backup site, depriving this active site of the technical
skills needed for development, maintenance and running of non-critical
applications, and thereby making such an implementation uneconomical for IA.

According to Rastogi, a DR site is necessary for business continuity and can
be managed at low cost, either in-house or by outsourcing, depending on
business requirements. Another benefit of an in-house DR site was that IA
avoided the various issues involved in outsourcing, including cultural issues.
The systems today are managed exclusively by in-house staff.

Jasmine Kaur in New Delhi

The Take Away




Key Issue: Indian Airlines needed uninterrupted access to online
transaction-oriented systems and, in the event of an unforeseen disaster, a
rapid switchover to an alternate site, followed by a switch back to the original
site after the disaster.

Solution: A dual-site implementation built solely on IBM’s proprietary
technology, with IBM mainframes at both sites. One site runs mission-critical
airline applications; the other runs batch and routine applications in
production mode. In the event of a disaster, the other site takes over the
running of the mission-critical applications, relegating its own production apps
to minimal mode. The two sites use different programming environments and have
different transaction and DB servers. The sites are interconnected by high-speed
optic fiber links.
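
The takeover rule described above can be illustrated with a small Python
sketch. It is not IA’s actual procedure: the App record, the capacity units and
the plan_takeover function are all hypothetical names chosen for illustration.
It assumes the surviving site first grants full capacity to the incoming
mission-critical applications, then guarantees each of its own applications a
minimal-mode floor, and finally hands out whatever capacity is left.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    load: int          # capacity units needed at full service (hypothetical unit)
    minimal_load: int  # capacity units needed in minimal mode (hypothetical unit)


def plan_takeover(capacity, incoming_critical, local_apps):
    """Return {app name: capacity granted} for the surviving site."""
    plan = {}
    remaining = capacity

    # Mission-critical apps from the failed site are served first, at full load.
    for app in incoming_critical:
        if app.load > remaining:
            raise RuntimeError(f"not enough capacity for {app.name}")
        plan[app.name] = app.load
        remaining -= app.load

    # Every local app is guaranteed its minimal-mode floor.
    floor = sum(app.minimal_load for app in local_apps)
    if floor > remaining:
        raise RuntimeError("not enough capacity for minimal mode")
    remaining -= floor

    # Leftover capacity is handed out in list order, up to each app's full load.
    for app in local_apps:
        extra = min(app.load - app.minimal_load, remaining)
        plan[app.name] = app.minimal_load + extra
        remaining -= extra
    return plan


# Illustrative numbers only; they do not come from the article.
critical = [App("reservations", 50, 50), App("departure_control", 20, 20)]
local = [App("payroll", 20, 5), App("mis", 30, 10)]
print(plan_takeover(100, critical, local))
```

The sketch raises an error when the critical load does not fit, which mirrors
the sizing question any dual-site design has to answer before a disaster rather
than during one.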

The DR Cheatsheet




• DRM is not required for all processes. An organization should decide which
applications need it on the basis of criticality.

• Idle disaster recovery resources should be used to process non-critical
business applications. This results in cost benefits and a healthy RoI. The
investment in IA’s DR site has been practically nil because other applications
are processed at the DR site.

• If the disaster site’s resources are to be used effectively, the
configuration of the disaster site has to be slightly higher than that of the
main site, so that it can run all the critical applications along with the
non-critical ones, through resource planning and/or in degraded mode.

• To optimize costs, take only a single license copy of the mission-critical
applications and associated system software, operating at only one of the sites
at any time.

• For non-critical applications, backups should be taken at regular intervals.

• The cost and effort of a disaster recovery design depend on the time to
recover and the point to recover to, which can vary from application to
application.

• Smart design of applications and database management tools is essential.
Plan for partial release of system resources even before a transaction is
complete.

• If system resource utilization at the main site consistently shows usage
beyond 65%, it is time to either add resources at both sites or switch over
sites.
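
The 65% rule in the last point lends itself to a simple check. The Python
sketch below is only one possible reading of "consistently": it assumes
utilization is sampled at regular intervals as a fraction between 0 and 1, and
flags the site only when every one of the most recent samples is above the
threshold. The function name, the window size and the sample values are
hypothetical.

```python
def needs_more_capacity(samples, threshold=0.65, window=6):
    """Return True if the last `window` utilization samples all exceed `threshold`."""
    if len(samples) < window:
        # Too little history to call the usage consistent.
        return False
    return all(u > threshold for u in samples[-window:])


# Illustrative samples only: a single spike does not trigger the rule,
# a sustained run above 65% does.
spike = [0.40, 0.45, 0.90, 0.50, 0.48, 0.52]
sustained = [0.60, 0.68, 0.70, 0.72, 0.69, 0.71, 0.74]
print(needs_more_capacity(spike), needs_more_capacity(sustained))
```

Requiring a full window of high readings avoids reacting to a brief peak, at
the cost of noticing a genuine trend a few samples later.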
