
Indian Airlines: Working Toward Disaster Recovery

DQI Bureau

Airline disasters are always frightening. Even natural causes like the infamous 'Delhi fog', or human acts like flash strikes called by the ATC (Air Traffic Control), can prove harrowing for passengers. One of the largest airline systems in Asia, with a fleet of 62 aircraft, Indian Airlines (IA) is no exception to these 'disasters'.


IA was prudent enough to implement a disaster recovery (DR) programme as far back as March 1998, and has managed it ever since. IA is perhaps the first commercial organization in India to have implemented disaster recovery on mainframe systems for mission-critical applications. AK Rastogi, Director (IT), Indian Airlines, found it important to set up a good secondary site to address concerns that included isolation caused by power, telecommunication cable or hardware failures, loss of access to the premises, bombings and terrorist attacks.


He says, "IA had decided a decade back to look for an operative plan to make its data secure and make sure work goes on, no matter what the contingency. Very rigorous planning led to the development of another backup site, which was made scalable for IA's future needs."


The configuration of the systems was planned with IBM after a rigorous set of investigations and walk-throughs. Since the DR site also served as the production site for non-critical applications, IA avoided many of the complications that could have arisen from an outsourced implementation.

DR on board



At IA, there are two sites: the production site (Site A) at the main airport building and a backup site (Site B), housed at the IT headquarters half a km from the production site. There are 2,500 workstations connected to Site A, along with others that are connected virtually (booking through the net and CRS). Site A handles the passenger services applications critical to IA's business (Passenger Services System: reservation, ticketing, departure control etc), while the systems at Site B are used for various batch and online applications (personnel information system, online aircraft spares information/inventory system, frequent flier system, management information system, payroll etc) and, most importantly, as a reliable backup if production Site A were to fail.


Site A uses an ALCS/TPFDF (Airline Control System/Transaction Processing Facility Database Facility) environment, while Site B uses IBM CICS/DB2 (CICS, the Customer Information Control System, is IBM's transaction server, and DB2 is its relational database).

There are two IBM ES9000, 9672 R21, mainframe systems with dual CPUs, located at the two sites. Both sites are connected to all IA offices in India and abroad through dedicated communication lines, and the two sites are interconnected through high-speed ESCON (Enterprise Systems Connection, IBM's fiber-optic channel architecture) links.

Not a pie in the sky



The IA environment today is robust enough to handle a disaster by shifting operations from one site to the other within 25 minutes, with alternate paths in place in case of network failures. This ensures that passengers and the 18,500-strong IA staff do not suffer the setback of a system failure. Twice, in 1999 and 2003, weather conditions caused operational problems at the production site, and the production systems were promptly shifted to the DR site with less than 30 minutes of switchover time and without any loss of data.


IA's DR center has been built to be safe from most calamities, leaving only the unpredictable nature of earthquakes as a significant vulnerability. IA has its reasons for its choice of site. A location in a non-seismic zone would invariably be a faraway backup site, depriving this dynamic site of the technical skills needed for the development, maintenance and running of non-critical applications, and thereby making such an implementation uneconomical for IA.

According to Rastogi, a DR site is necessary for business continuity and can be managed at low cost, either in-house or by outsourcing, depending on business requirements. Another benefit of going in for an in-house DR site was that IA avoided the various issues involved in outsourcing, including cultural issues. The systems today are managed exclusively by in-house staff.

Jasmine Kaur in New Delhi


The Take Away

Key Issue: Indian Airlines needed uninterrupted access to online transaction-oriented systems and, in the event of an unforeseen disaster, a rapid switchover to an alternate site, followed by a switch back to the original site after the disaster.


Solution: Implemented solely using IBM's proprietary technology, as a dual-site implementation with IBM mainframes at both sites. One site runs mission-critical airline applications; the other runs batch and routine applications in production mode. In the event of a disaster, the other site takes over the running of the mission-critical applications, relegating its own production applications to minimal mode. The two sites use different programming environments and have different transaction and database servers. The sites are interconnected by high-speed optical fiber links.
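The takeover described here can be pictured with a small sketch. This is only an illustration and not IA's actual switchover procedure: the `Site` class, the `fail_over` and `fail_back` functions and the application names are all hypothetical, chosen to mirror the two-site arrangement in the article.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one data center site (not IA's real tooling)."""
    name: str
    critical: list = field(default_factory=list)      # mission-critical apps running here
    non_critical: list = field(default_factory=list)  # batch and routine apps running here
    degraded: bool = False  # True while routine apps are held to minimal mode
    up: bool = True


def fail_over(failed: Site, backup: Site) -> None:
    """Move mission-critical apps to the backup site and degrade its routine apps."""
    failed.up = False
    backup.critical.extend(failed.critical)
    failed.critical.clear()
    backup.degraded = True


def fail_back(original: Site, backup: Site) -> None:
    """Return mission-critical apps to the original site once it has recovered."""
    original.up = True
    original.critical.extend(backup.critical)
    backup.critical.clear()
    backup.degraded = False


# Illustrative application names only.
site_a = Site("A", critical=["reservation", "ticketing", "departure control"])
site_b = Site("B", non_critical=["payroll", "frequent flier", "MIS"])

fail_over(site_a, site_b)
print(site_b.critical, site_b.degraded)

fail_back(site_a, site_b)
print(site_a.critical, site_b.degraded)
```

The sketch keeps the backup site's routine applications in place and only flags them as degraded, which matches the idea of relegating them to minimal mode instead of shutting them down.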

The DR Cheatsheet


- DRM is not required for all processes. In an organization, the applications that need it are decided on the basis of criticality.
- Idle disaster recovery resources should be used to process non-critical business applications. This results in cost benefits and a healthy RoI. The investment in IA's DR site has been practically nil because other applications are processed at the DR site.
- If the disaster site's resources are to be used effectively, its configuration has to be slightly higher than that of the main site, so that it can run all the critical applications along with the non-critical ones, with resource planning and/or in degraded mode.
- To optimize costs, take only a single license copy of the mission-critical applications and associated system software, with operation at only one of the sites at any time.
- For non-critical applications, backups should be taken at regular intervals.
- The cost and effort required for disaster recovery design depend on the time to recover and the point to recover, which can vary from application to application.
- Smart design of applications and database management tools is essential. Plan for the partial release of system resources even before a transaction is complete.
- If resource utilization at the main site consistently exceeds 65%, it is time to add resources, either at both sites or by switching over sites.
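Two of the points above, the 65% utilization rule and the sizing of the disaster site, lend themselves to a quick calculation. The following is a minimal sketch under stated assumptions: the function names, the reading of "constantly" as "every sample in the window", the capacity units and the `degraded_fraction` parameter are all hypothetical and are not taken from IA's practice.

```python
def needs_more_capacity(samples, threshold=0.65):
    """True if every utilization sample (0.0 to 1.0) is above the threshold.

    Assumption: 'constantly beyond 65%' means all samples in the window exceed it.
    """
    return len(samples) > 0 and all(u > threshold for u in samples)


def dr_site_can_host(dr_capacity, critical_load, non_critical_load,
                     degraded_fraction=0.5):
    """True if the DR site fits all critical work plus a reduced share of its own.

    Assumption: degraded mode keeps `degraded_fraction` of the non-critical load.
    All loads and capacities use the same arbitrary unit.
    """
    return dr_capacity >= critical_load + degraded_fraction * non_critical_load


print(needs_more_capacity([0.70, 0.68, 0.72]))   # True: every sample is above 65%
print(needs_more_capacity([0.70, 0.50, 0.72]))   # False: one sample dips below 65%
print(dr_site_can_host(100, 60, 50))              # True: 60 + 25 = 85 fits in 100
print(dr_site_can_host(100, 60, 50, 1.0))         # False: 60 + 50 = 110 exceeds 100
```

The second check shows why the disaster site needs a slightly higher configuration than the main site: it has to absorb the whole critical load while still keeping some of its own work running.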
