Features

Bullet-proof your Business

DQI Bureau

03 Apr 2002 00:00 IST

Soon after the September 11 disaster, announcements put out by affected US

companies indicated that offline sites had been activated and business

operations were continuing with only a missed step here and there. Data recovery

systems at remote sites appeared to have functioned immaculately and the

reliability of IT infrastructure had been demonstrated in the aftermath of the

disaster.

However, the real causes for concern appeared in less publicized reports days later.

These reports spoke of dedicated teams of company employees, groping through

clouds of dust with no power and only torches to retrieve documents, media and

other office paraphernalia from offices adjacent to the World Trade Center.

While recovery of data systems appeared to have been well-planned and tested,

the recovery of business operations was not.

In fact, the WTC tragedy has helped underline the importance of what

is now called ‘business continuity planning’. This is the current evolution

from the relatively older concept of ‘disaster recovery planning’ and covers

processes put in place to maintain continuity in business operations as well as

continuity in IT services. The period immediately after a disaster is the most

traumatic and critical for employees, management and customers. A well rehearsed

and documented, step-wise return to normalcy is perhaps the only effective way

to manage that debilitating period.

“The 17-module, home-grown ERP has eliminated the mass of human relays at Konkan Railways. Human relays, however, continue to exist in the rest of Indian Railways”

Sharad Saxena, chief manager (IT), Konkan Railways

"Disaster recovery actually means recovering ourselves and regaining our

mental state of normalcy," is how Konkan Railway managing director B

Rajaram describes the feeling of helplessness that envelops employees, the

management and even customers. However, within the country, the perspective of

most businesses is still distanced from this holistic approach. "For most

companies, recovery of business is equated with recovery of computing

facilities," points out Sunil Chandiramani, director (information advisory

services), Ernst & Young.

With such a narrow approach, the CIO ends up making the business continuity

plan instead of a joint effort involving heads of business units. In reality, a

typical continuity plan covers a range of procedures from safeguarding storage

media in fireproof vaults to identification of critical documents and office

files and even ensuring availability of mundane office stationery items like

paper clips.

Behind the scenes

So what’s the reason for the narrow perspective on planning continuity

coupled with recovery? Chandiramani points out that one of the reasons is that

few businesses in the country can really estimate their revenue loss from the

non-availability of their IT infrastructure with the prevalent mood being

"If I am down, how does it matter?"

“In this country, business continuity is usually equated with disaster recovery, with cost always being the inhibiting factor”

Sunil Chandiramani director (information advisory services),

Ernst & Young

"Customers in this country are also mostly resigned to these

inefficiencies, tend not to demand better and don’t seek better

alternatives," he continues. Another obstacle limiting appreciation of the

importance of continuity and recovery stems from the fact that few businesses

sit down to thoroughly document their processes and workflow. "This mapping

activity helps a business understand areas of process redundancy and process

deficiency," points out RS Narayan, CFO at Star TV. "While recovery

has to do with technology, continuity has to do with operations and the two need

to be happily married for an effective plan to emerge," says Narayan. A

stereotypical mindset at the top management level also adds to the overall level

of indifference. "This can’t happen to us", or "Why invest so

much on an event which may never happen" are the usual counter-arguments

for shooting down disaster recovery or business continuity initiatives, says

Ajay Patil, manager (IT operations) at Birla Sun Life Insurance.

Speaking to organizations where continuity and recovery planning have been

successfully integrated with routine processes reveals some common

facilitators. The first is the degree to which the business is dependent on

automation. If business processes are likely to grind to a halt in the possible

event of failure in technology infrastructure, then it is more than likely that

the organization will have implemented an effective continuity and recovery

plan. The second facilitator is linked to the point in the life-cycle of the

business, where an initiative to build a continuity plan has been started. If it

has been conceptualized and implemented starting from an early stage of the

business, then there is again a strong possibility that integration has been

successfully completed.

First time right

Take the case of the recently operational, quasi-government organization–Konkan

Railway. Right from the beginning, it was conceptualized that this division of

the Indian Railways would be heavily dependent on information systems, reducing

manpower overheads and creating a flat hierarchical employee structure.

"The requirement therefore, was to build an integrated information system

covering all business processes," explains Konkan Railway’s chief manager

(IT) Sharad Saxena.

“Right from day one of our operations, a disaster recovery site was planned.

The site is now operational at Chembur, on the outskirts of Mumbai”

Ajay Patil, manager (IT operations)

Birla Sun Life Insurance

Tata Infotech was commissioned to develop a suitable ERP application with the

complete development process. The project took five years to complete and cost

Rs 10 crore. Once implemented, Konkan Railway employees had no choice but to

rapidly familiarize themselves with the ERP application, since a complete layer

of staff for managing the stacks of files and reams of paper had been

eliminated. "With no subordinates to service them, employees had to rely on

the home-grown ERP right from the beginning," recalls Saxena.

The OS platform chosen for the 17-module ERP application developed by Tata

Infotech was Compaq’s Tru64 Unix, using an Informix RDBMS. A distributed

computing architecture has been created using Konkan Railway’s fiber optic

backbone running along the length of the tracks from Nerul in Navi Mumbai to

Mangalore in the south. The main database server located at the network centre

at Belapur, Navi Mumbai is a 4-CPU Compaq Alpha 2100 and has a disaster recovery

Compaq Alpha ES40 backup server. At the 53 railway stations along the track are

Compaq Alpha 200 servers, which can function in a standalone mode.

The home-grown ERP application covers all possible business processes in the

organization, including financial accounting, annual accounts, traffic accounts,

expenditure authorization, claims and compensation, commercial freight,

operations and train control, personnel management, electrical maintenance,

signal and telecommunication, track and structure maintenance, rolling stock,

health management, stores and inventory, security and administration.

In comparison with other divisions of the railways, Konkan Railway is unique.

As there are no human relays, there’s absolutely no possibility of a manual

recovery system in the event of an IT systems failure. Konkan Railway has,

therefore, integrated an elaborate continuity and recovery plan, envisaging many

possible disaster scenarios. In contrast, other railway divisions are

manpower-intensive and, given the absence of significant automation, are

hardly exposed to IT systems failure. As Saxena says, "If you don’t have automation,

you don’t have to worry about anything."

Pointers on Continuity Planning

Business continuity is proactive and meant to keep the business running. Disaster recovery is reactive and meant to ensure recovery from damage to the IT infrastructure
Business units need to prioritize critical functions. Limited resources will be used to recover critical operations on a priority basis
Costs of continuity processes soar as the window of recovery time is shortened. Businesses, therefore, need to reach a trade-off between cost of continuity processes and loss suffered due to prolonged suspension of operations
Since computers or web sites may not be accessible, hard copies of the documentation need to be available
It is easier to assign tasks during development of a continuity plan rather than at the time of a full-blown event
Tasks should be assigned by team or title, not to individuals
Plans need to be exercised, not tested. Exercises lead to improvement through practice. Tests only create scorecards
Keep simulating shut-downs till you are satisfied with the time and accuracy of execution
Link changes in business processes to changes in continuity planning
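
The cost trade-off described in the pointers above can be sketched in a few lines of Python. This is only an illustration: the hourly loss, the outage frequency and the cost attached to each recovery window are hypothetical figures chosen for the example and do not come from any of the companies mentioned here.

```python
# Hypothetical figures only: none of these numbers come from the article.
HOURLY_LOSS = 2.0        # assumed loss per hour of suspended operations (Rs lakh)
OUTAGES_PER_YEAR = 0.5   # assumed number of major outages expected per year

# Assumed annual cost (Rs lakh) of a continuity setup, keyed by the recovery
# window in hours. Shorter windows cost more.
ANNUAL_COST_BY_WINDOW = {1: 120.0, 4: 60.0, 24: 25.0, 72: 10.0}


def expected_annual_total(window_hours: int) -> float:
    """Continuity spend plus expected loss while operations stay suspended."""
    expected_loss = OUTAGES_PER_YEAR * HOURLY_LOSS * window_hours
    return ANNUAL_COST_BY_WINDOW[window_hours] + expected_loss


def best_window() -> int:
    """Recovery window with the lowest expected total annual cost."""
    return min(ANNUAL_COST_BY_WINDOW, key=expected_annual_total)


if __name__ == "__main__":
    for window in sorted(ANNUAL_COST_BY_WINDOW):
        total = expected_annual_total(window)
        print(f"{window:>3} h window: expected total {total:.1f}")
    print("Lowest expected total at", best_window(), "hours")
```

With these assumed inputs the shortest window is not the cheapest overall. A business with a higher hourly loss would see the balance tip towards faster recovery.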

One approach for building continuity and recovery in an organization is to

invest in redundant infrastructure. But the whole organization cannot be

replicated and redundancy is always a costly affair. The other approach is to

identify points of failure and build processes to ensure continuity in these

areas. And that’s the unanimous recommendation from both Konkan Railway’s

Saxena and Star TV’s Narayan. Comments Saxena, "We have tried to

institutionalize these things in a big way," implying that for most

contingencies, recovery steps have been documented. Continues Star TV’s

Narayan, "As the organization grows, processes also need to be constantly

reviewed and this makes the whole continuity initiative affordable."

“Reviewing business processes periodically is an effective way to identify points of failure within any given organization”

RS Narayan

Star TV CFO

However, documentation by itself is not the end of the road. Every 15 days, a

disaster situation is simulated at Konkan Railway for three hours, after which

normal operations are resumed. Ernst & Young’s Chandiramani also notes

that there are ongoing drills at clients Hindustan Lever and American Express to

maintain service levels under less than ideal conditions.

Out of site

While continuity planning deals with step-wise processes to maintain

operations across the complete end-to-end business, disaster recovery is equally

important and focuses on the recovery of IT operations. This usually involves

replicating mission-critical applications and select IT infrastructure at a

remote site. However, the first obstacle in this assessment, according to Ernst

& Young’s Chandiramani, is that business managers are unable to identify

mission-critical applications.

Mumbai-based Birla Sun Life Insurance and Shoppers’ Stop are two other

real-life examples where information systems have been built alongside business

processes, ever since operations commenced. In both cases, the integration is

so well-knit that a prolonged downtime of the IT infrastructure will force

normal operations to halt. Building and stabilizing disaster recovery sites

have, therefore, been on both agendas right from the beginning. In the case of

Birla Sun Life Insurance, the joint venture started operations in March 2001 and

selected Ingenium from Canadian vendor Solcorp as its key business-critical

application.

“Operations can continue for 24 hours after non-availability of the server. A disaster recovery site is, therefore, non-negotiable”

Vikas Prabhu

senior manager (IS), Shoppers’ Stop

The hardware and OS platform chosen was IBM’s 12-CPU

RS6000-H80 server running AIX Unix, with DB2 as the database platform. A

centralized, three-tier client-server architecture was created, with browser

based access for employees at the 30 branch offices. "The reason why the

business chose to build a centralized architecture was to ensure version control

at the client end," explains Patil.

Branch offices access the IBM RS6000 server using the Aditya Birla private

network. Connectivity to the private network is through 64-Kbps leased lines and

ISDN dialup as a back-up communication mode. The business also uses the Sun

accounting package, with an Oracle RDBMS deployed on Windows NT and Lotus Notes

for e-mail and workflow application.

The insurance company first created a back-up server within the same premises

using a 2-CPU, IBM RS6000-F80 server. After the database and application were

successfully replicated and stabilized, the backup server was shifted to a

remote site located at Chembur, on the outskirts of Mumbai. The Sun accounting

database and Lotus Notes database servers have also been replicated at the

disaster site.

The main site and the back-up site are connected through a 2-Mbps leased

line, which is also part of the Aditya Birla private network. But even with this

bandwidth, Patil admits the disaster site will not be able to function in real

time. The disaster recovery process is currently being tested under various

possible contingencies. These cover the non-availability of the primary server

and the Aditya Birla network in various on-off combinations. Arthur Andersen is

currently validating the implementation of the company’s business continuity

and recovery plans.
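
Testing "in various on-off combinations" amounts to walking through every availability state of the components involved. The Python sketch below enumerates those states for the two components named above. The component labels and the expected response for each state are assumptions made for illustration, not the insurer's documented procedure.

```python
from itertools import product

# Hypothetical labels for the two components named in the text.
COMPONENTS = ("primary_server", "private_network")


def expected_mode(state: dict) -> str:
    """Assumed response for a given availability state (illustrative only)."""
    if state["primary_server"] and state["private_network"]:
        return "normal operations from main site"
    if state["primary_server"]:
        return "main site reached over back-up dial-up links"
    if state["private_network"]:
        return "fail over to remote back-up site"
    return "remote back-up site reached over back-up dial-up links"


def scenarios() -> list:
    """Every on/off combination of the listed components."""
    return [dict(zip(COMPONENTS, flags))
            for flags in product((True, False), repeat=len(COMPONENTS))]


if __name__ == "__main__":
    for state in scenarios():
        print(state, "->", expected_mode(state))
```

Adding a third component to COMPONENTS doubles the number of scenarios, which is one reason such test plans are usually limited to the components that matter most.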

Points to Ponder...

The period immediately after a disaster is most traumatic and critical for employees, the management and customers. A well rehearsed and documented, step-wise return to normalcy is perhaps the only effective way to manage that debilitating period
One approach towards building continuity in an organization is to invest in redundant infrastructure. But the whole organization cannot be replicated and redundancy is always a costly affair
Documentation by itself is not the end of the road. Every 15 days, a disaster situation is simulated at Konkan Railway for three hours after which normal operations are resumed

More to follow

"In the case of Shoppers’ Stop, which started operations in 1991, it

was evident that effective scaling of operations would be possible only after an

enterprise-wide warehouse and retail management application had been

implemented," explains Vikas Prabhu, senior manager (information systems).

From 1998 to 1999, the retail chain implemented JD Armstrong ERP on a

partly-centralized and partly-distributed two-tier client-server architecture.

The hardware and OS platform combination chosen was an IBM AS400 server, running

OS400, located at Mumbai. At retail outlets, the client part of the ERP

application was installed on POS terminals managed by a local IBM Netfinity

server, running Windows NT. This part of the ERP application is functioning in a

distributed mode.

With businesses in the country moving up the computing maturity ladder,

uptime of IT infrastructure is becoming more and more critical. Also, with

internal disturbances in the country, key service locations like VSNL, Indian

Railways, Reserve Bank of India, National Stock Exchange, Bombay Stock Exchange,

Air India and others are becoming increasingly exposed to the threat of damage.

In comparison with the past, these two drivers are expected to lead to an

increased investment in business continuity and disaster recovery procedures.

Nevertheless, for companies choosing to turn a blind eye, and there would

still be many of them, the cost of not investing in the future may soon catch

up.

Arun Shankar in Mumbai

The author has been executive editor of Dataquest. He continues to write on
business computing issues.
