Soon after the September 11 disaster, announcements put out by affected US
companies indicated that offline sites had been activated and business
operations were continuing with only a missed step here and there. Data recovery
systems at remote sites appeared to have functioned immaculately and the
reliability of IT infrastructure had been demonstrated in the aftermath of the
disaster.
However,
the real causes for concern appeared in less publicized reports days later.
These reports spoke of dedicated teams of company employees, groping through
clouds of dust with no power and only torches to retrieve documents, media and
other office paraphernalia from offices adjacent to the World Trade Center.
While recovery of data systems appeared to have been well-planned and tested,
the recovery of business operations was not.
In fact, the WTC tragedy has underlined the importance of what
is now called ‘business continuity planning’. This is the current evolution
of the relatively older concept of ‘disaster recovery planning’ and covers
processes put in place to maintain continuity in business operations as well as
continuity in IT services. The period immediately after a disaster is the most
traumatic and critical for employees, management and customers. A well-rehearsed,
documented, step-wise return to normalcy is perhaps the only effective way
to manage that debilitating period.
"Disaster recovery actually means recovering ourselves and regaining our
mental state of normalcy," is how Konkan Railway managing director B
Rajaram describes the feeling of helplessness that envelops employees, the
management and even customers. However, within the country, the perspective of
most businesses is still distanced from this holistic approach. "For most
companies, recovery of business is equated with recovery of computing
facilities," points out Sunil Chandiramani, director (information advisory
services), Ernst & Young.
With such a narrow approach, the CIO ends up drawing up the business continuity
plan alone, instead of it being a joint effort involving heads of business
units. In reality, a typical continuity plan covers a range of procedures, from
safeguarding storage media in fireproof vaults to identifying critical documents
and office files, and even ensuring the availability of mundane office
stationery items like paper clips.
Behind the scenes
So what’s the reason for this narrow perspective on continuity and recovery
planning? Chandiramani points out that one reason is that few businesses in the
country can really estimate the revenue they lose when their IT infrastructure
is unavailable. The prevalent mood is,
"If I am down, how does it matter?"
"Customers in this country are also mostly resigned to these
inefficiencies, tend not to demand better and don’t seek better
alternatives," he continues. Another obstacle limiting appreciation of the
importance of continuity and recovery stems from the fact that few businesses
sit down to thoroughly document their processes and workflow. "This mapping
activity helps a business understand areas of process redundancy and process
deficiency," points out RS Narayan, CFO at Star TV. "While recovery
has to do with technology, continuity has to do with operations and the two need
to be happily married for an effective plan to emerge," says Narayan. A
stereotypical mindset at the top management level also adds to the overall level
of indifference. "This can’t happen to us" and "Why invest so
much on an event which may never happen" are the usual arguments used
to shoot down disaster recovery or business continuity initiatives, says
Ajay Patil, manager (IT operations) at Birla Sun Life Insurance.
Conversations with organizations where continuity and recovery planning have
been successfully integrated with routine processes reveal some common
facilitators. The first is the degree to which the business depends on
automation. If business processes are likely to grind to a halt when the
technology infrastructure fails, then it is more than likely that
the organization will have implemented an effective continuity and recovery
plan. The second facilitator is the point in the life cycle of the business at
which the continuity initiative was started. If the plan was conceptualized and
implemented at an early stage of the business, then there is again a strong
possibility that integration has been successfully completed.
First time right
Take the case of the recently operational quasi-government organization, Konkan
Railway. Right from the beginning, it was envisaged that this division of
the Indian Railways would be heavily dependent on information systems, reducing
manpower overheads and creating a flat hierarchical employee structure.
"The requirement therefore, was to build an integrated information system
covering all business processes," explains Konkan Railway’s chief manager
(IT) Sharad Saxena.
Tata Infotech was commissioned to develop a suitable ERP application and to
handle the complete development process. The project took five years to complete
and cost Rs 10 crore. Once it was implemented, Konkan Railway employees had no
choice but to rapidly familiarize themselves with the ERP application, since the
entire layer of staff who managed the stacks of files and reams of paper had
been eliminated. "With no subordinates to service them, employees had to rely on
the home-grown ERP right from the beginning," recalls Saxena.
The OS platform chosen for the 17-module ERP application developed by Tata
Infotech was Compaq’s Tru64 Unix, using an Informix RDBMS. A distributed
computing architecture has been created using Konkan Railway’s fiber optic
backbone running along the length of the tracks from Nerul in Navi Mumbai to
Mangalore in the south. The main database server, located at the network centre
at Belapur, Navi Mumbai, is a 4-CPU Compaq Alpha 2100 backed by a Compaq Alpha
ES40 disaster recovery server. At the 53 railway stations along the track are
Compaq Alpha 200 servers, which can function in a standalone mode.
The home-grown ERP application covers all possible business processes in the
organization, including financial accounting, annual accounts, traffic accounts,
expenditure authorization, claims and compensation, commercial freight,
operations and train control, personnel management, electrical maintenance,
signal and telecommunication, track and structure maintenance, rolling stock,
health management, stores and inventory, security and administration.
In comparison with other divisions of the railways, Konkan Railway is unique.
As there are no human relays, there’s absolutely no possibility of a manual
recovery system in the event of an IT systems failure. Konkan Railway has,
therefore, integrated an elaborate continuity and recovery plan, envisaging many
possible disaster scenarios. In contrast, other railway divisions are
manpower-intensive and, with little automation in place, are hardly
exposed to IT failure. As Saxena says, "If you don’t have automation,
you don’t have to worry about anything."
One approach for building continuity and recovery in an organization is to
invest in redundant infrastructure. But the whole organization cannot be
replicated and redundancy is always a costly affair. The other approach is to
identify points of failure and build processes to ensure continuity in these
areas. And that’s the unanimous recommendation from both Konkan Railway’s
Saxena and Star TV’s Narayan. Comments Saxena, "We have tried to
institutionalize these things in a big way," implying that for most
contingencies, recovery steps have been documented. Continues Star TV’s
Narayan, "As the organization grows, processes also need to be constantly
reviewed and this makes the whole continuity initiative affordable."
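The second approach, identifying points of failure, can be illustrated with a minimal Python sketch. It assumes a business has already mapped each process to the components it depends on; every process name, component name and fallback listed here is hypothetical and is not drawn from Konkan Railway or Star TV.

```python
# Hypothetical process-to-dependency map; none of these names come from
# the organizations discussed in the article.
DEPENDENCIES = {
    "order_entry": ["database_server", "branch_link"],
    "billing": ["database_server"],
    "dispatch": ["branch_link", "label_printer"],
}

# Components assumed to already have a standby or a documented manual fallback.
HAS_FALLBACK = {"database_server"}


def single_points_of_failure(dependencies, has_fallback):
    """Map each unprotected component to the processes that would stop with it."""
    exposed = {}
    for process, components in dependencies.items():
        for component in components:
            if component not in has_fallback:
                exposed.setdefault(component, []).append(process)
    return exposed


if __name__ == "__main__":
    report = single_points_of_failure(DEPENDENCIES, HAS_FALLBACK)
    for component, processes in report.items():
        print(f"{component}: affects {', '.join(processes)}")
```

A listing of this kind only shows where documented recovery steps are still missing; deciding which gaps to close first remains a business judgment.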
However, documentation by itself is not the end of the road. Every 15 days, a
disaster situation is simulated at Konkan Railway for three hours, after which
normal operations are resumed. Ernst & Young’s Chandiramani adds that clients
Hindustan Lever and American Express also run ongoing drills to
maintain service levels under less than ideal conditions.
Out of site
While continuity planning deals with step-wise processes to maintain
operations across the complete end-to-end business, disaster recovery is equally
important and focuses on the recovery of IT operations. This usually involves
replicating mission-critical applications and select IT infrastructure at a
remote site. However, the first obstacle in this assessment, according to Ernst
& Young’s Chandiramani, is that business managers are unable to identify
mission-critical applications.
Mumbai-based Birla Sun Life Insurance and Shoppers’ Stop are two other
real-life examples where information systems have been built alongside business
processes, right from the time operations commenced. In both cases, the integration is
so well-knit that a prolonged downtime of the IT infrastructure will force
normal operations to halt. Building and stabilizing disaster recovery sites
have, therefore, been on both agendas right from the beginning. In the case of
Birla Sun Life Insurance, the joint venture started operations in March 2001 and
selected Ingenium from Canadian vendor Solcorp as its key business-critical
application.
The hardware and OS platform chosen was IBM’s 12-CPU RS6000-H80 server
running AIX Unix, with DB2 as the database platform. A
centralized, three-tier client-server architecture was created, with browser
based access for employees at the 30 branch offices. "The reason why the
business chose to build a centralized architecture was to ensure version control
at the client end," explains Patil.
Branch offices access the IBM RS6000 server using the Aditya Birla private
network. Connectivity to the private network is through 64-Kbps leased lines,
with ISDN dial-up as the back-up communication mode. The business also uses the
Sun accounting package, with an Oracle RDBMS deployed on Windows NT, and Lotus
Notes for e-mail and workflow applications.
The insurance company first created a back-up server within the same premises
using a 2-CPU, IBM RS6000-F80 server. After the database and application were
successfully replicated and stabilized, the backup server was shifted to a
remote site located at Chembur, on the outskirts of Mumbai. The Sun accounting
database and Lotus Notes database servers have also been replicated at the
disaster site.
The main site and the back-up site are connected through a 2-Mbps leased
line, which is also part of the Aditya Birla private network. But even with this
bandwidth, Patil admits the disaster site will not be able to function in real
time. The disaster recovery process is currently being tested under various
possible contingencies. These cover the non-availability of the primary server
and the Aditya Birla network in various on-off combinations. Arthur Andersen is
currently validating the implementation of the company’s business continuity
and recovery plans.
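The "on-off combinations" mentioned above can be listed mechanically. The following is a minimal Python sketch, not the company's actual test plan: it assumes only the two variables the article names, the primary server and the private network, and the identifiers are illustrative.

```python
from itertools import product

# Illustrative identifiers for the two variables named in the article.
COMPONENTS = ("primary_server", "private_network")


def failover_scenarios(components=COMPONENTS):
    """Return every on-off combination of the given components."""
    scenarios = []
    for states in product((True, False), repeat=len(components)):
        scenarios.append(dict(zip(components, states)))
    return scenarios


def describe(scenario):
    """Render one scenario as a readable checklist line."""
    parts = [f"{name}={'up' if up else 'down'}" for name, up in scenario.items()]
    return ", ".join(parts)


if __name__ == "__main__":
    for scenario in failover_scenarios():
        print(describe(scenario))
```

Two components give four scenarios, including the baseline where both are up. Each additional component doubles the count, which is one reason to restrict such drills to the infrastructure that is genuinely mission-critical.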
More to follow
"In the case of Shoppers’ Stop, which started operations in 1991, it
was evident that effective scaling of operations would be possible only after an
enterprise-wide warehouse and retail management application had been
implemented," explains Vikas Prabhu, senior manager (information systems).
From 1998 to 1999, the retail chain implemented JD Armstrong ERP on a
partly-centralized and partly-distributed two-tier client-server architecture.
The hardware and OS platform combination chosen was an IBM AS/400 server running
OS/400, located in Mumbai. At retail outlets, the client part of the ERP
application was installed on POS terminals managed by a local IBM Netfinity
server running Windows NT. This part of the ERP application functions in a
distributed mode.
With businesses in the country moving up the computing maturity ladder,
uptime of IT infrastructure is becoming more and more critical. Also, with
internal disturbances in the country, key service locations like VSNL, Indian
Railways, Reserve Bank of India, National Stock Exchange, Bombay Stock Exchange,
Air India and others are becoming increasingly exposed to the threat of damage.
In comparison with the past, these two drivers are expected to lead to an
increased investment in business continuity and disaster recovery procedures.
Nevertheless, for companies choosing to turn a blind eye, and there will
still be many of them, the cost of not investing in the future may soon catch
up with them.
Arun Shankar in Mumbai
The author has been executive editor of Dataquest. He continues to write on
business computing issues.