Online Analytical Processing (OLAP) is a
technology that is ideally linked to datawarehousing, a pro-cess where enterprise data,
from operational systems is integrated into a single database for the purpose of analysis.
However, OLAP tools are often used independently of a datawarehouse, though it is
inconceivable to have a warehouse without a set of OLAP tools to glean business
intelligence from it. I will first briefly explain what OLAP and datawarehousing are and
then go on to the reasons why you should take a close look at one or both of the two in
the year ahead.
OLAP
The term OLAP was first coined by none other than Dr EF Codd, who pioneered Relational
Database Management Systems (RDBMS). In a white paper funded by Arbor Software, a leading
player in the OLAP market today and a pioneer of the technology, Dr Codd defined 12 rules
or features (later expanded to 18) that an OLAP product must support. These rules were not
unlike the rules he had earlier defined for RDBMSs.
At the heart of Dr Codd's white paper was
an acknowledgement that the best way to store data in an operational transaction
processing system was not the best way to use it as information. The stress while storing
data in a relational database is on storage efficiency, data integrity, and easy update.
On the other hand, for using data as information the stress necessarily has to be on quick
summarization along dimensions of interest to the business. A data storage schema designed
for transaction processing is simply not good enough for analytical processing! Take the
example of a sales manager who needs to understand the monthly pattern of sales, in a
particular territory, for a particular class of product, over the past five years. A
summarization and cross tabulation of such information from a relational schema would
simply take too long. An OLAP system, on the other hand, is designed to deliver answers to
most such queries in less than five seconds!
An important feature of OLAP tools is that
they allow non-technical users to perform slice and dice analysis of data. A user can
summarize data along two or more dimensions of interest to his/her business without using
any arcane computerese (like SQL commands). He can then turn over the 'cube' in any manner
that he pleases. For example, a user may first like to view a cross tabulation of sales
for a territory with product class on the X-axis and the months along the Y-axis. Next
instant he may switch to a cross tabulation of sales for a single product class with the
territories placed on the X-axis and the months places monthly sales along the Y-axis.
In the absence of OLAP tools such cross
tabulations have traditionally been performed with spreadsheets using summary data
extracted from operational databases using SQL. This approach has severe limitations.
Updating the summary data on a spreadsheet is a laborious and time-consuming process.
Summarized data on spreadsheets is very difficult to keep secure. It is difficult to
provide simultaneous access for many users to the same summarized data.
OLAP products in the market vary in their
approach, architecture, data handling capability, and price. At the lower end, you have
the spreadsheet pivot tables and ActiveX components such as Decision Cube and DynamiCube.
The low-end products operate without a database server and rely instead on small fact
tables extracted from SQL databases or spreadsheets. However, in terms of providing
analytical capabilities these products are surprisingly capable. If you haven't yet seen
how you can slice and dice data using a spreadsheet pivot table, it is time you had a
look.
Most high-end OLAP tools in the market
adopt a client server architecture. A database server stores data using a specialized
schema that makes summarization easy and quick. The OLAP front-end retrieves data from the
server using the server API. Arbor Software Essbase and Oracle Express are examples of
such high-end server products.
Hitherto, OLAP front-end products have
accessed the OLAP servers using proprietary APIs. Now a few OLAP vendors have joined
together and set up the OLAP Council in an attempt to standardize on a layer of
abstraction that will provide a uniform API for front-end OLAP tools to connect to OLAP
servers. The OLAP council released the first version of its API in late 1996 and the
second version in January 1998. Unfortunately, the OLAP council does not include in its
fold all the OLAP players, worst still its functioning does not appear to be smooth. Arbor
Software, a council member and leading OLAP player, failed to endorse the API.
With the OLAP market growing at 40 percent
per year it was just a matter of time before Microsoft would step into it. It made its
first move in 1996 with the acquisition of an OLAP server developed by Panorama Software
Systems Ltd, an Israeli company. The server product has been re-worked and is currently
being tested under the code name of Plato and is slated for release by the end of 1998.
Plato could well become a part of the BackOffice suite of products and with its release we
could see the prices of OLAP products plummet.
More significant than its entry into the
market with an OLAP server product is Microsoft's initiative for a standardized OLAP API.
An OLAP API has now been included in Microsoft's OLE DB initiative, aimed at providing a
COM component-based standard for accessing disparate data-sources such as file server
databases, SQL databases, spreadsheets, project management planners, email, and now
multidimensional databases.
Datawarehousing, as stated earlier, is the
process of integrating enterprise data into a single database. Most organizations with
heavy investments in IT have built their operational systems over years, even decades.
After all, building operational systems and then honing them to exactly fit your
requirement takes time. By the time you finish building one operational system and start
on the other it is usual for the technology to undergo a change. Sticking to the earlier
technology for the new system then makes little sense. As a result, it is not unusual to
come across an organization that has an old billing application on a legacy platform, a
later marketing application on an Ingres/Unix platform, and a more-recent customer care
application an Oracle/NT platform.
In between developing these high-data
volume applications the organization could well have invested in a payroll and personnel
system on a dBase/DOS platform, a Tender and Contract application on Access/Windows, and
several other departmental applications similarly built using file server database systems
and even spreadsheets.
As a result of the evolutionary growth in
IT technology and the longtime required to develop and perfect operational systems,
organizational data tends to get locked up within disparate systems. I have come across a
telecom company that mails marketing material to subscribers who are habitual defaulters
when it comes to paying their bills. This happens because the marketing application works
completely independent of the billing application! The two systems, do not 'talk' to each
other. What most organizations tend to end up with are islands of information.
There is obviously a need to integrate
enterprise data into a single database system in order to facilitate an overall view of
the data. Obviously, rebuilding existing systems from scratch is not a viable alternative.
Not only would it go against the earthy wisdom of not fixing something that is not broke,
but it simply would not work. The logical solution is to leave the working operational
systems as they are, but periodically move their data into a central datawarehouse. This
process would have an additional spin off. Earlier, in our discussion on OLAP, we saw that
OLAP tools require a specialized data storage schema. The warehouse can be built using
such a schema. As a result, with a warehouse you not only achieve integration of data, you
also get to reformat it so that you can analyze and 'mine' it for information and
patterns.
As the Indian economy opens up competition
is going to hot up. Only the 'fittest' amongst the organizations are going to survive.
Unless organizations can unlock the information and intelligence in the data that they
have amassed, they are going to be hard put in facing up to the competition. It is in this
context that corporate IT needs to look at warehousing and OLAP during the year to come.