Advertisment

OLAP And Datawarehousing

author-image
DQI Bureau
New Update

Online Analytical Processing (OLAP) is a

technology that is ideally linked to datawarehousing, a pro-cess where enterprise data,

from operational systems is integrated into a single database for the purpose of analysis.

However, OLAP tools are often used independently of a datawarehouse, though it is

inconceivable to have a warehouse without a set of OLAP tools to glean business

intelligence from it. I will first briefly explain what OLAP and datawarehousing are and

then go on to the reasons why you should take a close look at one or both of the two in

the year ahead.

Advertisment

OLAP



The term OLAP was first coined by none other than Dr EF Codd, who pioneered Relational
Database Management Systems (RDBMS). In a white paper funded by Arbor Software, a leading

player in the OLAP market today and a pioneer of the technology, Dr Codd defined 12 rules

or features (later expanded to 18) that an OLAP product must support. These rules were not

unlike the rules he had earlier defined for RDBMSs.

At the heart of Dr Codd's white paper was

an acknowledgement that the best way to store data in an operational transaction

processing system was not the best way to use it as information. The stress while storing

data in a relational database is on storage efficiency, data integrity, and easy update.

On the other hand, for using data as information the stress necessarily has to be on quick

summarization along dimensions of interest to the business. A data storage schema designed

for transaction processing is simply not good enough for analytical processing! Take the

example of a sales manager who needs to understand the monthly pattern of sales, in a

particular territory, for a particular class of product, over the past five years. A

summarization and cross tabulation of such information from a relational schema would

simply take too long. An OLAP system, on the other hand, is designed to deliver answers to

most such queries in less than five seconds!

An important feature of OLAP tools is that

they allow non-technical users to perform slice and dice analysis of data. A user can

summarize data along two or more dimensions of interest to his/her business without using

any arcane computerese (like SQL commands). He can then turn over the 'cube' in any manner

that he pleases. For example, a user may first like to view a cross tabulation of sales

for a territory with product class on the X-axis and the months along the Y-axis. Next

instant he may switch to a cross tabulation of sales for a single product class with the

territories placed on the X-axis and the months places monthly sales along the Y-axis.

Advertisment

In the absence of OLAP tools such cross

tabulations have traditionally been performed with spreadsheets using summary data

extracted from operational databases using SQL. This approach has severe limitations.

Updating the summary data on a spreadsheet is a laborious and time-consuming process.

Summarized data on spreadsheets is very difficult to keep secure. It is difficult to

provide simultaneous access for many users to the same summarized data.

OLAP products in the market vary in their

approach, architecture, data handling capability, and price. At the lower end, you have

the spreadsheet pivot tables and ActiveX components such as Decision Cube and DynamiCube.

The low-end products operate without a database server and rely instead on small fact

tables extracted from SQL databases or spreadsheets. However, in terms of providing

analytical capabilities these products are surprisingly capable. If you haven't yet seen

how you can slice and dice data using a spreadsheet pivot table, it is time you had a

look.

Most high-end OLAP tools in the market

adopt a client server architecture. A database server stores data using a specialized

schema that makes summarization easy and quick. The OLAP front-end retrieves data from the

server using the server API. Arbor Software Essbase and Oracle Express are examples of

such high-end server products.

Advertisment

Hitherto, OLAP front-end products have

accessed the OLAP servers using proprietary APIs. Now a few OLAP vendors have joined

together and set up the OLAP Council in an attempt to standardize on a layer of

abstraction that will provide a uniform API for front-end OLAP tools to connect to OLAP

servers. The OLAP council released the first version of its API in late 1996 and the

second version in January 1998. Unfortunately, the OLAP council does not include in its

fold all the OLAP players, worst still its functioning does not appear to be smooth. Arbor

Software, a council member and leading OLAP player, failed to endorse the API.

With the OLAP market growing at 40 percent

per year it was just a matter of time before Microsoft would step into it. It made its

first move in 1996 with the acquisition of an OLAP server developed by Panorama Software

Systems Ltd, an Israeli company. The server product has been re-worked and is currently

being tested under the code name of Plato and is slated for release by the end of 1998.

Plato could well become a part of the BackOffice suite of products and with its release we

could see the prices of OLAP products plummet.

More significant than its entry into the

market with an OLAP server product is Microsoft's initiative for a standardized OLAP API.

An OLAP API has now been included in Microsoft's OLE DB initiative, aimed at providing a

COM component-based standard for accessing disparate data-sources such as file server

databases, SQL databases, spreadsheets, project management planners, email, and now

multidimensional databases.

Advertisment

Datawarehousing, as stated earlier, is the

process of integrating enterprise data into a single database. Most organizations with

heavy investments in IT have built their operational systems over years, even decades.

After all, building operational systems and then honing them to exactly fit your

requirement takes time. By the time you finish building one operational system and start

on the other it is usual for the technology to undergo a change. Sticking to the earlier

technology for the new system then makes little sense. As a result, it is not unusual to

come across an organization that has an old billing application on a legacy platform, a

later marketing application on an Ingres/Unix platform, and a more-recent customer care

application an Oracle/NT platform.

In between developing these high-data

volume applications the organization could well have invested in a payroll and personnel

system on a dBase/DOS platform, a Tender and Contract application on Access/Windows, and

several other departmental applications similarly built using file server database systems

and even spreadsheets.

As a result of the evolutionary growth in

IT technology and the longtime required to develop and perfect operational systems,

organizational data tends to get locked up within disparate systems. I have come across a

telecom company that mails marketing material to subscribers who are habitual defaulters

when it comes to paying their bills. This happens because the marketing application works

completely independent of the billing application! The two systems, do not 'talk' to each

other. What most organizations tend to end up with are islands of information.

There is obviously a need to integrate

enterprise data into a single database system in order to facilitate an overall view of

the data. Obviously, rebuilding existing systems from scratch is not a viable alternative.

Not only would it go against the earthy wisdom of not fixing something that is not broke,

but it simply would not work. The logical solution is to leave the working operational

systems as they are, but periodically move their data into a central datawarehouse. This

process would have an additional spin off. Earlier, in our discussion on OLAP, we saw that

OLAP tools require a specialized data storage schema. The warehouse can be built using

such a schema. As a result, with a warehouse you not only achieve integration of data, you

also get to reformat it so that you can analyze and 'mine' it for information and

patterns.

As the Indian economy opens up competition

is going to hot up. Only the 'fittest' amongst the organizations are going to survive.

Unless organizations can unlock the information and intelligence in the data that they

have amassed, they are going to be hard put in facing up to the competition. It is in this

context that corporate IT needs to look at warehousing and OLAP during the year to come.

Advertisment