Patterns From The Petabytes

Internet-age apps have pushed gigabit backbones to breaking point.
Project Oxygen will criss-cross the oceans with terabit fiber, a base
for Internet 2. Lucent has a cable with 432 terabit fibers, several
times the internet’s bandwidth. But our apps will push these frontiers
too. Data users now talk tera: 10^12 bytes. A thousandfold up, we get
to peta, 10^15 bytes, and then exa, 10^18 bytes.

My company works
with about 1 terabyte of software and data. IBM manages 176TB on
its internal network, and 576TB for commercial accounts. The internet
has 1,000 publicly accessible terabytes. There’s lots more (about
1,000 petabytes) on other networks, and 20 times that (20 exabytes)
offline. IBM estimates another 200-odd EB in analog form on our
planet.
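To keep these scales straight, here is a minimal sketch that converts the figures quoted above into bytes. It assumes decimal prefixes (tera = 10^12, and so on); the labels and the helper name `to_bytes` are illustrative, not from any real tool.

```python
# Decimal prefixes: tera = 10^12, peta = 10^15, exa = 10^18 (assumed).
UNIT_EXPONENT = {"TB": 12, "PB": 15, "EB": 18}


def to_bytes(amount, unit):
    """Convert an amount in TB, PB or EB to bytes."""
    return amount * 10 ** UNIT_EXPONENT[unit]


# Figures as quoted in the text; labels are illustrative.
holdings = [
    ("My company", 1, "TB"),
    ("IBM internal network", 176, "TB"),
    ("IBM commercial accounts", 576, "TB"),
    ("Public internet", 1_000, "TB"),
    ("Other networks", 1_000, "PB"),
    ("Offline", 20, "EB"),
    ("Analog (IBM estimate)", 200, "EB"),
]

for label, amount, unit in holdings:
    print(f"{label:<26}{to_bytes(amount, unit):.2e} bytes")
```

Seen this way, the public internet's 1,000 terabytes is one petabyte, and the 1,000 petabytes on other networks is one exabyte.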

It’s a challenge to store and carry this data, pushing areal density
and fiber to their limits. A bigger challenge is to make sense of this
data.

Amex gets 10,000
responses from a million mailers. That’s 990,000 junk mailers. Now,
if only they knew who would respond… The ten million rupees saved
would have made the offer competitive. Right now, customers
pay more for a service because of the majority who don’t respond.
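The arithmetic behind that mailer example can be checked in a few lines. This is only a sketch: it assumes the ten million rupees would come entirely from not sending the unanswered mailers, which the text implies but does not state, so the per-mailer cost is a derived figure.

```python
mailers_sent = 1_000_000
responses = 10_000
savings_rupees = 10_000_000  # figure quoted in the text

wasted_mailers = mailers_sent - responses
response_rate = responses / mailers_sent

# Assumption: all of the saving comes from the mailers nobody answered.
implied_cost_per_mailer = savings_rupees / wasted_mailers

print(f"Wasted mailers:        {wasted_mailers:,}")
print(f"Response rate:         {response_rate:.1%}")
print(f"Implied cost / mailer: Rs {implied_cost_per_mailer:.2f}")
```

Under that assumption, each mailer costs a little over ten rupees, and 99 out of every 100 are wasted.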

Can Amex predict
respondents? That’s the problem a new breed grapples with: the data
miners. They’re looking for patterns in petabytes of data. Perhaps
those who own Marutis, travel six times a year, and own three appliances
are very likely to buy the Amex card. Perhaps two-wheeler owners
with one TV can be excluded…
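A rough sketch of how such a rule might be tested against a past mailing. Everything here is hypothetical: the `Prospect` fields, the thresholds (six or more trips, three or more appliances) and the eight toy records are made up for illustration, not Amex data. Real data miners would search for the rules rather than guess them.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    vehicle: str         # e.g. "maruti", "two-wheeler", "none"
    trips_per_year: int
    appliances: int
    tvs: int
    responded: bool      # outcome of a past mailing


def likely_buyer(p):
    """Hypothetical 'very likely' rule from the text."""
    return p.vehicle == "maruti" and p.trips_per_year >= 6 and p.appliances >= 3


def excluded(p):
    """Hypothetical exclusion rule from the text."""
    return p.vehicle == "two-wheeler" and p.tvs == 1


def response_rate(prospects):
    prospects = list(prospects)
    if not prospects:
        return 0.0
    return sum(p.responded for p in prospects) / len(prospects)


# Toy history, invented purely for illustration.
history = [
    Prospect("maruti", 6, 3, 1, True),
    Prospect("maruti", 8, 4, 2, True),
    Prospect("maruti", 7, 3, 1, False),
    Prospect("two-wheeler", 1, 1, 1, False),
    Prospect("two-wheeler", 0, 2, 1, False),
    Prospect("none", 2, 2, 1, False),
    Prospect("maruti", 2, 1, 1, False),
    Prospect("two-wheeler", 3, 2, 2, True),
]

overall = response_rate(history)
target = response_rate(p for p in history if likely_buyer(p))
skipped = response_rate(p for p in history if excluded(p))

print(f"Overall: {overall:.0%}  Target segment: {target:.0%}  Excluded: {skipped:.0%}")
```

A rule is worth mailing on only if the target segment responds well above the overall rate, and worth excluding on only if the skipped segment responds well below it.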

Data mining
at Wal-Mart found a link between cosmetics and greeting cards. Applying
this, they pushed up sales in both categories by 30%.

Another store
almost dropped Feta, a low margin cheese for a small niche. Until
it found that its buyers also bought high-margin Swiss chocolates.
Enhancing the Feta choice helped increase chocolate sales, and profits.
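Links like cosmetics with greeting cards, or Feta with Swiss chocolates, are what market-basket analysis looks for. One common measure is lift: how much more often two items sell together than they would if purchases were independent. The sketch below is a bare-bones version over made-up baskets; it is not a claim about how either store did its analysis, and the function name `pair_lift` is illustrative.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair seen together.

    lift = P(a and b) / (P(a) * P(b)); values above 1 suggest the
    items are bought together more often than chance would predict.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        pair: (count / n) / ((item_counts[pair[0]] / n) * (item_counts[pair[1]] / n))
        for pair, count in pair_counts.items()
    }


# Toy baskets, invented purely for illustration.
baskets = [
    {"feta", "swiss chocolate", "bread"},
    {"feta", "swiss chocolate"},
    {"bread", "milk"},
    {"milk", "swiss chocolate"},
    {"bread", "milk", "feta", "swiss chocolate"},
    {"bread"},
]

lift = pair_lift(baskets)
for pair, value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(f"{pair[0]} + {pair[1]}: lift {value:.2f}")
```

In this toy data, Feta and Swiss chocolate score well above 1, while bread and milk sit at about 1, meaning no link beyond chance.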

Data mining, which digs up patterns in terabytes of data and predicts
future behavior, will radically change our understanding of consumer
behavior beyond Y2K, and of so many other areas that generate data.

Consider our
Election Commission’s 600 million records. If the right ten demographic
parameters were included next time round, mining could evolve this
into a staggering marketing database.

Data mining is growing up from jargon to killer app, one that will go
mainstream by 2001, making those staggering numbers make sense.

pkr@cmil.com
