Here, Cray and IBM argue over why their choice of technology is better than the
other’s. This is hot stuff, happening in real time...
When Cray announced the launch of its X1 supercomputer last October and
predicted a renaissance of specialized supercomputers, the supercomputing
community sat up and listened. The reason: just about six months earlier, NEC
(Japan) had fired up the fastest supercomputer in the world, built on the
vector architecture, an architecture that leading vendors like IBM had
dismissed as "old technology." With the launch of the X1, the debate has
started all over again.
For nearly four decades, supercomputing was synonymous with supercomputers:
machines large enough to occupy six cricket pitches, typically used to crack
‘grand challenge’ problems like weather and climate modeling, high-fidelity
crash testing of automobiles, and advanced drug design and discovery
simulation. Cray, NEC, and Hitachi made such machines, based on vector
architectures. In the last decade or so, however, the market for such machines
has been shrinking rapidly. True to their name, supercomputers crunched
voluminous amounts of data but were very costly. Hence they adorned
government-run labs or very large organizations.
That changed in the early 90s with the emergence of a supercomputing
technology called clustering.
The cluster story
Basically, instead of a single machine with supercomputing capabilities, a
cluster is a group of machines or processors linked together to offer
comparable computing power. With the increasing processing power of commodity
microprocessors (thanks to Moore’s Law), the very need for specialized
supercomputers came into question. After all, the computing power of any
laptop available today matches that of a specialized supercomputer of 25 years
ago.
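To make the idea concrete, here is a toy sketch of how a cluster divides one
big computation among many cheap processors. It uses Python’s multiprocessing
module to mimic the division of labor on a single machine; a real cluster
would coordinate separate nodes with a message-passing library such as MPI,
and the function and numbers below are purely illustrative.

```python
# Toy sketch of the cluster idea: split one big computation across
# several worker processes, the way a cluster splits it across nodes.
# (A real cluster coordinates separate machines with a message-passing
# library such as MPI; multiprocessing just mimics the idea here.)
from multiprocessing import Pool

def partial_sum(bounds):
    """Each 'node' sums the squares in its own slice of the range."""
    start, end = bounds
    return sum(x * x for x in range(start, end))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        # Scatter the chunks, then combine the partial results.
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```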
Also, the idea of using low-end processors to achieve high performance
computing appealed to corporate users looking for affordable supercomputing
solutions. So companies like Sun and Dell, long associated with personal
computing, have entered the field of High Performance Computing (HiPC).
"There are two major advantages of cluster based supercomputing. The
first is scalability. You can start at very small systems (a few processors) and
scale up to the largest systems in the world (tens of thousands of processors).
This allows companies with the smallest budgets or highest performance
requirements to use the same technology to meet their needs. It allows
developers to work on small systems and scale their applications to grand
challenge proportions on the largest systems available," says Peter Ungaro,
IBM Vice President for Sales, Worldwide High Performance Computing. IBM has
largely placed its bets on cluster based high performance computing in the
recent past.
"The second advantage is that they will get the most sustained
performance for their budget. This is important as organizations worldwide want
to get the highest return on their investments in supercomputing
technologies," he adds.
The Cray argument
But for Cray, such arguments ring hollow.
"Organizations like IBM contend that cluster systems are cheaper than
custom systems like those made by Cray and Hitachi.
But if you look at issues like facility costs, power consumption and support
costs, clusters are pretty costly considering the relative level of (low)
performance that you get from them," says Dr Burton Smith, chief scientist
Cray Incorporated who was in India recently.
For Dr Smith, the supercomputing world is divided into two halves: the
manufacturers of "Type C" machines, like Cray, NEC, and Hitachi, and those
who make "Type T" machines, like IBM and SGI.
"Basically, there are two different types of supercomputers–the
clusters and grid systems called Type T machines, whose prices are based on
transistor costs and their peak performance is characterized by LINPACK (a
benchmark to measure performance of a dedicated system for solving a dense
system of linear equations). On the other hand, high memory bandwidth and fast
interconnection switches characterize the performance of Type C systems (NEC,
Cray, and Hitachi).
Hence, the cost in Type C systems is in wires (connections) not in
processors," says Dr Burton.
He also contends that both types of supercomputers have their respective
niches in the supercomputing ecosystem.
"Type T systems like some of those made by IBM, perform well with local
data, well-balanced workload, and explicit methods. Type C systems, on the other
hand, perform well with global access of data, poorly balanced workloads, sparse
linear algebra, implicit methods, and adaptive or irregular meshes," he
adds.
Simply put, while Type T systems are good at solving relatively simple
problems like handling large e-commerce transactions, they are not good enough
to handle highly complex tasks in volatile environments. Weather prediction,
for instance, is one such volatile environment.
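Much of this distinction comes down to memory access patterns. The toy
comparison below (NumPy assumed; the array size is arbitrary) contrasts a
regular, contiguous sweep through memory, the kind of access cache-based
Type T nodes handle well, with the scattered gathers typical of sparse and
irregular codes, where the high memory bandwidth of Type C machines pays off.

```python
# Toy contrast between the two workload styles described above:
# a regular, contiguous sweep (friendly to cache-based Type T nodes)
# versus irregular, scattered gathers (where the high memory
# bandwidth of Type C vector machines pays off).
import time
import numpy as np

n = 10_000_000
data = np.random.rand(n)
perm = np.random.permutation(n)   # random access pattern

t0 = time.perf_counter()
s1 = data.sum()                   # contiguous: streams through cache
t1 = time.perf_counter()
s2 = data[perm].sum()             # gather: scattered memory accesses
t2 = time.perf_counter()

print(f"contiguous sweep: {t1 - t0:.3f}s, gather: {t2 - t1:.3f}s")
print(f"same answer either way: {np.isclose(s1, s2)}")
```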
" To suggest that Type C cluster made machines can undertake grand
challenges like weather modeling is bunkum and sheer marketing fluff of large
corporations like IBM," says Dr Burton.
Buoyed by its success in high performance computing areas like weather
modeling, Big Blue is unfazed by such criticism.
The IBM counter-point
"We see that the HiPC market is primarily focused on purchasing systems
which have the best sustained performance for their money and that has typically
been clusters based on high performance processors, such as the POWER4. Even in
markets traditionally dominated by specialized vector supercomputers, like
weather forecasting, we see them moving to high performance integrated clusters
such as those sold by number of IT companies including IBM. An example would be
the European Center for Medium Range Weather Forecasting (ECMWF)," says
Ungaro.
Ungaro points out that IBM’s deal with ECMWF to build the world’s most
powerful supercomputer and storage network for weather prediction is proof
enough that high performance integrated clusters are not far behind.
"Also, it is critical that you don’t just have good hardware, but you
have a pervasive solution that attracts the thousands of important software
developers. You have to make sure that there is a large portfolio of
applications to run so that customers have a choice of what solution is best for
them. Also one needs to have a group of experts who take these ported
applications and optimize them. This broad application portfolio is a major
advantage of using high performance cluster technologies over specialized, niche
technologies such as vectors," says Ungaro.
Renaissance?
However, a school of thought has recently emerged which contends that a
"renaissance of specialized supercomputers" is likely.
The launch of the Earth Simulator by NEC, Japan, is something manufacturers of
vector-based systems are proud of. The Earth Simulator, which is built on the
vector architecture, is the most powerful supercomputer in the world today,
about five times faster than the most powerful US configuration. Cray believes
this marks a "renaissance of high bandwidth vector based systems."
"The Earth Simulator is a slap on the face of all those who claimed that
specialized supercomputers were a thing of the past.
It’s a major embarrassment for the authorities and vendors in the US who
can’t believe that the fastest supercomputer is now in Japan and is a vector
based system. This is bound to rekindle a renaissance of specialized
supercomputers," says Burton.
Whether such a revival happens is something only time can tell. But one thing
is very clear: it’s going to be an uphill battle for vector-based
supercomputer makers like Cray to take on the sheer muscle power of large
corporations like IBM and make a significant dent in the supercomputing
market.