It’s happened to the IT industry before. A wave festooned with the
trimmings of ‘big, happening, the thing of the future’ sweeps it off its
feet, engulfing in the process the small and big companies in its expanse. And
the jargon that it brings in its wake triggers a new rush to clamber on to the
new segment, before the initial hoopla dies out. The cinders of the software
services era of the early 90s are still glowing. The hot ash from the dot-com
furnace is a bitter reminder. Banking, insurance and telecom are still crackling.
That the biotechnology furnace is hot is an old story. Indian IT companies
have already been rolling toward it in right earnest. And the wannabes, other IT
companies and professionals hovering at the fringe, are just waiting to sneak
in. But scan the reams of information on biotechnology and you find proteins,
cells and DNA, which you thought you’d left behind for good in Class X and
XII. So, where’s the IT in BT?
To begin with, we need to differentiate between biotechnology and
bioinformatics. Biotechnology employs molecular biology and genetics to create
improved agricultural products, food, animal feed, industrial materials and
medicines. It deals with the experimental techniques and instrumentation in
biology. Bioinformatics, on the other hand, is about IT solutions to biological
problems and applying computer technology to the management of biological
information. So it is in bioinformatics that IT has a role, not in
biotechnology. As Zensar global CEO Ganesh Natarajan says, “Companies need
to stay out of biotech and focus on bioinformatics. Getting a few people trained
in biotechnology alone will not create another revolution.”
Languages: C++, Java, OOPS, J2EE
Algorithms: mathematical models, probability theory
Tools: HTML, ASP, JSP, JDBC, Swing
Software: GCG, BLAST, RasMol
Given the current hype surrounding biotechnology, and the eagerness displayed
by IT companies in this field, comparisons with the Internet wave are but
natural. Bioinformatics, however, is a different ballgame. To begin with, there
are no quick returns to be had. Not only do companies need huge investments,
patience is also an essential prerequisite. And unlike quick-fix diplomas in Web
designing and ‘A-to-Z of Java’ courses that powered many a dot-com, an IT
company would need to invest in at least six months of intensive training to get
computational biology through to its people. Here again, the program needs to be
designed and taught by experts. And given that bioinformatics cannot be tackled
by programming power alone, there’s a need to have life science experts with
strong domain expertise on board. After the training comes the designing of the
‘going-to-market’ strategy. IT companies need to figure out exactly where
they are headed: offering contract services, creating original intellectual
property in the form of algorithms and technologies, dabbling in database
services and product development, or conducting joint research in the drug discovery process.
“IT companies should be able to set up the necessary infrastructure with
high computing power. They should also be willing to diversify or partner with
someone to address the wet (actual experiments) side of biology. This is not a
field where things can be done in isolation without the active involvement of
the end-user,” points out Patni Computer general manager Dilip
Five-fold jump in five years
“According to a CII report, the Indian biotechnology market was valued
at $2.5 billion in 2001, a five-fold increase since 1997. It is expected to
reach $10 billion by 2010. Against this, the biopharma market worldwide is
estimated at about $17 billion,” says Tata Consultancy Services executive
vice-president (advanced technology) M Vidyasagar.
Compare this with a market size of $380 billion (conventional pharmaceutical
industry) and the nearly $800-billion IT industry. Even India’s IT industry
(both domestic and exports) of about $12 billion is more than half of the
worldwide biopharma market. Given the small market size, adds Dr Vidyasagar,
“there’s too much hype about bioinformatics” at present.
“The global bioinformatics opportunity is expected to be over $8 billion
by 2008 and Indian companies, which understand the nuances of areas like data
analysis for genomics and proteomics, can capture a share of this pie,”
says Zensar’s Natarajan.
Numbers apart, where is it that IT contributes to biotechnology?
Think of the times when you could get away with being politically incorrect.
Conjure up an image of this absent-minded research genius who works tirelessly
but never knows what he keeps where. Here is this smart, efficient young lady
who, out of the mess that is his lab, brandishes just the right equation,
algorithm, sequence, at the very instant he needs it. And as is the case with
geniuses, he carelessly chucks that all-important slip of paper into the mayhem
after he’s done with it, basking in the knowledge that the efficient lady will
fish it out when he needs it again.
Well, today, IT has been called upon to don the mantle of this young lady, as
biotechnology works feverishly to create a ‘better’ tomorrow for all of us.
So does that mean IT in biotech is essentially low-end ‘secretarial’ work?
Surely not. For, the multitude of researchers working across the globe, spewing
fresh information by the hour, has created a classic case of ‘info-indigestion’.
This flood of information has brought with it various sets of problems,
which can only be solved by the advances that IT has ushered in.
Explaining the scale of data that needs to be handled in biotechnology,
Oracle Corp general manager (ebiz) S Grover says, “There are 32,000 genomes
with 1.5 million proteins in them. Each genome requires approximately 300
terabytes of trace files. So 32,000 times 300 TB is massive. Medical imaging
generates 400 million GB of data annually. Each mass spectrometer generates 200
GB of data daily. Multiply this by the 1000s of mass spectrometers in use in the
world today and you get the picture…”
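Grover’s arithmetic can be sanity-checked in a few lines. (The spectrometer count below is an assumed floor, since the quote only says “1000s”; the rest are his own figures, in decimal units.)

```python
# Back-of-envelope check of the figures quoted above
# (decimal units: 1 EB = 1_000_000 TB; 1 PB = 1_000_000 GB).
genomes = 32_000
tb_per_genome = 300                     # trace files per genome, as quoted
genome_total_tb = genomes * tb_per_genome
print(f"Genome trace files: {genome_total_tb:,} TB "
      f"= {genome_total_tb / 1_000_000:.1f} EB")

spectrometers = 1_000                   # assumed floor for "1000s in use"
gb_per_day = 200
yearly_gb = spectrometers * gb_per_day * 365
print(f"Mass spectrometers: {yearly_gb / 1_000_000:.1f} PB per year")
```

Even with the most conservative reading, the totals land in the exabyte and petabyte range, which is the point Grover is making.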
This sheer volume of data calls for the creation of not just scalable, but
intelligent, databases. But can’t search engines, that most of us take for
granted now, do the trick? Try searching for the term ‘histamine’ using a
search engine. If the engine is robust, there’s no reason why it should not
throw up all references to ‘histamine’. Unless of course, some people using
histamine decide to spell it as ‘histamyne’ or ‘histemine’. If they do,
you will never know! Biologists sometimes can’t agree on the very definitions
and concepts the databases are supposed to manage.
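The spelling problem has a well-known software answer: fuzzy string matching. A minimal sketch using Python’s standard library (real biological search engines use far more sophisticated matching, but the idea is the same):

```python
# Spelling-tolerant lookup: a misspelled term still finds its
# closest entry in a controlled vocabulary.
import difflib

known_terms = ["histamine", "histidine", "serotonin", "dopamine"]

def tolerant_lookup(query, vocabulary, cutoff=0.8):
    """Return vocabulary terms close enough to the (possibly misspelled) query."""
    return difflib.get_close_matches(query.lower(), vocabulary, n=3, cutoff=cutoff)

print(tolerant_lookup("histamyne", known_terms))  # finds "histamine"
print(tolerant_lookup("histemine", known_terms))  # likewise
```

The cutoff is the design lever: set it too low and ‘histidine’, a different molecule entirely, starts matching too.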
In genomics, for example, you can’t get more fundamental than the
definition of a gene. Yet, that definition could differ. One software solution
is to develop submission protocols that enforce the rules strictly. Such a system
would check each submission to ensure that, before the data is entered, the full
genus/species name matches an entry in the list of known genus/species names.
Another solution could be to offer a drop-down selection list.
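The strict-protocol option amounts to a gatekeeping check before insertion. A sketch, with illustrative names and record fields:

```python
# Reject a submission whose genus/species name is not in the
# controlled vocabulary, before it ever reaches the database.
KNOWN_SPECIES = {"Homo sapiens", "Mus musculus", "Escherichia coli"}

def validate_submission(record):
    """Raise ValueError for an unrecognized genus/species name."""
    species = record.get("species", "").strip()
    if species not in KNOWN_SPECIES:
        raise ValueError(f"Unknown genus/species name: {species!r}")
    return record

validate_submission({"species": "Homo sapiens", "gene": "HBB"})   # accepted
# validate_submission({"species": "Homo sapien"})  # raises ValueError
```

The drop-down alternative simply moves the same vocabulary to the input form, so invalid names cannot be typed at all.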
Even assuming that the data entered is standardized, searching a colossal
database is no mean task. Here, software tools like the NCBI BLAST (National
Center for Biotechnology Information’s Basic Local Alignment Search Tool) run
several instances of the program, searching individual portions of the database.
IBM’s DiscoveryLink, for instance, understands the schema of different
databases and the kind of queries that can be handled. When a person or program
sends in an info-query, DiscoveryLink breaks it down and sends the parts off to
the various databases. The partial answers that come in are combined, and an
answer is returned.
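The scatter-gather pattern behind both BLAST’s partitioned searches and DiscoveryLink’s federation can be shown in miniature. This is only an illustration of the pattern, not DiscoveryLink’s actual mechanics; the two “databases” here are plain dictionaries with made-up entries:

```python
# Toy federated query: send one query to several sources,
# then merge the partial answers into a single result.
protein_db = {"BRCA1": {"length": 1863}}
disease_db = {"BRCA1": {"disease": "breast cancer"}}

def federated_query(gene, sources):
    """Query every source and merge the partial answers."""
    answer = {}
    for source in sources:
        answer.update(source.get(gene, {}))
    return answer

print(federated_query("BRCA1", [protein_db, disease_db]))
```

The real engineering lies in what this sketch hides: understanding each source’s schema well enough to translate one query into many dialects.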
The massive volume of data also calls for increased speed of computing. IBM’s
supercomputer Blue Gene, being built at an investment of $100 million and due to
be operational by 2005, is expected to operate at about 200 teraflops, or 200
trillion operations per second, more than the total combined power of the top
500 supercomputers in operation today.
“With Blue Gene, IBM is trying to set a new supercomputer speed limit
– a petaflop, or a thousand trillion floating-point calculations per second,”
says Dr Manoj Kumar, director, IBM Research Labs, New Delhi. IRL, incidentally,
is part of the team working on Project Blue Gene.
Another problem is that databases created by different organizations store
information idiosyncratically, creating different file formats that cannot talk
to each other. To begin with, biological data is complex and interlinked.
A spot on a DNA array, for instance, is connected not only to immediate
information about its intensity, but to layers of information about genomic
location, DNA sequence, structure, function, and much more. Creating information
systems that allow biologists to seamlessly follow these links without getting
lost in a sea of information is a challenge for computer scientists.
Parallel computing is a concept that has been around for a long time. Break
a problem down into computationally tractable components, and instead of solving
them one at a time, employ multiple processors to solve each sub-problem
simultaneously. The parallel approach has made its way into experimental
molecular biology with technologies such as the DNA microarray. Microarrays
allow researchers to conduct thousands of gene expression experiments
simultaneously on a tiny chip.
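The break-it-up-and-solve-simultaneously idea can be sketched with Python’s standard concurrent futures: shard a toy sequence “database”, search every shard at the same time, and merge the hits. This is a crude stand-in for the way BLAST-style tools partition their searches, not their actual implementation:

```python
# Parallel search over database shards. Swapping in ProcessPoolExecutor
# would spread the shards across processor cores.
from concurrent.futures import ThreadPoolExecutor

def search_shard(query, shard):
    """Return the sequences in this shard that contain the query motif."""
    return [seq for seq in shard if query in seq]

database = ["ATGGCGT", "TTGACGT", "ATGCCCA", "GGGATGC"]
shards = [database[:2], database[2:]]        # two sub-problems

with ThreadPoolExecutor() as pool:
    partials = pool.map(lambda s: search_shard("ATG", s), shards)

hits = [seq for part in partials for seq in part]
print(hits)
```

Microarrays apply the same logic in hardware: thousands of tiny experiments run side by side instead of one after another.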
Much of what we currently think of as part of bioinformatics–sequence
comparison, sequence database searching, sequence analysis–is more complicated
than just designing and populating databases. Bioinformaticians are the
tool-builders, and it’s critical that they understand biological problems as
well as computational solutions in order to produce useful tools.
Developing analytical tools to discover knowledge in data is the second, and
more scientific, aspect of bioinformatics. There are many levels at which we use
biological information, whether it is in comparing sequences to develop a
hypothesis about the function of a newly discovered gene, breaking down known 3D
protein structures into bits to find patterns that help predict how the protein
folds, or modeling how proteins and metabolites in a cell work together to make
the cell function. The ultimate aim of analytical bioinformaticians is to
develop predictive methods that allow scientists to model the function and
phenotype of an organism based on genome sequence alone.
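The first of those levels, comparing a new sequence against annotated ones to suggest a function hypothesis, can be caricatured in a few lines. Real pipelines use alignment algorithms (Smith-Waterman, BLAST heuristics), not the naive position-by-position identity used here, and the sequences and annotations are invented:

```python
# Rank annotated sequences by crude similarity to a query,
# and borrow the best match's annotation as a hypothesis.
def percent_identity(a, b):
    """Fraction of matching positions, over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

annotated = {
    "ATGGCGTACGT": "DNA repair",
    "TTGACCCAGGT": "membrane transport",
}

query = "ATGGCGTACCT"
best = max(annotated, key=lambda seq: percent_identity(query, seq))
print(f"Closest match suggests: {annotated[best]}")
```

The hypothesis is only as good as the annotation it borrows, which is why the database problems described earlier matter so much here.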
Commenting on IBM’s role in bioinformatics, managing director Abraham
Thomas says, “Online collections of biomedical abstracts, papers and other
literature are used to produce annotated databases for easy access of
information. For example, in a protein database, the annotation for a protein
may include its properties, functions, structure, similarities with other
proteins, diseases associated with deficiencies in the protein etc.”
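One way to picture the annotation record Thomas describes is as a structured entry carrying exactly those fields. The field names and values below are illustrative, not any actual database schema:

```python
# A structured annotation record for one protein entry.
from dataclasses import dataclass, field

@dataclass
class ProteinAnnotation:
    name: str
    function: str
    properties: dict = field(default_factory=dict)
    similar_to: list = field(default_factory=list)
    disease_links: list = field(default_factory=list)

hbb = ProteinAnnotation(
    name="hemoglobin beta",
    function="oxygen transport",
    properties={"length_aa": 147},
    similar_to=["hemoglobin alpha", "myoglobin"],
    disease_links=["sickle-cell anaemia", "beta-thalassaemia"],
)
print(hbb.name, "->", hbb.function)
```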
RoI shapes the final decision
Apart from database management and data-mining solutions and services, there
are several other applications of IT within bioinformatics. “The challenge
is to obtain a return on the enormous investment required to obtain the
explosion of genomic data. This requires significant computational capabilities,
consisting of high-performance platforms, sophisticated and validated
algorithms, and the integration of these processes into the scientific work
process,” says Dan Stevens, director, marketing (life sciences), Silicon Graphics.
Bioinformatics tools can also be used in the analysis of genome sequences to
detect genes and their functionalities, of protein sequences to predict their
structure (secondary or tertiary), and of clinical data to predict the toxicity
of drugs and/or molecules.
Teaming up is the best option
Given the specialized nature of bioinformatics, it makes sound business
sense for IT companies to partner with pharma and research companies.
“Information technology and its optimized use can qualitatively change the
nature of this collaboration, with tools like electronic product development
exchanges,” says Oracle’s Grover.
“Build domain knowledge, partner with leading research institutes,
develop intellectual property and understand customer challenges and deliver
solutions which add value,” says DA Prasanna, vice-chairman, Wipro (also
executive officer at Wipro Healthcare and Life Sciences), outlining the success
formula for a foray into bioinformatics. But even if IT companies follow
this formula, can they really re-invent themselves as end-to-end bioinformatics players?
“If any of the ‘complete solutions’ provided by such companies fails
repeatedly, customers will start doubting the ability of companies in the
information technology space to fulfill their commitments. To avoid this
undesirable outcome, IT companies and professionals must learn to work within
their areas of competence,” says Stevens.
Ultimately, however, it is up to an IT company to determine how far it wants
to go along the road to biotechnology. Clearly, there are rich pickings along
this road, and the further it goes, the more money it will make. The stock
market gives much higher premia to drug discovery companies (and that’s just
the tip of the biotech iceberg) than to pure IT companies. The downside is that
the further an IT company walks along that road, the further it moves away from
its core competence. But then, if it rakes in the moolah in these cash-strapped
times, why not? After all, proteomics, genomics and pharmacogenomics are all
derived from the Latin root ‘-omics’, which means ‘give us money’!
Manjiri Kalghatgi in New Delhi
How can you build a biotech-savvy IT workforce?
As in the case of banking, insurance, and other verticals where IT plays a
role, domain knowledge is supreme for professionals working in the field of
bioinformatics. So what does it take to build a biotech-savvy IT workforce? IT
companies need to retrain and reposition their IT and systems teams in life
science-related projects. Getting someone who knows C++ to learn Java is one
thing, but getting someone who lost touch with biology at age 15 to understand
the complex functioning of the human body is another ballgame altogether.
Getting someone who has no IT training to write software code is no mean task
either. The latter could be a trifle easier! As Compaq India director
(enterprise products) Pallab Talukdar says, "We are looking at biologists
picking up IT, rather than the reverse." And this explains the evolution of
terminology like Bio-Perl and Bio-Java.
"These are actually extensions to existing IT technology," informs
Compaq’s Talukdar, explaining that the syntax of Bio-Perl is actually closer to
biology than to IT. "For instance, in specifying the name of an array, Bio-Perl
addresses a variable as an amino acid, making it far easier for a biologist to
use the language," he explains. While the evolution of such languages has
certainly contributed towards bringing biologists a step closer to IT, the need
for techno-functional professionals trained in both IT and biotechnology is growing
rapidly. But the quickest and best way of building a team for bioinformatics
projects would be to have a mix of talent from both fields on board and re-train
them to achieve the level of expertise required for the project.
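Talukdar’s point, a language whose objects read like biology rather than like IT, can be sketched in any language. The tiny class below is a hypothetical Python illustration of the idea, not the actual Bio-Perl (or any other library’s) API:

```python
# A sequence object that speaks the biologist's vocabulary:
# residues and amino acids rather than strings and characters.
class AminoAcidSequence:
    def __init__(self, residues):
        self.residues = residues          # e.g. one-letter amino-acid codes

    def __len__(self):
        return len(self.residues)

    def count(self, amino_acid):
        """How many times a given amino-acid letter occurs."""
        return self.residues.count(amino_acid)

seq = AminoAcidSequence("MVLSPADKTN")     # illustrative ten-residue fragment
print(len(seq), seq.count("A"))
```

The biologist writes `seq.count("A")` and thinks “alanines”, while underneath it is ordinary string handling; that thin translation layer is all Bio-Perl and Bio-Java really add.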
"About 80% of the people we have have a basic degree in electrical
engineering or computer science. They are taught the basics of molecular
biology, and then put through a rigorous training program on the various
mathematical algorithms that are used in bioinformatics. This includes exposure
to the latest techniques, software packages and databases. About 20% of our
chosen staff has a basic degree in life sciences. Their knowledge of biology is
brushed up. They are taught the basics of computer programming, as well as
computational techniques used in bioinformatics–though not to the same level
of rigor as the EE/CS staffers," says Tata Consultancy’s Vidyasagar. So
what are the raw skills that IT professionals need to have to qualify for
training in this area?
Dr Manoj Kumar, director, IBM India Research Labs, says competence in areas
like e-biz, data and storage management, data mining, parallel / distributed
computing, middleware and knowledge management would aid in the pursuit of
bioinformatics. "Then there’s stuff like probability theory, statistics,
design and analysis of algorithms, discrete mathematics, relational and spatial
databases," he says. "IT companies need to train staff on analysis and
interpretation of biological data through techniques of visualization, algorithm
development and mathematical models," says Arena Multimedia CTO NJ Rajaram.
"A DBA just manages databases, but the role of the database researcher
has certainly grown in importance," explains Compaq’s solutions architect
Balasubramanian. "Apart from handling a variety of data, he has to deal
with organizing, indexing and managing the storage of that data. This has more
to do with biological sets and defining an indexing mechanism, which enables
convenience in data retrieval. It also involves statistical analysis of
sequences and imaging of data," elaborates Balasubramanian.
Aptech wing Arena has already launched specific courses on bioinformatics and
bio-computing. These are primarily aimed at pharmaceutical / drug research
companies, consulting companies and IT firms. "Training programs are called
for in the areas of molecular biology, handling proteomic, genomic data, DNA
structures and sequences, pharmacology and dealing with patent and bibliography
data," says Rajaram. And apart from strong fundamental IT skills and
knowledge of biotechnology concepts, the need for professionals to constantly
update themselves has never been stronger. As SGI’s Dan Stevens says,
in the post-genomic era, professionals will have to read up and attend meetings
to learn about state-of-the-art solutions. "And apart from that, they will
have to learn how to filter out those solutions that are only trying to take
advantage of market share and the current market hype," he cautions.