Cut Out that Jazz

DQI Bureau

It’s happened to the IT industry before. A wave festooned with the trimmings of ‘big, happening, the thing of the future’ sweeps it off its feet, engulfing in the process the small and big companies in its expanse. And the jargon that it brings in its wake triggers a new rush to clamber on to the new segment, before the initial hoopla dies out. The cinders of the software services era of the early 90s are still glowing. The hot ash from the dot-com furnace is a bitter reminder. Banking, insurance and telecom are still crackling away.

That the biotechnology furnace is hot is an old story. Indian IT companies have already been rolling toward it in right earnest. And the wannabes (other IT companies and professionals hovering at the fringe) are just waiting to sneak in. But scan the reams of information on biotechnology and you find proteins, cells and DNA, which you thought you’d left behind for good in Class X and XII. So, where’s the IT in BT?

To begin with, we need to differentiate between biotechnology and bioinformatics. Biotechnology employs molecular biology and genetics to create improved agricultural products, food, animal feed, industrial materials and medicines. It deals with the experimental techniques and instrumentation in biology. Bioinformatics, on the other hand, is about IT solutions to biological problems and applying computer technology to the management of biological information. So it is in bioinformatics, not in biotechnology, that IT has a role. As Zensar global CEO Ganesh Natarajan says, "Companies need to stay out of biotech and focus on bioinformatics. Getting a few people trained in biotechnology alone will not create another revolution."

Can You Become a Bioinformatics Professional?

Raw skills:
C, C++, Java, OOPS, J2EE
VRML
XML
RDBMS: Oracle PL/SQL
CGI, UNIX, Perl
Web tools: HTML, ASP, JSP, JDBC, Swing

Acquired skills:
BioXML, Biojava, Bioperl
EMBL, GenBank, Swiss-Prot
GEML
Biology concepts
Statistics, algorithms, mathematical models, probability theory
Bioinformatics software: GCG, BLAST, RasMol

You should be comfortable working in a command-line computing environment. Working in Linux or Unix will provide this experience.

You should have experience with programming in computer languages such as C or C++, as well as in scripting languages like Perl or Python.

You should have knowledge of a major molecular biology software package, as well as a fairly deep background in some aspects of molecular biology.

Given the current hype surrounding biotechnology, and the eagerness displayed by IT companies in this field, comparisons with the Internet wave are but natural. Bioinformatics, however, is a different ballgame. To begin with, there are no quick returns to be had. Not only do companies need huge investments, patience is also an essential prerequisite. And unlike the quick-fix diplomas in Web designing and ‘A-to-Z of Java’ courses that powered many a dot-com, an IT company would need to invest in at least six months of intensive training to get computational biology through to its people. Here again, the program needs to be designed and taught by experts. And given that bioinformatics cannot be tackled by programming power alone, there’s a need to have life science experts with strong domain expertise on board. After the training comes the designing of the ‘go-to-market’ strategy. IT companies need to figure out exactly where they are headed: offering contract services, creating original intellectual property in the form of algorithms and technologies, dabbling in database services and product development, or conducting joint research in the drug discovery area.

Jargon in the Jazz

Demystifying biotechnology

Proteomics

The number of proteins expressed from DNA changes constantly in response to conditions such as eating, sleep, infection or other disease, the phases of pregnancy, or a wound. Proteomics aims to understand protein expression in response to such conditions.

Proteomics Software

Proteomic software lets scientists search databases of known protein sequences using batch or real-time processing. This software can also control automated hardware, such as robotics, and facilitate data transfer operations.

GEML

The Gene Expression Markup Language (GEML) is based on the Extensible Markup Language (XML), the standard framework for storing and transferring structured data over the Internet. Rosetta Inpharmatics, with the assistance of others in the GEML community, has created the GEML format to help standardize how DNA microarray and gene expression data is stored and transferred between different gene expression analysis systems and databases.

Microarray or biochip technology

This makes it possible to simultaneously study the expression of thousands of genes or proteins in a single experiment in the laboratory. The large volume of data generated by a series of such experiments is expected to help understand the genetic basis of diseases.

Chromosomes

These are discrete units of the genome carrying many genes, consisting of (histone) proteins and a very long molecule of DNA. They are found in the nucleus of every plant and animal cell.

DNA

Short for deoxyribonucleic acid. DNA is made up of four bases, represented by A (adenine), C (cytosine), G (guanine) and T (thymine). The order of these bases provides the cell with coded instructions on everything it does. DNA sequencing involves determining the order of these bases along a DNA strand.

Gene

The basic unit for transmitting hereditary traits. A gene is made up of the DNA sequence that provides the code for a protein with a known function in the cell.

Protein

A chain of amino acids. A protein is usually at least 50 amino acids long. Proteins form structural elements in a cell and are also involved in the chemical reactions that occur in a cell.

Amino Acid

The sub-unit that makes up a protein. There are 20 types of amino acids. Protein sequencing involves determining the order of amino acids in a protein.

Genome

The genome comprises all the genetic material in a cell or organism. The human genome is made up of 23 pairs of chromosomes, each containing hundreds of genes. There are approximately 30,000 genes in the human genome.

Human Genome Project

A bioinformatics project that has identified the approximately 30,000 genes in human DNA. Coordinated by the US Department of Energy and the National Institutes of Health, the US Human Genome Project started in 1990 and released its findings in February 2001, along with findings from a separate project by Celera Genomics Group. There are similar projects in other countries as well. The purpose is to store the 3 billion chemical base pairs (the DNA sequence) derived from these analyses in databases for use in biomedical research.

"IT companies should be able to set up the necessary infrastructure with high computing power. They should also be willing to diversify or partner with someone to address the wet (actual experiments) side of biology. This is not a field where things can be done in isolation without the active involvement of the end-user," points out Patni Computer general manager Dilip Dhanuka.

Five-fold jump in five years

"According to a CII report, the Indian biotechnology market was valued at $2.5 billion in 2001, a five-fold increase since 1997. It is expected to reach $10 billion by 2010. Against this, the biopharma market worldwide is estimated at about $17 billion," says Tata Consultancy Services executive vice-president (advanced technology) M Vidyasagar.

Compare this with the $380-billion conventional pharmaceutical industry and the nearly $800-billion IT industry. Even India’s IT industry (both domestic and exports), at about $12 billion, is more than half the size of the worldwide biopharma market. Given the small market size, adds Dr Vidyasagar, "there’s too much hype about bioinformatics" at present.

"The global bioinformatics opportunity is expected to be over $8 billion by 2008 and Indian companies, which understand the nuances of areas like data analysis for genomics and proteomics, can capture a share of this pie," says Zensar’s Natarajan.

Simple-speak



Numbers apart, where is it that IT contributes to biotechnology?

Think of the times when you could get away with being politically incorrect. Conjure up an image of this absent-minded research genius who works tirelessly but never knows what he keeps where. Here is this smart, efficient young lady who, out of the mess that his lab is, brandishes just the right equation, algorithm or sequence at the very instant he needs it. And as is the case with geniuses, he carelessly chucks that all-important slip of paper into the mayhem after he’s done with it, basking in the knowledge that the efficient lady will fish it out when he needs it again.

Well, today, IT has been called upon to don the mantle of this young lady, as biotechnology works feverishly to create a ‘better’ tomorrow for all of us. So does that mean IT in biotech is essentially low-end ‘secretarial’ work? Surely not. The multitude of researchers working across the globe, spewing fresh information by the hour, has created a classic case of ‘info-indigestion’. This glut of information has brought with it various sets of problems, which can only be solved by the advances that IT has ushered in.

Explaining the scale of data that needs to be handled in biotechnology, Oracle Corp general manager (ebiz) S Grover says, "There are 32,000 genomes with 1.5 million proteins in them. Each genome requires approximately 300 terabytes of trace files. So 32,000 times 300 TB is massive. Medical imaging generates 400 million GB of data annually. Each mass spectrometer generates 200 GB of data daily. Multiply this by the thousands of mass spectrometers in use in the world today and you get the picture..."
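To put rough numbers on that quote, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000 GB, 1 exabyte = 1,000,000 TB), and the count of 1,000 mass spectrometers is purely an illustrative assumption, since the quote says only "thousands".

```python
# Back-of-the-envelope totals from the figures quoted above.
# Assumption: decimal units (1 TB = 1,000 GB; 1 EB = 1,000,000 TB).
GENOMES = 32_000
TRACE_TB_PER_GENOME = 300

total_trace_tb = GENOMES * TRACE_TB_PER_GENOME   # terabytes of trace files
total_trace_eb = total_trace_tb / 1_000_000      # the same figure in exabytes

SPECTROMETER_GB_PER_DAY = 200
ASSUMED_SPECTROMETERS = 1_000  # hypothetical count; the quote says only "thousands"

# Daily output of all assumed spectrometers, converted from GB to TB.
daily_spectrometer_tb = SPECTROMETER_GB_PER_DAY * ASSUMED_SPECTROMETERS / 1_000

print(f"Trace files: {total_trace_tb:,} TB (about {total_trace_eb:.1f} exabytes)")
print(f"Mass spectrometers: about {daily_spectrometer_tb:,.0f} TB per day")
```

On those assumptions, the trace files alone come to 9.6 million terabytes, and every thousand spectrometers add another 200 TB a day.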

This sheer volume of data calls for the creation of not just scalable, but intelligent, databases. But can’t search engines, which most of us take for granted now, do the trick? Try searching for the term ‘histamine’ using a search engine. If the engine is robust, there’s no reason why it should not throw up all references to ‘histamine’. Unless, of course, some people using histamine decide to spell it as ‘histamyne’ or ‘histemine’. If they do, you will never know! Biologists sometimes can’t agree on the very definitions and concepts the databases are supposed to manage.
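The ‘histamine’ problem is easy to reproduce. The sketch below is only an illustration: the four records are made up, and the approximate matching uses Python’s standard difflib module with an arbitrarily chosen similarity cutoff of 0.8, not the method of any particular search engine or biological database.

```python
import difflib

# Hypothetical records; two of them misspell "histamine".
records = [
    "histamine release in mast cells",
    "histamyne receptor binding assay",
    "histemine levels after infection",
    "insulin secretion in beta cells",
]

def exact_search(term, docs):
    """Return documents containing the term as an exact word."""
    return [doc for doc in docs if term in doc.split()]

def fuzzy_search(term, docs, cutoff=0.8):
    """Return documents containing a word that closely resembles the term."""
    hits = []
    for doc in docs:
        # get_close_matches keeps words whose similarity ratio is >= cutoff.
        if difflib.get_close_matches(term, doc.split(), n=1, cutoff=cutoff):
            hits.append(doc)
    return hits

print("Exact matches:", exact_search("histamine", records))
print("Fuzzy matches:", fuzzy_search("histamine", records))
```

The exact search finds only the correctly spelled record, while the fuzzy search also picks up the two misspelled ones and still ignores the unrelated insulin entry. A cutoff that is too loose would start pulling in genuinely different terms, which is why standardizing data at entry is still worth the effort.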

In genomics, for example, you can’t get more fundamental than the definition of a gene. Yet, that definition could differ. One software solution is to develop submission protocols that follow the rules strictly. Such a system would check submissions to ensure that, before the data is entered, the full genus/species name matches an entry in a list of known genus/species names. Another solution could be to offer a drop-down selection.
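A strict submission check of this kind might look something like the following minimal sketch. The list of known names, the record layout and the function names are all hypothetical, invented here for illustration; a real system would check against a curated taxonomy.

```python
# Hypothetical reference list; a real system would use a curated taxonomy.
KNOWN_SPECIES = {"Homo sapiens", "Mus musculus", "Escherichia coli"}

def normalize_name(raw):
    """Tidy whitespace and capitalization of a 'Genus species' name."""
    parts = raw.strip().split()
    if len(parts) != 2:
        raise ValueError(f"expected 'Genus species', got {raw!r}")
    genus, species = parts
    return f"{genus.capitalize()} {species.lower()}"

def validate_submission(record):
    """Accept a record only if its organism matches a known name."""
    name = normalize_name(record["organism"])
    if name not in KNOWN_SPECIES:
        raise ValueError(f"unknown genus/species name: {name!r}")
    # Store the normalized name so every entry is spelled the same way.
    return {**record, "organism": name}

accepted = validate_submission({"organism": "  homo SAPIENS ", "sequence": "ACGT"})
print(accepted["organism"])
```

A misspelt entry such as "Homo sapien" is rejected with an error instead of being stored. The drop-down alternative achieves the same end by never letting the user type the name at all.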

Who’s Doing What

Indian IT companies and their bioinformatics initiatives

TCS has partnered with the Centre for DNA Fingerprinting and Diagnostics (CDFD), the Indian Statistical Institute and the Government of India’s Department of Biotechnology. TCS has hired about 35 professionals who will undergo training for this initiative. The company’s focus will be on the creation of original intellectual property (IP) in the form of original algorithms and methodologies. This work is being undertaken at the Advanced Technology Centre of TCS in Hyderabad.

IBM India Research Labs (IRL), New Delhi, is working to generate intellectual property in algorithms to support life sciences research, apart from contributing to the development of the Blue Gene supercomputer. IRL is engaged in a research project on Gene Structure and Function Prediction in collaboration with the Department of Biochemical Engineering and Biotechnology, IIT, New Delhi. IBM Global Services India (IGSI) Exports is currently involved in building prototypes and demonstrable pilot applications for emerging challenges facing the life sciences industry. This group has embarked on developing a tool that helps in data discovery from disparate data sources. IGSI Exports further contributes to IBM’s Life Sciences initiative by building competencies in developing, testing and implementing bioinformatics databases, user tools, utilities and applications. IRL and IGSI Exports are collaborating to build a prototype system for Bioinformatics for Microarrays.




Wipro is looking at providing IT solutions to life science companies, outsourcing of R&D activities of major biotechnology companies and embedded software solutions for the medical devices markets. The company is also ‘strengthening its relationship’ with Beckman Coulter. In April 2002, Wipro launched Wipro Healthcare and Life Science to address the requirements of the Bio-IT market. This business will offer solutions to hospitals and health insurance companies for efficient and compliant work flows, bioinformatics solutions to life science companies to significantly reduce drug discovery cycle time, and IT solutions to medical and analytical devices companies.

Compaq India has tied up with IIT, Mumbai, and TIFR for its foray into bioinformatics, apart from a partnership with the National University of Singapore. Compaq recently set up a Super Computing Lab for the Center for Biotechnology in New Delhi, deploying eight AlphaServer systems and a 1-terabyte SAN storage solution. Compaq has also provided software tools for parallelization and will provide complete maintenance of the lab for the next three years. Compaq has also been hosting multiple biotech forums.

Zensar is the technology partner for a bioinformatics venture focused on the creation of a new model for virtual research. This is a modeling environment that will combine research and data analysis capabilities with artificial intelligence components. The initial project is valued at half a million dollars and the company will have teams of eight to 20 people working in this area.

Infosys and Patni Computer are assessing developments in the bioinformatics arena before they decide to explore opportunities in this sector.

Government Bioinformatics and Genomics Centre (BGC)

Bioinformatics Centres under BTIS: Jawaharlal Nehru University, New Delhi; Indian Institute of Science, Bangalore; Bose Institute, Kolkata; Institute of Microbial Technology, Chandigarh; Madurai Kamaraj University, Madurai; Indian Agricultural Research Institute, New Delhi; University of Poona, Pune; Centre for Cellular and Molecular Biology, Hyderabad; National Institute of Immunology, New Delhi; Department of Biotechnology, Govt of India, New Delhi.

Bioinformatics Centres: Advanced Diploma in Bioinformatics; Direct Ph.D programme in Bioinformatics.

Even assuming that the data entered is standardized, searching a colossal database is no mean task. Here, software tools like NCBI BLAST (the National Center for Biotechnology Information’s Basic Local Alignment Search Tool) run several instances of the program, each searching an individual portion of the database. IBM’s DiscoveryLink, for its part, understands the schemas of different databases and the kinds of queries each can handle. When a person or program sends in a query, DiscoveryLink breaks it down and sends the parts off to the various databases. The partial answers that come in are combined, and an answer is returned.
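The break-down-and-combine idea can be sketched in a few lines. What follows is a toy illustration only, not DiscoveryLink’s actual interface: the two in-memory "databases", their contents and every function name are hypothetical.

```python
# Two hypothetical data sources with different schemas (made-up contents).
SEQUENCE_DB = {"P001": {"seq": "MKTAYIAK"}, "P002": {"seq": "MVLSPADK"}}
LITERATURE_DB = [
    {"protein": "P001", "title": "Hypothetical paper A"},
    {"protein": "P001", "title": "Hypothetical paper B"},
    {"protein": "P002", "title": "Hypothetical paper C"},
]

def query_sequences(protein_id):
    """Sub-query that knows the key-value layout of the sequence source."""
    entry = SEQUENCE_DB.get(protein_id)
    return {"sequence": entry["seq"]} if entry else {}

def query_literature(protein_id):
    """Sub-query that knows the row-based layout of the literature source."""
    titles = [row["title"] for row in LITERATURE_DB if row["protein"] == protein_id]
    return {"papers": titles}

# The "federation layer" knows which sub-query serves which source.
SUBQUERIES = (query_sequences, query_literature)

def federated_query(protein_id):
    """Send one question to every source and merge the partial answers."""
    combined = {"protein": protein_id}
    for subquery in SUBQUERIES:
        combined.update(subquery(protein_id))
    return combined

print(federated_query("P001"))
```

The caller asks a single question and never needs to know that the answer was stitched together from sources with different layouts.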

The massive volume of data also calls for increased speed of computing. IBM’s supercomputer Blue Gene, being built at an investment of $100 million and due to be operational by 2005, is expected to operate at about 200 teraflops, or 200 trillion operations per second, more than the total combined power of the top 500 supercomputers in operation today.

"With Blue Gene, IBM is trying to set a new supercomputer speed limit: a petaflop, or a thousand trillion floating-point calculations per second," says Dr Manoj Kumar, director, IBM Research Labs, New Delhi. IRL, incidentally, is part of the team working on Project Blue Gene.

Another problem is that databases created by different organizations store information idiosyncratically, creating different file formats that cannot talk to each other. To begin with, biological data is complex and interlinked. A spot on a DNA array, for instance, is connected not only to immediate information about its intensity, but to layers of information about genomic location, DNA sequence, structure, function, and much more. Creating information systems that allow biologists to seamlessly follow these links without getting lost in a sea of information is a challenge for computer scientists.

Parallel computing



Parallel computing is a concept that has been around for a long time. Break a problem down into computationally tractable components, and instead of solving them one at a time, employ multiple processors to solve the sub-problems simultaneously. The parallel approach has made its way into experimental molecular biology with technologies such as the DNA microarray. Microarrays allow researchers to conduct thousands of gene expression experiments simultaneously on a tiny chip.
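The split, solve and combine pattern is simple to sketch. The example below counts G and C bases in a DNA string by splitting it into chunks and handing the chunks to a pool of workers. The function names and the tiny sequence are illustrative assumptions. For simplicity it uses a thread pool from Python’s standard library; in CPython, threads do not run CPU-bound work on several processors at once, so a real workload would swap in concurrent.futures.ProcessPoolExecutor, which offers the same map interface.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(sequence, chunk_size):
    """Break the problem into independent, tractable pieces."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]

def count_gc(chunk):
    """Solve one sub-problem: count G and C bases in a chunk."""
    return sum(1 for base in chunk if base in "GC")

def parallel_gc_count(sequence, chunk_size=4, workers=4):
    """Hand the chunks to a pool of workers, then combine the partial counts."""
    chunks = split_into_chunks(sequence, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    return sum(partial_counts)

print(parallel_gc_count("ATGCGCATTAGC"))
```

The approach works here because each chunk can be counted without knowing anything about the others. Problems whose pieces depend on one another are much harder to parallelize.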

Much of what we currently think of as part of bioinformatics (sequence comparison, sequence database searching, sequence analysis) is more complicated than just designing and populating databases. Bioinformaticians are the tool-builders, and it’s critical that they understand biological problems as well as computational solutions in order to produce useful tools.

Developing analytical tools to discover knowledge in data is the second, and more scientific, aspect of bioinformatics. There are many levels at which we use biological information, whether it is in comparing sequences to develop a hypothesis about the function of a newly discovered gene, breaking down known 3D protein structures into bits to find patterns that help predict how the protein folds, or modeling how proteins and metabolites in a cell work together to make the cell function. The ultimate aim of analytical bioinformaticians is to develop predictive methods that allow scientists to model the function and phenotype of an organism based on genome sequence alone.
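To give a flavor of what "comparing sequences" involves, here is a minimal sketch of local alignment scoring in the style of the classic Smith-Waterman dynamic programming method. It is a teaching illustration, not how BLAST works internally (BLAST uses faster heuristics), and the scoring values of +2 for a match, -1 for a mismatch and -2 for a gap are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            # A local alignment may restart anywhere, so never drop below zero.
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best

print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The two sequences in the example share only the stretch ACG, so the best local score is three matches at 2 points each. A high score between a new gene and a well-studied one is the kind of evidence that suggests a hypothesis about the new gene’s function.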

Commenting on IBM’s role in bioinformatics, managing director Abraham Thomas says, "Online collections of biomedical abstracts, papers and other literature are used to produce annotated databases for easy access of information. For example, in a protein database, the annotation for a protein may include its properties, functions, structure, similarities with other proteins, diseases associated with deficiencies in the protein etc."

RoI shapes the final decision

Apart from database management and data-mining solutions and services, there are several other applications of IT within bioinformatics. "The challenge is to obtain a return on the enormous investment required to obtain the explosion of genomic data. This requires significant computational capabilities, consisting of high-performance platforms, sophisticated and validated algorithms, and the integration of these processes into the scientific work process," says Dan Stevens, director, marketing (life sciences), Silicon Graphics.

Bioinformatics tools can also be used to analyze genome sequences to detect genes and their functions, to analyze protein sequences to predict their structure (either secondary or tertiary), and to analyze clinical data to predict the toxicity of drugs and other molecules.

Teaming up is the best option

Given the specialized nature of bioinformatics, it makes sound business sense for IT companies to partner with pharma and research companies. "Information technology and its optimized use can qualitatively change the nature of this collaboration, with tools like electronic product development exchanges," says Oracle’s Grover.

"Build domain knowledge, partner with leading research institutes, develop intellectual property and understand customer challenges and deliver solutions which add value," says DA Prasanna, vice-chairman, Wipro (also executive officer at Wipro Healthcare and Life Sciences), outlining the success formula for a foray into bioinformatics. But even if IT companies follow this formula, can they really re-invent themselves as end-to-end bioinformatics solution providers?

"If any of the ‘complete solutions’ provided by such companies fails repeatedly, customers will start doubting the ability of communities in the information technology space to fulfill their commitments. To avoid this undesirable outcome, IT companies and professionals must learn to work within their areas of competence," says Stevens.

Ultimately, however, it is up to an IT company to determine how far it wants to go along the road to biotechnology. Clearly, there are rich pickings along this road, and the further it goes, the more money it will make. The stock market gives much higher premia to drug discovery companies (and that’s just the tip of the biotech berg) than to pure IT companies. The downside is that the further an IT company walks along that road, the further it moves away from its core competence. But then, if it rakes in the moolah in these cash-strapped times, why not? After all, proteomics, genomics and pharmacogenomics are all derived from the Latin root ‘-omics’, which means ‘give us money’!

Manjiri Kalghatgi in New Delhi

TechnoFunctionals

How can you build a biotech-savvy IT workforce?

As in the case of banking, insurance, and other verticals where IT plays a role, domain knowledge is supreme for professionals working in the field of bioinformatics. So what does it take to build a biotech-savvy IT workforce? IT companies need to retrain and reposition their IT and systems teams in life science-related projects. Getting someone who knows C++ to learn Java is one thing, but getting someone who lost touch with biology at age 15 to understand the complex functioning of the human body is another ballgame altogether. Getting someone who has no IT training to write software code is no mean task either. The latter could be a trifle easier! As Compaq India director (enterprise products) Pallab Talukdar says, "We are looking at biologists picking up IT, rather than the reverse." And this explains the evolution of terminology like Bioperl and Biojava.

"These are actually extensions to existing IT technology," says Compaq’s Talukdar, explaining that the syntax of Bioperl is actually closer to biology than to IT. "For instance, in specifying the name of an array, Bioperl addresses a variable as an amino acid, making it far easier for a biologist to use the language," he explains. While the evolution of such languages has certainly contributed towards bringing biologists a step closer to IT, the need for techno-functional professionals trained in both IT and biology is growing rapidly. But the quickest and best way of building a team for bioinformatics projects would be to have a mix of talent from both fields on board and re-train them to achieve the level of expertise required for the project.

"About 80% of the people we have hold a basic degree in electrical engineering or computer science. They are taught the basics of molecular biology, and then put through a rigorous training program on the various mathematical algorithms that are used in bioinformatics. This includes exposure to the latest techniques, software packages and databases. About 20% of our chosen staff have a basic degree in life sciences. Their knowledge of biology is brushed up. They are taught the basics of computer programming, as well as computational techniques used in bioinformatics, though not to the same level of rigor as the EE/CS staffers," says Tata Consultancy’s Vidyasagar. So what are the raw skills that IT professionals need to have to qualify for training in this area?

Dr Manoj Kumar, director, IBM India Research Labs, says competence in areas like e-biz, data and storage management, data mining, parallel and distributed computing, middleware and knowledge management would aid in the pursuit of bioinformatics. "Then there’s stuff like probability theory, statistics, design and analysis of algorithms, discrete mathematics, relational and spatial databases," he says. "IT companies need to train staff on analysis and interpretation of biological data through techniques of visualization, algorithm development and mathematical models," says Arena Multimedia CTO NJ Rajaram.

"A DBA just manages databases, but the role of the database researcher has certainly grown in importance," explains Compaq’s solutions architect Balasubramanian. "Apart from handling a variety of data, he has to deal with organizing, indexing and managing the storage of that data. This has more to do with biological sets and defining an indexing mechanism, which enables convenience in data retrieval. It also involves statistical analysis of sequences and imaging of data," elaborates Balasubramanian.

Aptech wing Arena has already launched specific courses on bioinformatics and bio-computing. These are primarily aimed at pharmaceutical and drug research companies, consulting companies and IT firms. "Training programs are called for in the areas of molecular biology, handling proteomic and genomic data, DNA structures and sequences, pharmacology and dealing with patent and bibliography data," says Rajaram. And apart from strong fundamental IT skills and knowledge of biotechnology concepts, the need for professionals to constantly update themselves has never been stronger. As SGI’s Dan Stevens says, in the post-genomic era, professionals will have to read up and attend meetings to learn about state-of-the-art solutions. "And apart from that, they will have to learn how to filter through those solutions that are only trying to take advantage of market share and the current market hype," he cautions.
