
Decoding proteins

DQI Bureau

In a few years, biologists will complete the momentous task of reading the entire human genome, the sequence of more than three billion symbols (chemical bases) that determine our biological natures. "That is when the real work will begin," says Isidore Rigoutsos, Manager, Computational Biology Center, IBM.




The genetic code is written in an alphabet of four symbols. It is a program that directs the construction of proteins, the truly important molecules of life. And that has significant implications for the pharmaceutical industry. "When new drugs are developed, what they are targeting, with few exceptions, are proteins," says Barry Robson, IBM Distinguished Engineer and strategic adviser to the Computational Biology Center.



The critical fact about proteins is their shape. Their nooks and crannies fit into one another like keys into locks, controlling the whole range of cellular processes. Every protein consists of some combination of 20 different kinds of amino acids. But identifying the purpose of each protein sequence is a formidable task.



To gain this knowledge, researchers take advantage of the fact that evolution is parsimonious, using the same structures over and over. By looking for proteins whose amino-acid sequence is homologous to that of a protein whose structure is already known, scientists can make educated guesses about the unknown structure. One of the latest and most promising techniques for finding patterns came about as a result of an accident.



"I fell off my bicycle and broke my back," says Rigoutsos. "Because I was in bed for three months, I had a lot of time to read." His review of the literature revealed that people had been trying to solve the problem of finding recurrent patterns in the structures of proteins or DNA by attempting to align sequences with one another. If several sequences matched around a location, scientists would take this as evidence for a pattern. Rigoutsos wondered if he could turn the process around and find patterns directly and then use the patterns to align sequences. He and Aris Floratos, another member of IBM's bioinformatics and pattern discovery group, devised a powerful algorithm they dubbed Teiresias, after the blind seer of Greek mythology.
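
The idea of finding patterns first and aligning afterwards can be illustrated with a toy sketch in Python. This is not the Teiresias algorithm, whose details the article does not give. It is a deliberately naive stand-in that looks only for fixed-length substrings shared by several sequences. The function names, the parameters `k` and `min_support`, and the three short sequences are all hypothetical, chosen for illustration.

```python
from collections import defaultdict


def shared_kmers(sequences, k, min_support):
    """Find length-k substrings that occur in at least min_support sequences.

    Naive illustration only; NOT the Teiresias algorithm.
    Returns a dict mapping each qualifying substring to the sorted
    indices of the sequences that contain it.
    """
    seen = defaultdict(set)
    for idx, seq in enumerate(sequences):
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            seen[seq[i:i + k]].add(idx)
    return {
        kmer: sorted(ids)
        for kmer, ids in seen.items()
        if len(ids) >= min_support
    }


def anchor_offsets(sequences, pattern):
    """Position of the first occurrence of pattern in each sequence (-1 if absent).

    Once a pattern is known, these offsets say how far each sequence
    must be shifted so that the pattern lines up across all of them.
    """
    return [seq.find(pattern) for seq in sequences]


# Made-up amino-acid strings, not real proteins.
toy_sequences = ["MKTAYIAKQR", "GTAYIAWWPL", "PLDTAYIAKK"]

patterns = shared_kmers(toy_sequences, k=5, min_support=3)
for pattern in patterns:
    print(pattern, anchor_offsets(toy_sequences, pattern))
```

In this sketch the pattern is discovered without aligning anything; the alignment falls out afterwards from the offsets at which the pattern occurs. A realistic method would also need to handle patterns of varying length and patterns with wildcard positions, which this version does not attempt.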



Teiresias finds patterns while making very few assumptions about what it is looking for. It has found uses outside biology in such areas as identifying attacks on computer systems and analyzing literary style. Using Teiresias, Rigoutsos and Floratos have compiled a 'Bio-Dictionary' that may contain the key to understanding the language of the genes.

"Take a copy of the Wall Street Journal and remove all the spaces," Rigoutsos suggests to illustrate how the Bio-Dictionary was assembled. "You know the paragraph, you know the symbols. The task is to find the words, but you do not know the symbols. We have done the same thing for proteins." The 'words' they have discovered constitute the basic vocabulary of proteins. Like human words, they link together according to rules to form sentences, that is, proteins. The IBM researchers have begun to decipher the words in their Bio-Dictionary, to interpret what structural and functional features they represent. "The analogy to natural language appears to be deep," Rigoutsos explains.




One of the biggest riddles Deep Computing could answer is how a strand of amino acids folds into a protein. "Nobody has yet simulated that process," Robson says. "It's a deep, fundamental problem. Until it's solved, you can't design interesting new proteins from scratch. More important still, if we can crack this problem from first principles, we can design new polymers and materials, and ultimately create molecular-scale devices."



In the end, the researchers might be able to refine their algorithms enough to predict the folding of any protein structure, not just natural ones. This holds the promise not only of engineering new drugs; it could also allow the creation of unique, self-assembling molecular structures that could realize the dream of building molecular-scale machines.




The original idea of nanotechnologists was to build nanoscale robots called 'assemblers' that would construct molecular machinery. "Well, nature doesn't work by making these robots," Robson points out. Instead, it specifies the linear sequence of amino acids in a protein and lets the laws of physics do the building for free. "The problem is how to do that on a general basis," Califano says. "If I want to build a hammer, how do I make my protein fold into a hammer? You can't solve that problem unless you understand how proteins fold." He and his colleagues are betting that the hammer of Deep Computing will enable scientists to do just that.



In fact, Deep Computing offers not just a hammer but an entire toolbox of techniques, technologies and philosophies. By joining the raw computing power and algorithmic virtuosity that were once the province of high-end scientific computing with the vast oceans of data typical of business computing, IBM scientists are forging a new discipline capable of solving real-world problems in all their complexity and depth.

Bruce Schechter



Excerpted from: Think Research, 1999


Courtesy: IBM
