
computing@indianlanguages.com

DQI Bureau

Think about this. A population of 900 million, 18 official languages, but only one de facto business language. English. A language used by a meager 45 million, about 5% of the entire populace.


And now look at this. A PC penetration of one per thousand. A large portion of government computerization in English. State governments going gaga over getting a piece of the software export pie. And the remaining 855 million Indians left to grapple with the issue of basic literacy.

Well, who has the time to think about mundane things like multilingual versions of business software, or about bridging the gap between the 45 million who know English and the other 855 unglamorous million? This is the legacy the British have left us. Today, knowing English and computers is a deadly combination for any young graduate, not only to get a job but also to make a career. Which explains why this very magazine gives you articles on information technology in English.

Contrast this with any European or Asian country where the local language predominates over English. A Microsoft or an Oracle has no choice but to release its local-language versions immediately, if not simultaneously. In Korea, Microsoft was forced to take over the local word processing market leader Hangul Software, which had 85% market share, to establish a market for the Korean version of MS Word.


The point is, the English language continues to divide us, supposedly a nuclear power, even at the turn of this century. And the Microsofts and Oracles of the world are not to be blamed. The problem is inherent in us: we have not been able to market ourselves, supposedly one of the largest markets in the world, to the vendors. And this stems from our own inferiority complex about using our mother tongues. Knowing English is considered superior. This sharp social divide and polarity of social development presents the greatest difficulty in the development of multilingual computing (MLC).

MLC initiatives

Technology Development For Indian Languages (TDIL) Program

Activities and achievements during 1990-95

- Corpora development: machine-readable corpora of texts of nearly 30 lakh words in all Indian languages.

- Machine-aided translation system (Anusaaraka) for language pair.

- Human Machine Interface system for character recognition

- Natural Language Processing: Teachers Training Program.

- Computer-aided learning and teaching



Present activities



- Further development of Anusaaraka

- Lexicon, spell checker, tools development

- Web-based applications

- Text-to-speech synthesis

Other activities



- Department of Official Language (DOL)-funded project on bilingual transliteration, handled by CDAC.

- IIT Chennai has developed a multilingual user interface with a programming environment.

- NCST is working with Microsoft on the Hindi version of NT.

The net effect: a throttled and stillborn multilingual computing industry in the country. It is a clear case of a chicken-and-egg situation. It is not that efforts in developing multilingual computing have been lacking. Several initiatives have been taken up in this direction by both the government and the private sector during the last decade. But the industry still languishes at a paltry Rs12 crore in annual sales.


The making of an industry



Publishing in the seventies and early eighties was done using the letterpress method. Then emerged dedicated typesetting machines called phototypesetters. By their lineage these machines were essentially English-driven, but they offered great benefits in speed and accuracy. The vernacular press, however, could not make a smooth shift to phototypesetters because the machines did not support Indian languages. Vernacular publishers had to go abroad and get an Indian font set developed in order to use them. The economics of the whole effort was distorted.

In late 1981, three engineers from the Tata Institute of Fundamental Research (TIFR), Dr MN Cooper, VV Joshi, and Meena Joshi, decided to take up the problem of developing Indian phototypesetters. Thus Modular Systems came into being, and it built its first machine for installation at Dinamalar, the leading Tamil daily. The prospects for Modular looked bright: it had pioneered the technology and had guaranteed demand from the vernacular publishing industry.

But soon enough, during the mid-eighties, IBM-compatible personal computers arrived, word processing was recognized as an application, and Apple defined the desktop publishing industry. Laser printers appeared soon after. The combination of a PC, DTP software, and a laser printer came to be looked upon as a killer publishing solution. Silicon had spelt death for phototypesetters.


Modular turned its attention to adapting computer-based publishing solutions to Indian languages. The starting point was the development of fonts for all Indian languages and the building of applications. Other vendors like Softek, Sonata, and the Institute of Typographical Research (ITR) jumped into the fray. The vernacular publishing industry was the focus. Around the same time, Blue Star made a serious attempt to integrate the PC, publishing software, laser printer, and multilingual software. The entire bundle was sold as a solution under the brand name Linguist. The MLC software used was Modular's Laserset. Later PCL also came out with a similar offering called Scriptmaster, also with Laserset.



Recommendations of the SAARC Multilingual Conference




In global business, multilingual software development, considered unglamorous, has not received the attention it deserves for propagating multimedia and multilingual information technology (M/MIT). The member states may therefore endeavor to offer greater incentives to encourage the development of M/MIT and to promote its larger-scale usage in their respective countries. The industry may also set up Special Interest Groups to focus its attention on this area.

The SAARC forum provides for cooperation in many fields among the SAARC nations. The Colombo declaration issued at the tenth SAARC Summit encourages exchange of information technology. This avenue should be used for greater cooperation among SAARC nations in the development of M/MIT, in imparting education and training, and in the free flow of technologies and products. As an enabling mechanism, focal points for information technology may be established in each country with special focus on M/MIT. Suitable networking arrangements may be initiated in this regard.

In order to encourage the wide spread of M/MIT, member states may encourage the introduction of specific course modules in this area in all degree colleges and other academic institutions. For this, the distance education mode of delivering quality education may be adopted.

The following are the major areas for development where focus needs to be increased:



- Human Machine Interface systems, including speech recognition/synthesis technologies and optical character recognition.

- Machine-Aided Translation

- Research in ancient languages

- There is a need to create information content using M/MIT for effective documentation and dissemination of the culture and heritage of the region. For this, mass media, the Internet, and CD-ROMs are an ideal choice and could be used increasingly.



- In order to spread the usage of M/MIT, the localization of important applications should be urgently carried out. For this, the services of the software industry from the region should be utilized. In order to spread information technology education among the masses, including the schools, the relevant course material should be translated into the regional languages using M/MIT.

- Apart from developing new products and technologies, customer focus needs to be brought in to serve users' interests.



- Standardization of various language scripts, unification of coding sets of similar languages, and standardization of terminology on computers in various languages and multimedia components are important prerequisites for the faster spread of multilingual technologies and their cross-migration among various applications and countries. A SAARC Task Force may be set up for proper coordination of this activity.



- To enable deeper penetration of M/MIT, recourse should be made to the Internet for putting results on developments in this area and information on cooperative efforts in the public domain for the benefit of all concerned. This may promote the evolution of common networking languages.

- With the interest and needs developing in this area among SAARC countries, it was felt that such conferences should be held every year by rotation among SAARC nations, and that they may also review progress on recommendations.

- An annual SAARC M/MIT award may be instituted in the region for recognizing commendable performance in this area.










Around this time, in 1987, Prof RMK Sinha and his team at IIT Kanpur were working on a multilingual computing project called GIST (Graphics Intelligence-based Script Technology). The project was transferred to CDAC, a DoE institution based in Pune. Headed by Mohan Tambe, CDAC took responsibility for the development of MLC technology. The GIST technology was transferred to about 10 companies, including HCL, Wipro, and DCM. CDAC's standardization efforts rest on three important components: the ISCII standard (like ASCII), the Inscript keyboard and phonetic typing, and the ISFOC font standard. Apart from developing these standards, CDAC also brought out application products such as word processors, DTP software, video subtitling, character generators, and other video applications, and it developed a standard for multilingual paging. CDAC's mission was development and then commercialization. According to SS Pujari, GIST Group Coordinator, "We have between 50% to 100% market share for various multilingual computing products in the country." Adds CB Raje, Business Group Coordinator for GIST, "We also spread MLC technology by organizing training programs throughout the country."

In the private sector, many new players started emerging, like Summit Data Products, SRG Systems, Aces Software, and Sonata. There have been two principal markets for MLC: private sector publishing and government department applications. Companies like Modular and Sonata ruled the former, while CDAC prevailed in the latter. Another company, Cirrus Electronics, whose strength was marketing and distribution, also emerged. Cirrus now has a network of 200 dealers in the country and is the marketing arm for Modular.

CDAC too was a commercial success. However, there is a feeling among private sector vendors that CDAC's involvement in commercialization is unwarranted. Says Dr MN Cooper, MD, Modular, "CDAC was instituted to be a body developing technology and spreading it through partnerships. But instead I find it to be functioning more like a marketing company."


MLC Market in India: Issues



In terms of value, the MLC products market is just around Rs12 crore annually. Though CDAC's work has been pioneering and effective in spreading MLC technology in the country, the contribution of the private sector is generally not appreciated. These vendors have a vibrancy of their own despite the issues they face.

Developing need-of-the-hour products with meager resources and marketing them has been the topmost challenge. Cirrus has four MLC application products on the anvil. Modular has important applications in the text-to-speech and speech-to-text areas. Phonetic typing and user-friendliness are on the rise. According to Ajay Agarwal, MD, Seacom Solutions, "Our product Sulipi provides a truly phonetic solution using the ISCII standard. Sulipi can also be used to do programming in local script since it can sit on any programming platform."



Effective Encoding Of Indian Scripts



The need of the hour is to evolve a common protocol for font-enabling of Indian scripts. Till now, most of the font-enabling work done in the country has been of an individualistic nature, devoid of any common standard. This has not only led to repetition of work but has also resulted in non-portability problems. Some in the industry believe that this could lead to the uprooting of Indian language users in the future.

For a common protocol to emerge, it will be necessary to adopt a font encoding technique that does not require any software interpretation and that supports the features available in off-the-shelf software. Universality of application, and the ability to scale up and easily adopt future technological advancements, should form the core of the common protocol.

Font encoding is nothing but accommodating all the possible glyphs of a language so as to cater to every possible requirement. Currently, no font-enabled software available in the country meets this criterion to the fullest. As a result, most of the fonts available now can only be used with document processing software like MS Word, and they lack universality of application. The main reason is that most of the software available is not based on the coding system used in the OS, but addresses the issue at a superficial font level.
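To see why font-level solutions travel so poorly, consider a minimal sketch in Python. The two glyph tables below are invented for illustration and do not describe any real vendor's font. The point is only that the same stored bytes show different letters under different fonts, whereas a character encoding (UTF-8 here, as one example) keeps the identity of each letter in the text itself.

```python
# Hypothetical glyph tables for two vendors' fonts (invented for this sketch).
# Each vendor reuses the same byte values for different Devanagari letters.
VENDOR_A_FONT = {0x41: "क", 0x42: "ख", 0x43: "ग"}
VENDOR_B_FONT = {0x41: "ग", 0x42: "क", 0x43: "ख"}


def render(stored_bytes: bytes, font: dict) -> str:
    """Return what a reader sees when the stored bytes are drawn with a font."""
    return "".join(font.get(b, "?") for b in stored_bytes)


# A document typed with vendor A's font stores ordinary Latin-range bytes.
document = bytes([0x41, 0x43])

seen_with_a = render(document, VENDOR_A_FONT)  # the text the author intended
seen_with_b = render(document, VENDOR_B_FONT)  # same bytes, different letters

# With a character encoding, the bytes identify the letters themselves,
# so the text survives a round trip regardless of which font draws it.
encoded = "कग".encode("utf-8")
decoded = encoded.decode("utf-8")

print(seen_with_a, seen_with_b, decoded)
```

The document bytes mean nothing without the font that produced them, which is the non-portability problem in miniature.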

Right now, there are two accepted encoding standards for Indian languages, namely ISCII, devised and approved by DoE, and Unicode, an international standard currently in vogue. But for the present, only a few of the many software packages available in the country comply with either of these standards. Further, there exist different standards for various electronic technologies. For instance, in pagers, the Indian standard is ISCLAP, which is already being used by some paging service providers. But as newer convergent technologies like Web TV and set-top boxes enter the Indian market, there will be a dire need to develop a commonly accepted standard to accommodate them under one umbrella.
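As a rough illustration of what a shared character code buys, the Python sketch below lists the Unicode code points of a Devanagari word and then moves a letter between scripts by a fixed offset. It relies on the fact that Unicode's Indian script blocks follow a parallel layout inherited from ISCII. The helper shift_script and the base constants are names assumed for this example, and the correspondence is only approximate because not every slot is assigned in every script.

```python
import unicodedata

# Every letter has one agreed number, independent of any font.
word = "भारत"  # "Bharat" written in Devanagari
code_points = [f"U+{ord(ch):04X}" for ch in word]
for ch, cp in zip(word, code_points):
    print(cp, unicodedata.name(ch))

# Start of each 128-slot script block in Unicode.
DEVANAGARI_BASE = 0x0900
BENGALI_BASE = 0x0980
TAMIL_BASE = 0x0B80


def shift_script(ch: str, src_base: int, dst_base: int) -> str:
    """Map a character to the same slot in another script's block.

    Illustrative helper only: some slots are unassigned in some scripts,
    so real transliteration needs more than an offset.
    """
    offset = ord(ch) - src_base
    if not 0 <= offset < 0x80:
        raise ValueError("character is outside the source block")
    return chr(dst_base + offset)


ka_devanagari = "\u0915"  # DEVANAGARI LETTER KA
ka_bengali = shift_script(ka_devanagari, DEVANAGARI_BASE, BENGALI_BASE)
ka_tamil = shift_script(ka_devanagari, DEVANAGARI_BASE, TAMIL_BASE)

print(unicodedata.name(ka_bengali))
print(unicodedata.name(ka_tamil))
```

Software that stores text this way can change fonts, platforms, or even scripts without guessing what a byte was meant to be.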

The technologists of the country will have to decide on common standards for font encoding. There is also another option, should they wish to choose it: develop fonts based on globally proven technologies, for instance the TrueType technology developed by Apple and adopted by Microsoft. At the least, Microsoft's supremacy on the PC platform may solve the serious issue of portability of various Indian font-enabled software.

Lack of funding for product development is a deterrent. According to Cooper, "We started the company with our provident fund money and later ploughing back our thin profits. We do not have a DoE to fund us in crores for every project." While Modular develops its own products, CDAC's product development partner is Men At Work, a software company based in Pune. Men At Work is credited with the development of the successful LEAP family of products.

For the potential that exists for MLC software, the distribution mechanism is highly inadequate. Since distribution networks are costly to implement and run, specialist organizations like Cirrus Electronics have emerged. According to Sanjiv Mehta, MD, Cirrus, "We work closely with Modular and manage distribution. We used to distribute CDAC products also till such time there was no overlap between them." Cirrus is also planning to venture into MLC training.

The efforts by the industry somehow seem to be fragmented, with no clear vision or guidance. And this lack of focus has kept the spread of MLC technology from becoming a mass movement. What is more, this is probably the main cause of the low penetration of computers and computerization in the country. The National IT Task Force recommendations include the spread of MLC in the country. But what do we find? The issue of software exports takes the limelight in the recommendations, and MLC takes the backseat.

With the spread of computers into newer segments, MLC technology will have to bear the responsibility of promoting usage in these segments. Through the work of government and private enterprises, the importance and possibilities of MLC technology have been realized. Potential technologies like morphological analyzers and spell checkers, machine-aided translation systems, man-machine interfaces, and Internet tools are some of the thrust areas for applications in the future.

EASWARDAS SATYAN, DHAVAL VALIA and ANUPA VASUDEVAN in Mumbai, with YOGRAJ VARMA in New Delhi.

