computing@indianlanguages.com

Think about this. A
population of 900 million, 18 official languages, but only one de facto business language.
English. A language used by a meager 45 million, about 5% of the entire populace.

And now look at this. A PC
penetration of one per thousand. A large portion of Government computerization in English.
State governments going gaga over getting a piece of the software export pie. And the
remaining 855 million Indians relegated to grappling with the issue of basic literacy.

Well, who has the time to think about
mundane things like having multilingual versions of business software and bridging the gap
between the 45 million who know English and the other, unglamorous 855 million? This
is the legacy that the British have left us. Today, knowing English and computers is a
deadly combination for any young graduate, not only to get a job but also to make a career.
Which explains why this very magazine gives you articles on information technology in
English.

Contrast this with any European or Asian
country where local language usage has a predominance over English. A Microsoft or an
Oracle has no choice but to have immediate if not simultaneous release of its local
language versions. In Korea, Microsoft was forced to take over the local word processing
market leader Hangul Software, which had an 85% market share, to establish a market for
the Korean version of MS Word.

The point is, the English language
continues to divide us, supposedly a nuclear power, even at the turn of this century. And
the Microsofts and Oracles of the world are not to be blamed. It is an inherent problem
with us: we are not able to market ourselves, supposedly one of the largest markets in
the world, to the vendors. And this stems from our own inferiority complex about using
our mother tongues. Knowing English is considered superior. This sharp social divide and
polarity of social development presents the greatest difficulty in the development of
multilingual computing (MLC).

MLC initiatives

Technology Development For Indian Languages (TDIL) Program

Activities and achievements during 1990-95
Corpora development: machine-readable corpora of texts, nearly
30 lakh words, in all Indian languages.
Machine-aided translation system (Anusaaraka) for language pair.
Human Machine Interface system for character recognition.
Natural Language Processing: Teachers Training Program.
Computer-aided learning and teaching

Present activities
Further development of Anusaaraka
Lexicon, spell checker, tools development
Web-based applications
Text-to-speech synthesis

Other activities
Department of Official Language (DOL)-funded project on bilingual
transliteration, handled by CDAC.
IIT Chennai has developed a multilingual user interface with
programming environment.
NCST working with Microsoft on the Hindi version of NT.

The net effect: a throttled and stillborn
multilingual computing industry in the country. It is a clear chicken-and-egg
situation. It is not that efforts to develop multilingual computing have been lacking.
Several initiatives have been taken up in this direction by both the government and the
private sector during the last decade. But the industry still languishes at the paltry
figure of Rs12 crore in annual sales.

The making of an industry
Publishing in the seventies and early eighties was done using the letter-press
method. Then emerged dedicated typesetting machines, called phototypesetters. These
machines by their lineage were essentially English-driven but presented great benefits in
terms of speed and accuracy. The vernacular press, however, could not make a smooth
shift to phototypesetters because the machines did not support Indian languages. Vernacular
publishers had to go abroad and get an Indian font set developed to use them. The economics
of the whole effort were distorted.

In late 1981, three engineers from the Tata
Institute of Fundamental Research (TIFR), Dr MN Cooper, VV Joshi, and Meena Joshi, decided
to take up the problem of developing Indian phototypesetters. Thus Modular Systems came
into being, and it built its first machine for installation at Dinamalar, the
leading Tamil daily. The prospects for Modular looked bright: it had pioneered the
technology and had guaranteed demand from the vernacular publishing industry.

But soon enough, during the mid-eighties,
IBM-compatible personal computers came into being, word processing was recognized as an
application, and Apple defined the desktop publishing industry. Laser printers
appeared shortly after. The combination of a PC, DTP software, and a laser printer came to be looked
upon as a killer publishing solution. Silicon had spelt death for phototypesetters.

Modular turned its attention to adapting
computer-based publishing solutions to Indian languages. The starting point was the
development of fonts for all Indian languages and the building of applications. Other vendors
like Softek, Sonata and the Institute of Typographical Research (ITR) jumped into the fray.
The vernacular publishing industry was the focus. Around the same time, Blue Star made a
serious attempt to integrate the PC, publishing software, laser printer, and multilingual
software. The entire bundle was sold as a solution under the brand name Linguist. The
MLC software used was Modular’s Laserset. Later PCL came out with a similar offering
called Scriptmaster, also with Laserset.

Recommendations of the SAARC Multilingual Conference

In global business,
multilingual software development, considered unglamorous, has not received the attention
it deserves as a means of propagating multimedia and multilingual information technology (M/MIT). The
member states may therefore endeavor to offer greater incentives to encourage the
development of M/MIT, and to promote its larger scale usage in the respective countries.
The industry may also set up Special Interest Groups to focus its attention in this area.

The SAARC forum provides for cooperation in
many fields among the SAARC nations. The Colombo declaration issued at the tenth SAARC
Summit encourages exchange of information technology. This avenue should be used for
greater co-operation among SAARC nations in developing M/MIT, imparting education
and training, and the free flow of technologies and products. As an enabling mechanism for
this, focal points for information technology may be established in each country with
special focus on M/MIT. Suitable networking arrangements may be initiated in this regard.

In order to encourage the wide spread of M/MIT,
member states may encourage the introduction of specific course modules in this area in all
degree colleges and other academic institutions. For this, the distance education mode of
delivering quality education may be adopted.

The following are the major areas for
development where focus needs to be increased:
Human Machine Interface systems including speech
recognition/synthesis technologies and optical character recognition.
Machine-Aided Translation
Research in ancient languages
There is a need to create information content using
M/MIT for effective documentation and dissemination of the culture and heritage of the region.
For this, mass media, the Internet and CD-ROMs are ideal choices, and could thus be used
increasingly.
In order to spread the usage of M/MIT, the localization of
important applications should be urgently carried out. For this, the services of the
software industry from the region should be utilized. In order to spread information
technology education amongst the masses, including in schools, the
relevant course material should be translated into the regional languages using M/MIT.
Apart from developing new products and technologies, customer
focus needs to be brought in for serving users’ interests.
Standardization of various language scripts, unification of
coding sets of similar languages, and standardization of terminology on computers in various
languages and multimedia components are important prerequisites for the faster spread of
multilingual technologies and their cross-migration among various applications and
countries. A SAARC Task Force may be set up for proper coordination of this activity.

To enable deeper penetration of M/MIT, recourse should be made to the
Internet for putting results on developments in this area and information on cooperative
efforts in the public domain for the benefit of all concerned. This may promote evolution
of common networking languages.
With the interest and needs developing in this area among SAARC
countries, it was felt that such conferences should be held every year by rotation, among
SAARC nations, which may also review progress on recommendations.
An annual SAARC M/MIT award may be instituted in the region for
recognizing commendable performance in this area.

Around this time, in 1987, Prof RMK Sinha
and his team at IIT Kanpur were working on a multilingual computing project called GIST
(Graphics and Intelligence based Script Technology). The project was transferred to CDAC, a
DoE institution based in Pune. Headed by Mohan Tambe, CDAC took the responsibility for the
development of MLC technology. The GIST technology was transferred to about 10 companies,
including HCL, Wipro and DCM. CDAC’s efforts towards standardization rest on three
important components: the ISCII standard (like ASCII), the Inscript keyboard and phonetic
typing, and the ISFOC font standard. Apart from developing these standards, CDAC
also brought out application products like word processors, DTP software, video
subtitling, a character generator, and other video applications, and it developed a
standard for multilingual paging. CDAC’s mission was development and then
commercialization. According to SS Pujari, GIST Group Coordinator, “We have between
50% to 100% market share for various multilingual computing products in the country”.
Adds CB Raje, Business Group Coordinator for GIST, “We also spread MLC technology by
organizing training programs throughout the country.”

In the private sector, many new players
started emerging, like Summit Data Products, SRG Systems, Aces Software and Sonata. There
have been two principal markets for MLC: private sector publishing and
government department applications. Companies like Modular and Sonata ruled the former
while CDAC prevailed in the latter. Another company, Cirrus Electronics, whose strength was
marketing and distribution, also emerged. Cirrus now has a network of 200 dealers in the
country and is the marketing arm for Modular.

CDAC too was a commercial success. However,
there is a feeling amongst private sector vendors that CDAC’s involvement in
commercialization is unwarranted. Says Dr MN Cooper, MD, Modular, “CDAC was
instituted to be a body developing technology and spreading it through partnerships. But
instead I find it to be functioning more like a marketing company.”

MLC Market in India: Issues
In terms of value, the MLC products market is just around Rs12 crore annually.
Though CDAC’s work has been pioneering and effective in spreading MLC technology in the
country, the contribution of the private sector is generally not appreciated. These
vendors have a vibrancy of their own despite the issues faced by them.

Developing the need-of-the-hour products
with meager resources and marketing them has been the topmost challenge. Cirrus has four
MLC application products on the anvil. Modular has important applications in the
text-to-speech and speech-to-text areas. Phonetic typing and user friendliness are on the
rise. According to Ajay Agarwal, MD, Seacom Solutions, “Our product Sulipi provides a
truly phonetic solution using the ISCII standard. Sulipi can also be used to do
programming in local script since it can sit on any programming platform.”

Effective Encoding Of Indian Scripts
The need of the hour is to
evolve a common protocol for font-enabling of Indian scripts. Till now, most of the
font-enabling work done in the country has been of an individualistic nature, devoid of
any common standard. This has not only led to repetition of work, but has also resulted in
portability problems. Some in the industry believe that this could lead to the uprooting
of Indian language users in the future.

For such a protocol to emerge, it will be necessary to adopt a font encoding
technique that does not require any software interpretation and that supports the features
available in off-the-shelf software. Universality of application, and the ability
to scale up and easily adopt future technological advancements, should form the core of the
common protocol.

Font encoding is nothing but accommodating
all the possible glyphs of a language so as to cater to every possible requirement. Currently,
no font-enabled software available in the country meets this criterion to the
fullest. As a result, most of the fonts available now can only be used with document
processing software like MS Word, and lack universality of application. The main reason
behind this is that most of the software available is not based on the coding system
used in the OS, but addresses the issue at a superficial font level.

Right now, there are two accepted
encoding standards for Indian languages, namely ISCII, devised and approved by DoE, and
Unicode, an international standard currently in vogue. But for the present, only a few of the
many software products available in the country comply with either standard. Further, there
exist different standards for various electronic technologies. For instance, in pagers,
the Indian standard is ISCLAP, which is already being used by some of the paging service
providers. But as newer convergent technologies like Web TV and set-top boxes enter
the Indian market, there will be a dire need to develop a commonly accepted standard to
accommodate them under one umbrella.
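
The gap between a font-level scheme and a character-code standard can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the glyph-to-key table below is hypothetical and does not describe any vendor's actual font layout. The word is first stored as Unicode character codes, then re-expressed the way a font-level scheme might store it, with each glyph parked on a Latin key position.

```python
import unicodedata

# "Bharat" in Devanagari, stored as Unicode character codes.
UNICODE_WORD = "\u092d\u093e\u0930\u0924"

# Hypothetical font-level scheme (an assumption for illustration only):
# each glyph sits on a Latin key position, so the stored text is really
# Latin letters that only look like Devanagari in one particular font.
HYPOTHETICAL_FONT_MAP = {
    "\u092d": "H",
    "\u093e": "k",
    "\u0930": "j",
    "\u0924": "r",
}


def describe(text):
    """Return the Unicode character name of every character in text."""
    return [unicodedata.name(ch) for ch in text]


def to_font_level(text):
    """Re-express text in the hypothetical font-level scheme."""
    return "".join(HYPOTHETICAL_FONT_MAP[ch] for ch in text)


font_level_word = to_font_level(UNICODE_WORD)

# With character codes, any software can tell these are Devanagari letters.
print(describe(UNICODE_WORD))
# With the font-level scheme, the same word is indistinguishable from Latin text.
print(describe(font_level_word))
```

Once the special font is missing, the font-level version is just a string of Latin letters, so searching, sorting, and exchange between applications all break down. Text stored as standard character codes keeps its identity whatever font or application displays it.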

The technologists of the country will have
to decide on common standards for font encoding. There is also another option, should they
wish to choose it: develop fonts based on globally proven technologies, for instance the
TrueType technology developed by Apple and adopted by Microsoft. At the least, Microsoft’s supremacy
on the PC platform may solve the serious issue of portability of various Indian
font-enabled software.

Lack of funding for product development is
a deterrent. According to Cooper, "We started the company with our provident fund
money and later ploughing back our thin profits. We do not have a DoE to fund us in crores
for every project." While Modular develops its own products, CDAC’s product
development partner is Men At Work, a software company based in Pune. Men At Work is
credited with the development of the successful LEAP family of products.

For the potential that exists for MLC
software, the distribution mechanism is highly inadequate. Because distribution networks
are costly to implement and run, specialist organizations like Cirrus Electronics have
emerged. According to Sanjiv Mehta, MD, Cirrus, "We work closely with Modular and
manage distribution. We used to distribute CDAC products also till such time there was no
overlap between them." Cirrus is planning to venture into MLC training also.

The efforts by the industry somehow seem
to be fragmented, with no clear vision or guidance. And this lack of focus has
prevented a mass movement for spreading the usage of MLC technology. What is more,
this is probably the main cause of the low penetration of computers and computerization
in the country. The National IT Task Force recommendations include the spread of MLC
in the country. But what do we find? The issue of software exports takes the
limelight in the recommendations and MLC takes the backseat.

With the spread of computers in newer
segments, MLC technology will have to bear the responsibility of promoting usage in these
segments. Through the work of government and private enterprises, the importance and
possibilities of MLC technology have been realized. Promising technologies
like morphological analyzers and spell checkers, machine-aided translation systems,
man-machine interfaces, and Internet tools, amongst others, are some of the thrust areas for
applications in the future.

EASWARDAS SATYAN,
DHAVAL VALIA
and ANUPA VASUDEVAN in Mumbai,
with YOGRAJ VARMA in New Delhi.
