
Web In Your Language

DQI Bureau

Multilingualism on the web is about creating and maintaining versions of a document in multiple languages on the web. There could be a master document with translations, or several language originals to be aligned or compared to get the result. When a document is available in a variety of languages and the translations are more or less aligned, it is possible to create links automatically between the different versions, or multi-headed links that target all of the versions at the same time.


The main hindrances to achieving multilingualism on the web are the poor standards and protocols for multilingual text representation and rendering. Of late, newer standards and protocols have been proposed and implemented for web-based multilingualism. HTTP version 1.1 has several header fields that aid language negotiation, such as Accept-Language and Content-Language, and thereby help achieve multilingualism on the web. HTML has also been modified to include attributes (lang and dir) that allow one to specify the language and text direction of a particular section of the text. Technologies like OpenType fonts from Microsoft, TrueDoc from Bitstream, and the Web Embedding Fonts Tool (WEFT) have also come into being, which help in negotiation and dynamic rendering of web documents on the internet.
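
The language negotiation described above can be illustrated with a minimal sketch, assuming a server that receives an Accept-Language header and holds a list of available translations. The function names `parse_accept_language` and `choose_language`, and the fallback default of "en", are hypothetical and chosen only for illustration; a production server would follow the full matching rules of the HTTP specification.

```python
def parse_accept_language(header):
    """Return (language_tag, quality) pairs, highest quality first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal quality keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick the best available language for a request (simplified matching)."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # fall back from "hi-in" to "hi"
        if primary in by_lower:
            return by_lower[primary]
    return default


if __name__ == "__main__":
    print(choose_language("hi-IN, ta;q=0.8, en;q=0.5", ["en", "ta", "hi"]))
```

The server would then report the language it actually served in the Content-Language response header, and the page itself can mark language and direction with the HTML lang and dir attributes.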

The increasing dominance of Unicode over the traditional ASCII code is also playing an important role in this revolution. The Unicode standard uses a 16-bit code, which can accommodate up to 65,536 characters, as against the 7-bit ASCII code with just 128 characters. This makes it possible to include the characters of many languages in a single code. However, Unicode has some limitations when it comes to internet communications. For instance, while Unicode permits the viewing of Kanji script, it does not identify an email address in Arabic. And for applications other than documents, Unicode requires a specific software interface, which again adds to the complexity of using the code on various platforms.
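
The arithmetic behind these figures is simply the number of values a fixed-width code can hold. The sketch below works it out and checks whether a string fits in 7-bit ASCII; the helper names `code_space` and `fits_in_ascii` are made up for this illustration.

```python
def code_space(bits):
    """Number of distinct values a fixed-width code of `bits` bits can represent."""
    return 2 ** bits


def fits_in_ascii(text):
    """True if every character of `text` has a code point inside the 7-bit range."""
    return all(ord(ch) < code_space(7) for ch in text)


if __name__ == "__main__":
    print(code_space(7))          # size of the 7-bit ASCII code space
    print(code_space(16))         # size of a 16-bit code space
    print(fits_in_ascii("Web"))   # plain Latin letters
    print(fits_in_ascii("वेब"))    # the same word in Devanagari
```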

Unicode has definitely contributed greatly to expanding the scope of multilingualism on the web, though at the expense of communications bandwidth. The overheads associated with Unicode are much higher than those of ASCII. A similar problem exists with the different font enablement technologies. For instance, an embedded multilingual word document of size 40K using MS OpenType fonts and WEFT carries 39K of overhead. Moreover, the currently available multilingual font embedding technologies have inherent disadvantages like security risks, disk space usage and IPR-related issues. Then there is always the issue of non-portability or limited portability of the currently available technologies. For example, MS's OpenType fonts work only on the Windows platform and are supported only by IE. On the other hand, TrueDoc fonts can be ported to Windows, Unix and Mac but lack wide browser support; only Netscape supports them.
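
The bandwidth cost of the encoding itself can be measured by encoding the same text in different ways and counting bytes. This is a rough sketch only: the helper `encoded_sizes` is hypothetical, and UTF-8 is included for comparison even though the article does not discuss it.

```python
def encoded_sizes(text, encodings=("ascii", "utf-8", "utf-16-be")):
    """Map each encoding name to the byte length of `text`, or None if it cannot encode it."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None  # e.g. ASCII cannot represent Devanagari
    return sizes


if __name__ == "__main__":
    print(encoded_sizes("Web"))   # Latin text: the 16-bit form doubles the byte count
    print(encoded_sizes("वेब"))    # Devanagari text: not representable in ASCII at all
```

For plain English text, a 16-bit encoding uses twice as many bytes as ASCII, which is the kind of overhead the paragraph above refers to.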

In such a scenario, Java, many feel, is the best thing to have happened, and it offers a very dynamic solution for web-enabled multilingualism. Its portability across platforms makes it ideal for such applications. Also, since Java applets run inside a restricted sandbox, applets written in the language are not expected to pose security threats to the client system. Because Java code runs on a Java Virtual Machine on the client, it is easy to devise a program in Java that can be sent across the network to the client side, thus solving the problem of portability. And since compiled Java code is compact, these applets are also less taxing on the bandwidth.

There is a package for printing text in Indian language scripts called ITRANS. Also called the Indian Language Transliterator, it does the transliteration mapping, while the fonts may be developed elsewhere. The input is in a transliterated form, with each letter in an Indian language assigned an English equivalent, and one uses the English forms to construct what will eventually print out in the Indian language script.
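
The idea of such a transliteration mapping can be sketched with a toy example for Devanagari. This is not the ITRANS package or its full mapping table: the small `CONSONANTS` and `VOWELS` tables and the `transliterate` function are assumptions made for illustration, covering only a handful of letters and using greedy longest-match on the Latin input.

```python
# Toy tables (illustrative subset only, not the real ITRANS scheme in full).
CONSONANTS = {"k": "क", "kh": "ख", "g": "ग", "t": "त", "n": "न",
              "b": "ब", "bh": "भ", "m": "म", "r": "र", "s": "स"}
# Each vowel has an independent form and a dependent sign used after a consonant.
VOWELS = {"a": ("अ", ""), "aa": ("आ", "ा"), "i": ("इ", "ि"),
          "u": ("उ", "ु"), "e": ("ए", "े")}
VIRAMA = "्"  # suppresses the inherent "a" of a consonant


def transliterate(latin):
    """Convert a Latin-letter string to Devanagari using the toy tables above."""
    out = []
    pending = False  # True while the last consonant still awaits a vowel
    i = 0
    while i < len(latin):
        # Greedy longest match: try two-letter tokens before one-letter tokens.
        for size in (2, 1):
            token = latin[i:i + size]
            if token in CONSONANTS or token in VOWELS:
                break
        else:
            token = None
        if token in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster: join with a virama
            out.append(CONSONANTS[token])
            pending = True
        elif token in VOWELS:
            independent, sign = VOWELS[token]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)
            out.append(latin[i])  # pass unknown characters through unchanged
            pending = False
            token = latin[i]
        i += len(token)
    if pending:
        out.append(VIRAMA)  # word ends on a bare consonant
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("namaste"))
    print(transliterate("bhaarata"))
```

The output is ordinary text in the target script; rendering it correctly still depends on a suitable font being available on the reader's machine, which is the part the article says may be developed elsewhere.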
