Multilingualism on the
web is about creating and maintaining versions of a document in multiple languages on the
web. There could be a master document with translations, or several language originals to
be aligned or compared to get the result. When a document is available in a variety of
languages and the translations are more or less aligned, it is possible to create links
automatically between the different versions, or multi-headed links that target all of the
versions at the same time.
The main hindrances to
achieving multilingualism on the web have been weak standards and protocols for
representing and rendering multilingual text. More recently, newer standards and protocols
have been proposed and implemented for web-based multilingualism. The newer
version of HTTP (version 1.1) has several new header fields, especially to aid language
negotiation and thereby achieve multilingualism on the web. HTML has also been extended
with attributes that let an author specify the language and text direction of a
particular section of the text. Technologies like OpenType fonts from Microsoft, TrueDoc
from Bitstream, and the Web Embedding Fonts Tool (WEFT) have also emerged, which
help in font negotiation and dynamic rendering of web documents on the internet.
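The language negotiation that HTTP/1.1 enables rests on the Accept-Language request header, in which the browser lists the user's preferred languages with quality (q) values. A server holding several translations of a page can pick the best match. A minimal sketch in Python (the header string and the list of available translations are illustrative assumptions, not part of any real site):

```python
def parse_accept_language(header):
    """Parse an HTTP/1.1 Accept-Language header into (tag, q) pairs,
    sorted by descending preference."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if ";q=" in piece:
            tag, q = piece.split(";q=")
            prefs.append((tag.strip().lower(), float(q)))
        else:
            prefs.append((piece.lower(), 1.0))  # no q value means q=1.0
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, available):
    """Return the best available language for this request, falling
    back to the first (default) version if nothing matches."""
    for tag, _q in parse_accept_language(header):
        if tag in available:
            return tag
    return available[0]

# A browser preferring Hindi, then French, then English, against a
# site that has English and Hindi versions, gets the Hindi page.
print(negotiate("fr;q=0.8, hi;q=0.9, en;q=0.5", ["en", "hi"]))
```

Real servers also handle language-range prefixes such as `hi-IN` matching `hi`; the sketch above keeps only the core idea of q-value ordering.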
The increasing
dominance of Unicode over the traditional ASCII code is also playing an important role in
this revolution. The Unicode standard uses a 16-bit code which can accommodate 65,536
characters, as against the 7-bit ASCII code with just 128 characters. This
makes it possible to represent multilingual text directly. However, Unicode has
some limitations when it comes to internet communications. For instance, while Unicode
permits the viewing of Kanji script, it does not make an email address in Arabic work. And
for applications other than documents, Unicode requires a specific software interface,
which again adds to the complexity of using the code on various platforms.
That said,
Unicode has greatly expanded the scope of multilingualism on the web,
though at the expense of communications bandwidth: the overhead of Unicode text
is much higher than that of ASCII. A similar problem exists with the various
font-embedding technologies. For instance, a 40 KB multilingual Word document with
fonts embedded using Microsoft's OpenType and WEFT carries roughly 39 KB of overhead.
Moreover, the currently available multilingual font-embedding technologies have inherent
disadvantages such as security risks, disk space usage, and IPR-related issues. Then
there is the perennial problem of non-portability, or limited portability, of the current
technologies: Microsoft's OpenType fonts work only on the Windows platform and are
supported only by Internet Explorer, while Bitstream's TrueDoc fonts can be ported
across Windows, Unix, and the Mac but lack wide browser support, working only with Netscape.
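The bandwidth point about Unicode versus ASCII can be illustrated directly: a flat 16-bit encoding doubles the size of plain ASCII text, and UTF-8 costs three bytes per code point for Indic scripts. A small Python sketch (the sample strings are illustrative; the 40 KB/39 KB document figure above is from the source, not reproduced here):

```python
# Byte cost of the same text under different encodings.
ascii_text = "Hello"
ascii_bytes = ascii_text.encode("ascii")      # 1 byte per character
utf16_bytes = ascii_text.encode("utf-16-be")  # 2 bytes per character: double the size

hindi_text = "नमस्ते"                          # 6 Devanagari code points
utf8_bytes = hindi_text.encode("utf-8")       # 3 bytes per code point for this range

print(len(ascii_bytes), len(utf16_bytes), len(utf8_bytes))  # prints: 5 10 18
```

So the same English text transmitted as 16-bit Unicode takes exactly twice the bytes of its ASCII form, which is the overhead the paragraph above refers to.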
In such a scenario,
many feel that Java offers the most dynamic solution for
web-enabled multilingualism. Its universal portability makes it ideal for such
applications. Also, since Java is inherently secure, applets written in the language do
not pose a security threat to the client system. Any client with a Java Virtual Machine
can run a program sent across the network,
which solves the problem of portability; and because Java bytecode is compact, these
applets are also less taxing on the bandwidth.
There is a package for
printing text in Indian-language scripts called ITRANS, the Indian Language
Transliterator. It performs the transliteration mapping, while the fonts may be developed
elsewhere. The input is in a transliterated form, with each letter in an Indian language
assigned an English equivalent; one types the English forms to construct what will
eventually print out in the Indian-language script.
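The transliteration idea can be made concrete with a toy sketch in Python. The mapping table below is a tiny subset invented for illustration; the real ITRANS scheme defines far richer tables and handles conjuncts, vowel signs, and the halant properly.

```python
# Illustrative subset of an ITRANS-style mapping: Latin input forms
# to Devanagari output. NOT the real ITRANS tables.
ITRANS_MAP = {
    "ka": "क", "kha": "ख", "ga": "ग", "ma": "म", "la": "ल", "a": "अ",
}

def transliterate(text):
    """Greedily match the longest Latin chunk at each position and
    emit its Devanagari equivalent; pass unmapped characters through."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):  # longest match first
            chunk = text[i:i + length]
            if chunk in ITRANS_MAP:
                out.append(ITRANS_MAP[chunk])
                i += length
                break
        else:
            out.append(text[i])  # e.g. spaces and punctuation
            i += 1
    return "".join(out)

print(transliterate("kamala"))  # prints: कमल
```

The longest-match rule is what lets "kha" win over "k" + "ha"; the actual ITRANS software applies the same principle over its full mapping tables before handing the result to a font for rendering.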