The growing popularity of the Internet is burdening the
already taxed bandwidth. Numerous efforts to address the problem of Net
congestion are well under way. While solutions like increasing the available
bandwidth, using digital subscriber lines or optical fiber networks aim at
improving the transfer of information over the Internet, the root cause of Net
congestion remains ignored.
The repeated transfer of frequently requested information
constitutes the bulk of Internet traffic. This being the case, any amount of
bandwidth would be insufficient. Installing faster circuits, routers or switches
can reduce congestion, but will not reduce the round-trip time between two
nodes. The speed of light imposes a fundamental limit on network delay. When
the transmission medium is fiber, data travels at around 60% of the speed of
light, or roughly 180,000 kilometers per second, and trans-oceanic Internet
links typically have round-trip delays in the range of 100–300 milliseconds.
With no other workable solution in sight, a newer technology, Web caching, is
catching everyone's attention.
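To see why extra bandwidth cannot remove this delay, the arithmetic can be
sketched in a few lines of Python. The 60% figure comes from the text above.
The 10,000 km path length is only an assumed, illustrative distance, and the
function name is made up for this example. Real links add router and queuing
delays on top of this lower bound.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FRACTION = 0.60              # fraction of light speed quoted in the article


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring routers and queuing."""
    fiber_speed_km_s = SPEED_OF_LIGHT_KM_S * FIBER_FRACTION
    return 2 * distance_km / fiber_speed_km_s * 1000


# 10,000 km is an assumed one-way path length, chosen only for illustration.
print(f"{min_round_trip_ms(10_000):.0f} ms")  # prints "111 ms"
```

No router upgrade can push the round trip below this figure. Only moving the
data closer to the user can.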
What’s Web caching?
Caching, in general terms, refers to a technology that speeds
up access to data by storing frequently requested data nearby. CPUs use caches
to speed up access to memory, and the technique is also used in operating
systems and satellite communications. In Web caching, high-demand content is
sent from a server to an ISP’s cache through satellites. The data that is
needed most often is stored close to where it is needed, thus avoiding the
repeated transfer of data over long stretches of network. Put simply, it is
like keeping a pile of folders on your desk to save you umpteen trips to the
filing cabinet across the room. Web caching could hence be one of the easiest
ways to ease Net congestion.
How does it work?
When a user makes a request, it is first processed by the ISP’s
cache. The cache checks whether it holds a valid reply for the request. If it
does, it performs a second check, this time to verify the freshness of the
information. If the information is fresh, it is served to the user. If the
information is unavailable or stale, the cache makes a request on behalf of
the user. This request is routed through the satellite to the cache service’s
ground station. Fresh, updated information is transmitted back to the ISP’s
cache, which then passes it on to the user.
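The request flow described above can be sketched roughly as follows. This is
a minimal illustration and not any vendor's implementation: the class name
IspCache, the fetch_from_origin callback (standing in for the satellite
round trip) and the fixed max_age freshness rule are all assumptions made for
the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    body: bytes
    fetched_at: float  # time the copy was stored, in seconds
    max_age: float     # how long the copy counts as fresh, in seconds


class IspCache:
    """Hypothetical cache: serve fresh copies, refetch missing or stale ones."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 max_age: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_from_origin  # assumed stand-in for the upstream request
        self._max_age = max_age
        self._clock = clock
        self._store: Dict[str, Entry] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        # First check: is there a stored reply? Second check: is it still fresh?
        if entry is not None and now - entry.fetched_at < entry.max_age:
            return entry.body
        # Missing or stale: the cache requests the object on the user's behalf.
        body = self._fetch(url)
        self._store[url] = Entry(body, now, self._max_age)
        return body
```

The clock is passed in so that the freshness check can be exercised without
waiting. A real cache would take the freshness lifetime from the server's
response headers instead of using one fixed value.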
Apart from making available the most recent versions of Web
pages, an ISP’s cache needs to be able to communicate in various protocols
like FTP, Gopher and HTTP. When the user makes an FTP request, the cache should
be able to use FTP when requesting the file from the FTP server and, in turn,
translate the FTP reply into an HTTP one for the user.
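The translation step might look something like the sketch below, which wraps
a file already retrieved over FTP in a bare-bones HTTP reply. The function
name is hypothetical, the FTP retrieval itself is left out, and a real cache
would add many more headers.

```python
import mimetypes


def ftp_payload_to_http_response(filename: str, payload: bytes) -> bytes:
    """Wrap bytes fetched over FTP in a minimal HTTP/1.0 response (illustrative)."""
    # Guess the content type from the file name; fall back to a generic type.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header_lines = [
        "HTTP/1.0 200 OK",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
        "",  # blank line separating the headers from the body
        "",
    ]
    return "\r\n".join(header_lines).encode("ascii") + payload
```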
A cache is more than just a local storage area. When
the cache is full, existing objects must be removed to make room for new ones.
The objects to be removed are selected by a replacement algorithm.
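One widely used replacement algorithm is least recently used (LRU), which
discards the object that has gone longest without being requested. The
article does not say which algorithm any particular cache uses, so the sketch
below is only one possible choice, and the class name is made up for the
example.

```python
from collections import OrderedDict
from typing import Optional


class LruCache:
    """Fixed-capacity store that evicts the least recently used object."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LruCache(capacity=2)
cache.put("/home", b"home page")
cache.put("/logo.gif", b"logo")
cache.get("/home")               # touch /home so /logo.gif becomes the oldest
cache.put("/news", b"headlines")  # cache is full, so /logo.gif is evicted
```

Other policies weigh object size or request frequency instead of recency, and
the right choice depends on the traffic the cache sees.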
Speedier access
Caching has three main advantages: it reduces bandwidth consumption,
server load and latency. The greatest advantage of Web caching is
the reduction in backbone network traffic. It can reduce the demand for
bandwidth by at least 35%, and enable optimized use of the existing bandwidth.
Since the load on content servers is also reduced, more users can reach a
server’s documents without extra bandwidth and without crashing the server.
Storing data at local ISPs means shorter distances for data to travel, and
thereby reduced latency. A cache can also isolate end-users from network
failures.
Big gainers: Multimedia and e-com
Caching is best suited for static pages and huge files that
need to be downloaded often. Multimedia files such as movie trailers or music
tracks will benefit most from the caching technology. In addition to being
large, these files are also prone to jitter, that is, random variations in the
delivery of individual packets. Since caching shortens the distance between the
user and the information, such distortions can be largely avoided.
Caching can also aid e-commerce. According to a report by
Zona Research, as much as $4.4 billion per year in potential B2C revenues might
be lost because of slow download speeds. Faced with this, shoppers migrate to
other Web sites. Web caching can help e-commerce sites retain shoppers by
caching portions of their Web pages. Static pages such as home pages and fixed
elements like the company logo, copyright information and navigational buttons
usually remain unchanged. With these portions cached, only the new
elements of a page need to be transmitted.
The flip side–not too bad
Caching, as of today, has some limitations. The biggest issue
is the risk of serving stale information. How recent are the pages in
the local cache? How often can these caches refresh data? These questions form
the basis of any discussion on the implementation of caches.
Existing HTTP servers are incapable of informing caches about
updated objects. However, newer standards like HTTP/1.1 include features that
allow users to specify freshness parameters and allow page authors to decide
which parts of a page should be cached. HTTP/1.1 includes the Cache-Control
header, an improvement over the earlier Expires header of the HTTP
standard.
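How a cache might read these two headers can be sketched as below. The
function name is hypothetical and the handling is deliberately simplified:
it looks only at the max-age, no-cache and no-store directives of
Cache-Control, falls back to Expires, and assumes header names are spelled
exactly as shown.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str],
                       response_time: datetime) -> Optional[float]:
    """Seconds a reply may be reused without rechecking, or None if unstated."""
    directives: Dict[str, str] = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # Simplification: treat both directives as "do not reuse without checking".
    if "no-store" in directives or "no-cache" in directives:
        return 0.0
    # Cache-Control max-age takes precedence over the older Expires header.
    if "max-age" in directives:
        try:
            return float(int(directives["max-age"]))
        except ValueError:
            return 0.0
    expires = headers.get("Expires")
    if expires:
        try:
            delta = parsedate_to_datetime(expires) - response_time
            return max(0.0, delta.total_seconds())
        except (TypeError, ValueError):
            return 0.0  # an unreadable date is treated as already expired
    return None


received = datetime(1994, 12, 1, 15, 0, 0, tzinfo=timezone.utc)
print(freshness_lifetime({"Cache-Control": "public, max-age=600"}, received))
print(freshness_lifetime({"Expires": "Thu, 01 Dec 1994 16:00:00 GMT"}, received))
```

The first call reports a lifetime of 600 seconds taken straight from
max-age. The second derives one hour from the gap between the Expires date
and the time the reply was received.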
Caches can also practice active caching, which addresses the
freshness issue more directly. Rather than wait for page requests to
check for a Web object’s freshness, active caching identifies and ‘pre-fetches’
objects that are likely to go stale. Active caching works on algorithms based on
factors such as the frequency with which the object has been requested, the
frequency at which it has undergone changes and the bandwidth cost of retrieving
the object.
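The three factors named above could be combined in many ways. The sketch
below uses one assumed scoring rule, request rate times change rate divided
by object size, purely for illustration. It is not taken from any product or
standard, and all names and figures in it are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedObject:
    url: str
    requests_per_hour: float  # how often users ask for the object
    changes_per_hour: float   # how often the origin copy changes
    size_kb: float            # bandwidth cost of fetching it again (must be > 0)


def refresh_score(obj: CachedObject) -> float:
    """Assumed rule: popular, fast-changing, cheap objects score highest."""
    return obj.requests_per_hour * obj.changes_per_hour / obj.size_kb


def pick_prefetches(objects: List[CachedObject], budget_kb: float) -> List[str]:
    """Greedily choose the best-scoring objects that fit the bandwidth budget."""
    chosen: List[str] = []
    remaining = budget_kb
    for obj in sorted(objects, key=refresh_score, reverse=True):
        if obj.size_kb <= remaining:
            chosen.append(obj.url)
            remaining -= obj.size_kb
    return chosen


catalog = [
    CachedObject("/news", requests_per_hour=500, changes_per_hour=2, size_kb=50),
    CachedObject("/logo.gif", requests_per_hour=900, changes_per_hour=0.001, size_kb=10),
    CachedObject("/trailer.mov", requests_per_hour=50, changes_per_hour=0.1, size_kb=5000),
]
print(pick_prefetches(catalog, budget_kb=100))  # prints ['/news', '/logo.gif']
```

With a 100 KB budget the frequently changing news page is refreshed first,
while the large and rarely changing trailer is left alone.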
Caching also has some undesirable ramifications. One
is the possibility of undetected content modification and violation of
copyrights. With cyber crime increasing, there are concerns that caches could
become targets for hacking. The fear that content could be altered without the
permission of the content provider also deters the implementation of caching.
Moreover, access counts will lose their significance with widespread
acceptance of Web caching, since requests served from a cache never reach the
origin server. This will take away from content providers a
vital tool to judge the performance of a site. Additionally, operating a
cache requires new equipment and personnel.
The benefits of caching, however, will be too strong an incentive
to resist. In India, Web caching can considerably ease the pressure on the
existing bandwidth, even as an expedited effort is made to meet Nasscom’s
projected bandwidth requirement of 300 Gbps by 2005.
Priya Sivakumaran
In New Delhi