Department of Computer Science - Iowa State University

 

Projects

Strategies for Caching Information on Distributed Systems

Supported by NSF CAREER CCR-#0092914

Project Summary

Digital information has become an inseparable part of our daily life. This phenomenon, which became prominent in the 1990s, is attributed to significant advances in high-speed internetworking technologies, affordable high-performance computers and peripherals, and the World-Wide Web. The Web has evolved rapidly from a simple information-sharing mechanism offering only static text and images to a rich assortment of dynamic and interactive services such as video/audio conferencing, electronic commerce, and distance learning. The explosive growth of the Web has imposed a heavy demand on networking resources and Web servers. Users often experience long and unpredictable delays when retrieving Web pages from remote sites. Despite the rapid increase in Internet backbone bandwidth, this situation will persist or worsen unless effective software solutions are also provided.

Web caching has been recognized as an effective technique that reduces service delays and wide-area-network (WAN) bandwidth consumption. The concept of Web caching is that Web pages deemed popular are replicated at locations close to the requesting users in the hope that these pages will be requested shortly by other users. Ideally, requests for the cached pages can be satisfied without accessing remote Web sites. Service delays and WAN bandwidth consumption can be reduced significantly since propagation delays, transmission delays, and processing delays at Web servers are eliminated for the cached pages. In practice, several factors diminish the ideal effectiveness of Web caching. The obvious limiting factors are finite system resources of cache servers (i.e., memory space, disk storage and I/O bandwidth, processing power, and networking resources). The less apparent factors include characteristics of Web usage.
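The caching idea described above, together with the limit imposed by finite cache storage, can be illustrated with a minimal sketch. It assumes a least-recently-used (LRU) replacement policy and a capacity counted in pages; both are illustrative assumptions, not strategies stated by this project. The names `LRUPageCache` and `fake_origin` are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy cache server: keeps at most `capacity` pages, evicting the
    least recently used page when full (LRU is an assumed policy)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pages = OrderedDict()  # url -> page body, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch_from_origin):
        """Return the page for `url`, contacting the origin only on a miss."""
        if url in self._pages:
            # Cache hit: served locally, no WAN transfer needed.
            self._pages.move_to_end(url)
            self.hits += 1
            return self._pages[url]
        # Cache miss: fetch from the remote site and keep a copy.
        self.misses += 1
        body = fetch_from_origin(url)
        self._pages[url] = body
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict least recently used
        return body


def fake_origin(url):
    """Hypothetical stand-in for a remote Web server."""
    return "<html>content of " + url + "</html>"


if __name__ == "__main__":
    cache = LRUPageCache(capacity=2)
    for url in ["/a", "/b", "/a", "/c", "/b"]:
        cache.get(url, fake_origin)
    print(cache.hits, cache.misses)  # prints "1 4"
```

With room for only two pages, the repeated request for `/b` misses because `/b` was evicted to make room for `/c`. This is a small instance of how finite cache resources reduce the ideal benefit of caching.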

This project will develop an integrated research and education program that investigates effective and efficient strategies for caching information on distributed systems, such as Web caching systems. The proposed research plan is a progression of ideas stemming from the principal investigator (PI)'s previous research contributions addressing related problems in distributed and parallel systems. The approach taken in this research is to first gain a deep understanding of important characteristics of Web usage. The PI will then explore various techniques that utilize the discovered characteristics to improve the performance of cache servers serving users located in the same vicinity. For Web requests, the PI will investigate techniques that fully utilize the aggregate capabilities of the cache servers and balance loads among these servers. For video or audio requests, client machines will be employed to serve other clients at the initiation of a cache server. This approach enables the caching system to scale beyond the capabilities of the participating cache servers while mitigating the reduction in the cache hit rate for Web pages that caching video or audio files can cause. Once a high-performance caching system for users in the same vicinity is obtained, the PI will investigate the cooperation of these systems across a geographically distributed area to further reduce service delays and the demand on WAN bandwidth. In preparation for this proposal, the PI has done a preliminary study of another unexplored characteristic of Web usage, the use of images, since they constitute a large portion of Internet traffic. This study forms the basis for several investigations throughout the course of this research.

The research part of the program will provide (i) an understanding of other unexplored characteristics of Web usage that place a heavy demand on networking and server resources; (ii) performance models of these characteristics; (iii) load-balancing strategies that utilize the aggregate capabilities of cache servers and client machines; and (iv) effective strategies for caching video and audio files. This research will have an impact on the evolution of the Web and could lead to the development of a distributed information-sharing paradigm better than the Web.

Publications

The following material is based upon work supported by the National Science Foundation under Grant No. CCR 0092914. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
Link to publications in Multimedia Caching and Dissemination
