BIB-VERSION:: CS-TR-v2.0 ID:: ncstrl.dartmouthcs//TR97-320 ENTRY:: August 27, 1997 ORGANIZATION:: Dartmouth College, Computer Science TITLE:: On-Line File Caching TYPE:: Technical Report (paper) REVISION:: 1 AUTHOR:: Young, Neal E. DATE:: July 1997 RETRIEVAL:: For a paper copy, email RETRIEVAL:: For a paper copy, write to Technical Report Librarian Department of Computer Science Dartmouth College 6211 Sudikoff Laboratory Hanover, NH 03755-3510 USA RETRIEVAL:: Compressed Postscript at http://www.cs.dartmouth.edu/reports/TR97-320.ps.Z RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR97-320.pdf ABSTRACT:: Consider the following file caching problem: in response to a sequence of requests for files, where each file has a specified size and retrieval cost, maintain a cache of files of total size at most some specified k so as to minimize the total retrieval cost. Specifically, when a requested file is not in the cache, bring it into the cache, pay the retrieval cost, and choose files to remove from the cache so that the total size of files in the cache is at most k. This problem generalizes previous paging and caching problems by allowing objects of arbitrary size and cost, both important attributes when caching files for world-wide-web browsers, servers, and proxies. We give a simple deterministic on-line algorithm that generalizes many well-known paging and weighted-caching strategies, including least-recently-used, first-in-first-out, flush-when-full, and the balance algorithm. On any request sequence, the total cost incurred by the algorithm is at most k/(k-h+1) times the minimum possible using a cache of size h <= k. For any algorithm satisfying the latter bound, we show it is also the case that for most choices of k, the retrieval cost is either insignificant or the competitive ratio is constant. This helps explain why competitive ratios of many on-line paging algorithms have been typically observed to be constant in practice. END:: ncstrl.dartmouthcs//TR97-320