BIB-VERSION:: CS-TR-v2.0
ID:: ncstrl.dartmouthcs//TR2005-554
ENTRY:: August 08, 2005
ORGANIZATION:: Dartmouth College, Computer Science
TITLE:: Efficiently Implementing a Large Number of LL/SC Objects
TYPE:: Technical Report (paper)
REVISION:: 1
AUTHOR:: Jayanti, Prasad
AUTHOR:: Petrovic, Srdjan
DATE:: August 2005
RETRIEVAL:: For a paper copy, email
RETRIEVAL:: For a paper copy, write to
    Technical Report Librarian
    Department of Computer Science
    Dartmouth College
    6211 Sudikoff Laboratory
    Hanover, NH 03755-3510
    USA
RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR2005-554.pdf
ABSTRACT:: Over the past decade, a pair of instructions called load-linked (LL) and store-conditional (SC) has emerged as the most suitable synchronization instructions for the design of lock-free algorithms. However, current architectures do not support these instructions; instead, they support either CAS (e.g., UltraSPARC, Itanium) or restricted versions of LL/SC, known as RLL/RSC (e.g., POWER4, MIPS, Alpha). Thus, there is a gap between what algorithm designers want (namely, LL/SC) and what multiprocessors actually support (namely, CAS or RLL/RSC). To bridge this gap, a flurry of algorithms that implement LL/SC from CAS have appeared in the literature. The two most recent algorithms are due to Doherty, Herlihy, Luchangco, and Moir (2004) and Michael (2004). To implement M LL/SC objects shared by N processes, Doherty et al.'s algorithm uses only O(N + M) space, but is only non-blocking and not wait-free. Michael's algorithm, on the other hand, is wait-free, but uses O(N^2 + M) space. The main drawback of his algorithm is the time complexity of the SC operation: although the expected amortized running time of SC is only O(1), the worst-case running time of SC is O(N^2). The algorithm in this paper overcomes this drawback. Specifically, we design a wait-free algorithm that achieves a space complexity of O(N^2 + M), while still maintaining the O(1) worst-case running time for LL and SC operations.
END:: ncstrl.dartmouthcs//TR2005-554