Misplaced Pages

Cache-oblivious distribution sort

Article snapshot taken from[REDACTED] with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
Comparison-based sorting algorithm
This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
This article relies excessively on references to primary sources. Please improve this article by adding secondary or tertiary sources.
Find sources: "Cache-oblivious distribution sort" – news · newspapers · books · scholar · JSTOR (May 2014) (Learn how and when to remove this message)
This article may be too technical for most readers to understand. Please help improve it to make it understandable to non-experts, without removing the technical details. (May 2014) (Learn how and when to remove this message)
(Learn how and when to remove this message)

The cache-oblivious distribution sort is a comparison-based sorting algorithm. It is similar to quicksort, but it is a cache-oblivious algorithm, designed for a setting where the number of elements to sort is too large to fit in a cache where operations are done. In the external memory model, the number of memory transfers it needs to perform a sort of N {\displaystyle N} items on a machine with cache of size Z {\displaystyle Z} and cache lines of length L {\displaystyle L} is O ( N L log Z N ) {\displaystyle O({\frac {N}{L}}\log _{Z}N)} , under the tall cache assumption that Z = Ω ( L 2 ) {\displaystyle Z=\Omega (L^{2})} . This number of memory transfers has been shown to be asymptotically optimal for comparison sorts. This distribution sort also achieves the asymptotically optimal runtime complexity of Θ ( N log N ) {\displaystyle \Theta (N\log N)} .

Algorithm

Overview

Distribution sort operates on a contiguous array of N {\displaystyle N} elements. To sort the elements, it performs the following:

  1. Partition the array into N {\displaystyle {\sqrt {N}}} contiguous subarrays of size N {\displaystyle {\sqrt {N}}} , and recursively sort each subarray.
  2. Distribute the elements of the sorted subarrays into q N {\displaystyle q\leq {\sqrt {N}}} buckets B 1 , B 2 , , B q {\displaystyle B_{1},B_{2},\ldots ,B_{q}} each of size at most 2 N {\displaystyle 2{\sqrt {N}}} such that for every i from 1 to q-1, every element of bucket B i {\displaystyle B_{i}} is not larger than any element in B i + 1 . {\displaystyle B_{i+1}.} This distribution step is the main step of this algorithm, and is covered in more detail below.
  3. Recursively sort each bucket.
  4. Output the concatenation of the buckets.


Distribution step

As mentioned in step 2 above, the goal of the distribution step is to distribute the sorted subarrays into q buckets B 1 , B 2 , , B q . {\displaystyle B_{1},B_{2},\ldots ,B_{q}.} The distribution step algorithm maintains two invariants. The first is that each bucket has size at most 2 N {\displaystyle 2{\sqrt {N}}} at any time, and any element in bucket B i {\displaystyle B_{i}} is no larger than any element in bucket B i + 1 . {\displaystyle B_{i+1}.} The second is that every bucket has an associated pivot, a value which is greater than all elements in the bucket.

Initially, the algorithm starts with one empty bucket with pivot {\displaystyle \infty } . As it fills buckets, it creates new buckets by splitting a bucket into two when it would be made overfull (by having at least ( 2 N + 1 ) {\displaystyle (2{\sqrt {N}}+1)} elements placed into it). The split is done by performing the linear time median finding algorithm, and partitioning based on this median. The pivot of the lower bucket will be set to the median found, and the pivot of the higher bucket will be set to the same as the bucket before the split. At the end of the distribution step, all elements are in the buckets, and the two invariants will still hold.

To accomplish this, each subarray and bucket will have a state associated with it. The state of a subarray consists of an index next of the next element to be read from the subarray, and a bucket number bnum indicating which bucket index the element should be copied to. By convention, b n u m = {\displaystyle bnum=\infty } if all elements in the subarray have been distributed. (Note that when we split a bucket, we have to increment all bnum values of all subarrays whose bnum value is greater than the index of the bucket that is split.) The state of a bucket consists of the value of the bucket's pivot, and the number of elements currently in the bucket.

Consider the follow basic strategy: iterate through each subarray, attempting to copy over its element at position next. If the element is smaller than the pivot of bucket bnum, then place it in that bucket, possibly incurring a bucket split. Otherwise, increment bnum until a bucket whose pivot is large enough is found. Though this correctly distributes all elements, it does not exhibit a good cache performance.

Instead, the distribution step is performed in a recursive divide-and-conquer. The step will be performed as a call to the function distribute, which takes three parameters i, j, and m. distribute(i, j, m) will distribute elements from the i-th through (i+m-1)-th subarrays into buckets, starting from B j {\displaystyle B_{j}} . It requires as a precondition that each subarray r in the range i , , i + m 1 {\displaystyle i,\ldots ,i+m-1} has its b n u m [ r ] j {\displaystyle bnum\geq j} . The execution of distribute(i, j, m) will guarantee that each b n u m [ r ] j + m {\displaystyle bnum\geq j+m} . The whole distribution step is distribute ( 1 , 1 , N ) {\displaystyle (1,1,{\sqrt {N}})} . Pseudocode for the implementation of distribute is shown below:

def distribute(i, j, m: int) -> None:
    """Distribute through recursive divide-and-conquer."""
    if m == 1:
        copy_elems(i, j)
    else:
        distribute(i, j, m / 2)
        distribute(i + m / 2, j, m / 2)
        distribute(i, j + m / 2, m / 2)
        distribute(i + m / 2, j + m / 2, m / 2)

The base case, where m=1, has a call to the subroutine copy_elems. In this base case, all elements from subarray i that belong to bucket j are added at once. If this leads to bucket j having too many elements, it splits the bucket with the procedure described beforehand.

See also

References

Categories:
Cache-oblivious distribution sort Add topic