Heapsort
From Free net encyclopedia
Heapsort is one of the best general-purpose sort algorithms, a comparison sort and part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantages of worst-case O(n log n) runtime and being an in-place algorithm. Heapsort is not a stable sort.
Contents |
Overview
One simple way to sort a list of objects is to use a heap data structure. We add all of our objects into the heap, and the heap organizes the elements added to it in such a way that we can quickly extract either the largest value (in a max-heap) or the smallest value (in a min-heap). Moreover, because this operation preserves the heap's structure, we can extract the largest/smallest value over and over again until none remain. This gives us the elements in order.
In doing so, the only extra space required is that needed to store the heap. In order to achieve constant space overhead, we use a trick: we store a binary heap (or alternatively, a heap with more than two children) inside the part of the input array which has not yet been sorted. (The structure of this heap is described at Binary heap: Heap implementation.) Heapsort makes use of two standard heap operations: insertion and root deletion. Each time we delete (extract) the maximum, we place it in the last location of the array not yet occupied, and use the remaining prefix of the array as a heap holding the remaining unsorted elements:
Heap of remaining unsorted elements | Sorted elements |
Variations
Although not widely known, it is possible to define a ternary heapsort which uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. Ternary heapsort is somewhat more complicated to program, but it is potentially faster. Each step in the sift operation of a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. The ternary heap can do two steps in less time than the binary heap requires for three steps. But two steps of a ternary tree multiply the index by a factor of 9, which is more than the factor 8 of three binary steps. Ternary heapsort is about 12% faster than binary heapsort.
Comparison with other sorts
Heapsort primarily competes with quicksort, another very efficient general purpose nearly-in-place comparison-based sort algorithm.
Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n2), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem, and possible solutions.
The quicksort algorithm also requires Ω(log n) extra storage space, making it not a strictly in-place algorithm. This typically does not pose a problem except on the smallest embedded systems, or on systems where memory allocation is highly restricted. Constant space (in-place) variants of quicksort are possible to construct, but are rarely used in practice due to their extra complexity.
Thus, because of the O(n log n) upper bound on heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.
Heapsort also competes with merge sort, which has the same time bounds, but requires Ω(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:
- Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
- Merge sort is simpler to understand than heapsort.
- Merge sort is a stable sort.
- Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
- Merge sort can be easily adapted to operate on linked lists and very large lists stored on slow-to-access media such as disk storage or network attached storage. Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times.
Implementation in pseudocode
The following is one way to implement the algorithm, in pseudocode, where swap is used to swap two elements of the array. Notice that the arrays are zero-based in this example.
function heapSort(a, count) { var int start := count ÷ 2 - 1, end := count - 1 while start ≥ 0 sift(a, start, count) start := start - 1 while end > 0 swap(a[end], a[0]) sift(a, 0, end) end := end - 1 } function sift(a, start, count) { var int root := start, child while root * 2 + 1 < count { child := root * 2 + 1 if child < count - 1 and a[child] < a[child + 1] child := child + 1 if a[root] < a[child] swap(a[root], a[child]) root := child else return } }
Implementation in C
This is a fast implementation of heapsort in C, adapted from Numerical Recipes in C but designed to be slightly more readable and to index from 0.
void heapsort(int arr[], unsigned int N) { unsigned int n = N, i = n/2, parent, child; int t; for (;;) { /* Loops until arr is sorted */ if (i > 0) { /* First stage - Sorting the heap */ i--; /* Save its index to i */ t = arr[i]; /* Save parent value to t */ } else { /* Second stage - Extracting elements in-place */ n--; /* Make the new heap smaller */ if (n == 0) return; /* When the heap is empty, we are done */ t = arr[n]; /* Save last value (it will be overwritten) */ arr[n] = arr[0]; /* Save largest value at the end of arr */ } parent = i; /* We will start pushing down t from parent */ child = i*2 + 1; /* parent's left child */ /* Sift operation - pushing the value of t down the heap */ while (child < n) { if (child + 1 < n && arr[child + 1] > arr[child]) { child++; /* Choose the largest child */ } if (arr[child] > t) { /* If any child is bigger than the parent */ arr[parent] = arr[child]; /* Move the largest child up */ parent = child; /* Move parent pointer to this child */ child = parent*2 + 1; /* Find the next child */ } else { break; /* t's place is found */ } } arr[parent] = t; /* We save t in the heap */ } }
References
- J. W. J. Williams. Algorithm 232 - Heapsort, 1964, Communications of the ACM 7(6): 347–348.
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Pages 144–155 of section 5.2.3: Sorting by Selection.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0262032937. Chapter 6: Heapsort, pp.123–144.
External links
- Heapsort animated
- NIST's Dictionary of Algorithms and Data Structures: Heapsort
- Sorting revisited
- Heapsort Animation
- Heapsort Lecturecs:Heapsort
de:Heapsort es:Heapsort fr:Tri par tas lt:Krūvos rūšiavimo algoritmas lb:Heapsort nl:Heapsort ja:ヒープソート pl:Sortowanie przez kopcowanie pt:Heapsort zh:堆排序