Heap sort is a comparison-based sorting technique based on Binary Heap data structure.
Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to more quickly find the largest element in each step.
What is Binary Heap?
A Binary Heap is a Complete Binary Tree where items are stored in a special order such that value in a parent node is greater in case of Max-Heap or smaller in case of Min-Heap than the values in its two children’s nodes.
The above image will let us understand the difference. On the left is a max-heap as the root node 12 is greater than or equal to than the value in it’s children nodes. Looking at the left subtree containing the node 10, 10 is greater than/equal to 5 & 6 therefore is a max-heap.
Similarly, on the right side is a min-heap as the root node 1 is less than or equal to than the value in it’s children nodes. Looking at the left subtree containing the node 5, 5 is less than/equal to 10 & 6 therefore is a min-heap.
Why array based representation for Binary Heap?
Since a Binary Heap is a Complete Binary Tree, it can be easily represented as an array and the array-based representation is space-efficient. If the parent node is stored at index I, the left child can be calculated by 2 * I + 1 and right child by 2 * I + 2 (assuming the indexing starts at 0).
The 1st major step in heap sort involves converting the given tree into a max-heap to sort the elements in ascending order or in a min-heap to sort the elements in descending order.
A Heap must satisfy the heap-order property i.e., the value stored at each node is greater than or equal to its children.
Let’s use the above image to see how the tree is heapified.
The root node 2 is less than it’s children nodes and does not satisfy the max-heap property. So, we replace the root node with max. of the children nodes i.e. maximum if 10 & 9 is 10, so we replace 2 with 10. After the replacement, we see that the left subtree of the root node has 2 as parent node which is less than the value in it’s children nodes. Therefore we replace 2 with the max. of 5 & 6 i.e. 6. Now, the tree has become a heap.
After heapify-ing, the next step involves extracting the largest/smallest element from the root and replacing it with the last node followed by discarding the new last node until the tree becomes empty.
How Heap Sort Works?
- Since the tree satisfies Max-Heap property, then the largest item is stored at the root node.
- Swap: Remove the root element and put at the end of the array (nth position) Put the last item of the tree (heap) at the vacant place.
- Remove: Reduce the size of the heap by 1.
- Heapify: Heapify the root element again so that we have the highest element at root.
- The process is repeated until all the items of the list are sorted.
Let us consider the max heap above.
Firstly, the root node is replaced with the last node.
And now, we discard the last node.
Now the tree loses the heap property, therefore, we heapify again until the tree regains the heap property.
Now, we can see that the root node does not satisfy the max-heap property as it is not greater than or equal to it’s children nodes. So, we swap the root node 11 with the greater of the children nodes i.e. 22.
After the swap, we see that the left subtree of the root node does not satisfy the max-heap property as 11 is not greater than 19 or 22, therefore we replace 11 with 22.
Again after the swap, going deep down the tree, we see that 11 does not satisfy the max-heap property as 11 is not greater than 21, therefore, we swap 11 & 21 which leaves us with a max-heap.
The tree now has finally regained the max-heap property.
To see how heap sort works in real time, and how all the comparisons are made until the tree is empty, have a look at this online tool.
Where to use heap sort?
- Guaranteed O(nlogn) performance. When you don’t necessarily need very fast performance, but guaranteed O(nlogn) performance (e.g. in a game), because Quicksort’s O(n²) can be painfully slow. Why not use Mergesort then? Because it takes O(n) extra memory.
- To avoid Quicksort’s worst case. C++’s
std::sortroutine generally uses a varation of Quicksort called Introsort, which uses Heapsort to sort the current partition if the Quicksort recursion goes too deep, indicating that a worst case has occurred.
- Partially sorted array even if stopped abruptly. We get a partially sorted array if Heapsort is somehow stopped abruptly. Might be useful, who knows?
Advantages of Heap Sort
- Efficiency. The Heap sort algorithm is very efficient. …
- Memory Usage. The Heap sort algorithm can be implemented as an in-place sorting algorithm. …
- Simplicity. The Heap sort algorithm is simpler to understand than other equally efficient sorting algorithms. …
Disadvantages of Heap Sort
- Relatively slow as compared to Quicksort
- Cache inefficient
- Not stable
- Not really adaptive (Doesn’t get faster if given somewhat sorted array)
Applications of Heap Sort
Heap sort algorithm has limited uses because Quicksort and Mergesort are better in practice. Nevertheless, the Heap data structure itself is enormously used. See Applications of Heap Data Structure
Understanding the driver code…
heapSort(int arr, int n) is the function which takes input an array and the size of the array.
The 1st call to the heapify function is to create a max heap on the original array.
Then we extract all the elements from the heap one by one and first swap the root node with the last node. After swapping, the heap loses the heap property. Therefore, we call heapify again so that the tree regains the heap property.
Time & Space Complexities
The time complexity of heap sort is : O(nlogn)
The space complexity of heap sort is : O(1)