
There are two quicksort partition methods mentioned in Cormen:

(The argument A is the array, and [p, r] is the inclusive range on which to perform the partition. The returned value is the index of the pivot after the partition.)

Hoare-Partition(A, p, r)
x = A[p]
i = p - 1
j = r + 1
while true
    repeat
        j = j - 1
    until A[j] <= x
    repeat
        i = i + 1
    until A[i] >= x
    if i < j
        swap( A[i], A[j] )
    else
        return j

and:

Lomuto-Partition(A, p, r)
x = A[r]
i = p - 1
for j = p to r - 1
    if A[j] <= x
        i = i + 1
        swap( A[i], A[j] )
swap( A[i + 1], A[r] )
return i + 1
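
For anyone who wants to experiment with them, here is a rough Python translation of the two routines (my own sketch, not from the book; 0-based indices, with p and r still inclusive):

def hoare_partition(A, p, r):
    # Pivot is the first element. Returns an index j such that quicksort
    # can recurse on A[p..j] and A[j+1..r].
    x = A[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while A[j] > x:          # repeat j = j - 1 until A[j] <= x
            j -= 1
        i += 1
        while A[i] < x:          # repeat i = i + 1 until A[i] >= x
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j

def lomuto_partition(A, p, r):
    # Pivot is the last element. Returns its final index q, so quicksort
    # can recurse on A[p..q-1] and A[q+1..r].
    x = A[r]
    i = p - 1
    for j in range(p, r):        # j = p .. r-1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

Note the different recursion patterns: with Lomuto the returned pivot index is excluded from the recursive calls, while with Hoare one recurses on A[p..j] and A[j+1..r].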

Disregarding the method of choosing the pivot, in what situations is one preferable to the other? I know, for instance, that Lomuto performs relatively poorly when there is a high percentage of duplicate values (i.e. where, say, more than two thirds of the array is the same value), whereas Hoare performs just fine in that situation.

What other special cases make one partition method significantly better than the other?

nonopolarity
Robert S. Barnes
  • I can't think of any situation in which Lomuto is better than Hoare. It seems like Lomuto performs extra swaps whenever A[i+1] <= x. In a sorted array (and given reasonably chosen pivots) Hoare does almost no swaps and Lomuto does a ton (as long as j is small enough, all the A[j] <= x). What am I missing? – Wandering Logic Apr 21 '13 at 13:05
  • @WanderingLogic I'm not sure, but it seems Cormen's decision to use the Lomuto partition in his book may be pedagogical - it seems to have a fairly straight-forward loop invariant. – Robert S. Barnes Apr 21 '13 at 17:05
  • Note that those two algorithms don't do the same thing. At the end of Hoare's algorithm, the pivot is not at its final place. You could add a swap(A[p], A[j]) at the end of Hoare's to get the same behaviour for both. – Mmmh mmh Oct 15 '14 at 12:32
  • You should also check for i < j in the 2 repeat loops of Hoare's partitioning. – Mmmh mmh Oct 15 '14 at 12:33
  • @AurélienOoms The code is copied directly from the book. – Robert S. Barnes Oct 15 '14 at 13:54
  • Adding swap(A[p], A[j]) is not sufficient, since after the first iteration of the loop A[p] is swapped with some A[j] <= A[p] ( i = p - 1, i = i + 1 thus i = p and since A[p] >= A[p] is always true). – Mmmh mmh Oct 16 '14 at 06:23
  • @AurélienOoms These checks i < j are not necessary. See also Knuth's paper, example 8a. – Yai0Phah Jul 10 '19 at 12:38
  • I just replaced Lomuto with the original Hoare in the TXR Lisp sort function. I'm seeing a 21% speedup on sorting a million pseudo-random integers in the range [0, 1M). (Same pivot selection: median of three). It's almost neck-and-neck for a sorted list. Also, I can't believe I didn't even know; the original function behaves degenerately for a sequence of repeated values; the time blows up quadratically. Hoare handles it flawlessly. I now consider Lomuto to be vandalism of Hoare's work. It should not be taught, and never mentioned in a text without noting Hoare's original scheme. – Kaz May 02 '23 at 05:46
  • Lomuto being easier to remember and implement closed-book is a poor reason to promote it. We should aim higher: get the correct references, and struggle a little bit to adapt to the conventions of the language we are using. It's a chance to use some discipline. I wrote an exhaustive test case which tested all permutations of a list; that flushed out the edge cases that were wrong. – Kaz May 02 '23 at 05:51
  • Imagine you came up with a great algorithm that everyone uses, but they forgot your version and instead adopted some crock variant that gives it double-digit slowdowns compared to your original. What a slap in the face! – Kaz May 02 '23 at 06:04

2 Answers


Pedagogical Dimension

Due to its simplicity, Lomuto's partitioning method might be easier to implement. There is a nice anecdote in Jon Bentley's Programming Pearls column on sorting:

“Most discussions of Quicksort use a partitioning scheme based on two approaching indices [...] [i.e. Hoare's]. Although the basic idea of that scheme is straightforward, I have always found the details tricky - I once spent the better part of two days chasing down a bug hiding in a short partitioning loop. A reader of a preliminary draft complained that the standard two-index method is in fact simpler than Lomuto's and sketched some code to make his point; I stopped looking after I found two bugs.”

Performance Dimension

For practical use, ease of implementation might be sacrificed for the sake of efficiency. On a theoretical basis, we can determine the number of element comparisons and swaps to compare performance. Additionally, actual running time will be influenced by other factors, such as caching performance and branch mispredictions.

As shown below, the two algorithms behave very similarly on random permutations, except for the number of swaps: there, Lomuto needs about three times as many as Hoare!

Number of Comparisons

Both methods can be implemented using $n-1$ comparisons to partition an array of length $n$. This is essentially optimal, since we need to compare every element to the pivot for deciding where to put it.

Number of Swaps

The number of swaps is random for both algorithms, depending on the elements in the array. If we assume random permutations, i.e. all elements are distinct and every permutation of the elements is equally likely, we can analyze the expected number of swaps.

As only relative order counts, we assume that the elements are the numbers $1,\ldots,n$. That makes the discussion below easier since the rank of an element and its value coincide.

Lomuto's Method

The index variable $j$ scans the whole array, and whenever we find an element $A[j]$ smaller than the pivot $x$, we do a swap. Among the elements $1,\ldots,n$, exactly $x-1$ are smaller than $x$, so we get $x-1$ swaps if the pivot is $x$.

The overall expectation then results from averaging over all pivots. Each value in $\{1,\ldots,n\}$ is equally likely to become the pivot (namely with probability $\frac1n$), so we have

$$ \frac1n \sum_{x=1}^n (x-1) = \frac n2 - \frac12\;. $$

swaps on average to partition an array of length $n$ with Lomuto's method.
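
(For completeness: $\sum_{x=1}^{n} (x-1) = \frac{n(n-1)}{2}$, so dividing by $n$ indeed gives $\frac{n-1}{2} = \frac n2 - \frac12$.)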

Hoare's Method

Here, the analysis is slightly more tricky: Even fixing pivot $x$, the number of swaps remains random.

More precisely: The indices $i$ and $j$ run towards each other until they cross, which always happens at $x$ (by correctness of Hoare's partitioning algorithm!). This effectively divides the array into two parts: A left part which is scanned by $i$ and a right part scanned by $j$.

Now, a swap is done exactly for every pair of “misplaced” elements, i.e. a large element (larger than $x$, thus belonging in the right partition) that is currently located in the left part, and a small element located in the right part. Note that this pairing always works out, i.e. the number of small elements initially in the right part equals the number of large elements initially in the left part.

One can show that the number of these pairs is hypergeometrically $\mathrm{Hyp}(n-1,n-x,x-1)$ distributed: For the $n-x$ large elements we randomly draw their positions in the array and have $x-1$ positions in the left part. Accordingly, the expected number of pairs is $(n-x)(x-1)/(n-1)$ given that the pivot is $x$.

Finally, we average again over all pivot values to obtain the overall expected number of swaps for Hoare's partitioning:

$$ \frac1n \sum_{x=1}^n \frac{(n-x)(x-1)}{n-1} = \frac n6 - \frac13\;. $$

(A more detailed description can be found in my master's thesis, page 29.)
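
If you want to sanity-check these two averages empirically, a quick simulation along the following lines will do (a rough Python sketch of mine; note that the Hoare-style routine below keeps the pivot out of the scanned range, i.e. it follows the Sedgewick-style variant analyzed here, not the book pseudocode from the question):

import random

def lomuto_swaps(A):
    # Lomuto partition, pivot = last element. Counts the swaps made in the
    # scanning loop (the final swap placing the pivot is not counted,
    # matching the analysis above).
    x, i, swaps = A[-1], -1, 0
    for j in range(len(A) - 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
            swaps += 1
    A[i + 1], A[-1] = A[-1], A[i + 1]
    return swaps

def hoare_swaps(A):
    # Crossing-pointer partition, pivot = last element, kept out of the
    # scanned range and swapped into its place at the end (not counted).
    l, r = 0, len(A) - 1
    v, i, j, swaps = A[r], l - 1, r, 0
    while True:
        i += 1
        while A[i] < v:
            i += 1
        j -= 1
        while A[j] > v:
            if j == l:
                break
            j -= 1
        if i >= j:
            break
        A[i], A[j] = A[j], A[i]
        swaps += 1
    A[i], A[r] = A[r], A[i]
    return swaps

n, trials = 1000, 2000
lom = sum(lomuto_swaps(random.sample(range(n), n)) for _ in range(trials)) / trials
hoa = sum(hoare_swaps(random.sample(range(n), n)) for _ in range(trials)) / trials
print(lom, n / 2 - 1 / 2)   # roughly 500 vs. predicted 499.5
print(hoa, n / 6 - 1 / 3)   # roughly 166 vs. predicted 166.3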

Memory Access Pattern

Both algorithms use two pointers into the array that scan it sequentially. Therefore both behave almost optimally w.r.t. caching.

Equal Elements and Already Sorted Lists

As already mentioned by Wandering Logic, the performance of the algorithms differs more drastically for lists that are not random permutations.

On an array that is already sorted, Hoare's method never swaps, as there are no misplaced pairs (see above), whereas Lomuto's method still does its roughly $n/2$ swaps!

The presence of equal elements requires special care in Quicksort. (I stepped into this trap myself; see my master's thesis, page 36, for a “Tale on Premature Optimization”.) Consider as an extreme example an array filled with $0$s. On such an array, Hoare's method performs a swap for every pair of elements - which is the worst case for Hoare's partitioning - but $i$ and $j$ always meet in the middle of the array. Thus, we have optimal partitioning and the total running time remains in $\mathcal O(n\log n)$.

Lomuto's method behaves much more stupidly on the all-$0$ array: the comparison A[j] <= x will always be true, so we do a swap for every single element! But even worse: after the loop the returned split point is always $i+1 = n$, i.e. the worst case partitioning, making the overall performance degrade to $\Theta(n^2)$!
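
To see the all-$0$ case concretely, here is a tiny sketch (it mirrors the question's Lomuto pseudocode with 0-based indices and additionally reports the split point and swap count):

def lomuto(A, p, r):
    # Lomuto partition on A[p..r] (inclusive, 0-based); returns the split
    # index and the number of swaps done in the scanning loop.
    x, i, swaps = A[r], p - 1, 0
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
            swaps += 1
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1, swaps

n = 1000
print(lomuto([0] * n, 0, n - 1))   # (999, 999): a swap for every scanned element and the worst possible split

A crossing-pointer partition on the same input also does about $n/2$ swaps, but returns a split near the middle, which is exactly what keeps the total running time in $\mathcal O(n\log n)$.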

Conclusion

Lomuto's method is simple and easier to implement, but should not be used for implementing a library sorting method.


Clarification

In this answer, I explained why a good implementation of the “crossing-pointer scheme” from Hoare's partitioning method is superior to the simpler scheme of Lomuto's method, and I stand by everything I said on that topic.
Alas, this is strictly speaking not what the OP was asking!

The pseudocode for Hoare-Partition as given above does not have the desirable properties I lengthily praised, since it fails to exclude the pivot element from the partitioning range. As a consequence, the pivot is “lost” in the swapping, cannot be put into its final position after partitioning, and hence cannot be excluded from the recursive calls. (That means the recursive calls no longer fulfill the same randomness assumptions, and the whole analysis seems to break down! Robert Sedgewick's PhD dissertation discusses this issue in detail.)

For pseudocode of the desirable implementation analyzed above, see my master's thesis, Algorithm 1. (That code is due to Robert Sedgewick.)

Sebastian
  • Wow, that's one detailed answer. Nicely done! – Raphael Apr 25 '13 at 07:49
  • Have to agree with Raphael, really nice answer! – Robert S. Barnes Apr 26 '13 at 07:11
  • I would make a small clarification, that as the ratio of unique elements to total elements gets lower, the number of comparisons that Lomuto does grows significantly faster than those of Hoare. This is likely due to poor partitioning on Lomuto's part and good average partitioning on Hoare's part. – Robert S. Barnes May 13 '13 at 06:37
  • Great explanation of the two methods! Thank you! – vkoukou Apr 26 '17 at 21:23
  • You can easily create a variant of Lomuto method that can extract all elements that are equal to the pivot, and leave them out of recursion, though I am not sure if it would help or hinder average case. – Jakub Narębski Oct 19 '19 at 22:06
  • @JakubNarębski That's true, but it is not clear if that is still "Lomuto's method" then ... I think it is fair to say that Hoare-Sedgewick partitioning has an edge here. – Sebastian Feb 16 '20 at 17:58
  • Where does "Hoare's requires three times fewer swaps than Lomuto's" come from? Is there a way to calculate that figure, or is it empirical? If it's empirical, can you elaborate on the observations that led to it? – jszaday Apr 08 '23 at 21:10
  • @jszaday The analysis is described in the answer itself. – Sebastian Apr 09 '23 at 22:13
  • The answer contains the formulas for the average number of swaps for both Hoare's and Lomuto's methods; however, dividing them does not trivially simplify to three (i.e., # swaps Lomuto's / # swaps Hoare's). Unless there's a more robust calculation, I suppose that the answer is "drop the fractional terms, then (n / 2) / (n / 6) = 3." – jszaday Apr 11 '23 at 01:43
  • Oh, the precise statement should have been "Hoare's requires asymptotically three times fewer swaps". If you divide the two expressions, the result is approaching 1/3 very quickly with increasing n: https://www.wolframalpha.com/input?i=plot+%28n%2F6-1%2F3%29+%2F+%28n%2F2-1%2F2%29+for+n+%3D+1+to+100 – Sebastian Apr 12 '23 at 06:52
  • What does "easier to implement" mean? If we have a complete description of either algorithm, either one is just a coding exercise. Hoare's two pointer approach could be more difficult to implement if someone is just given a sketchy description and asked to re-invent the details, or implement it "closed book" from memory. Nobody should be doing that, though, you would think. – Kaz May 01 '23 at 19:54
  • @Kaz you would think, alas ... More seriously, depending on the language/setup, there might be minor tweaks to the method that seem more idiomatic or more convenient to code; in Hoare-Sedgewick partitioning, you have to be very careful with these, whereas Lomuto's method seems more forgiving. It clearly is a soft argument, but one that several people working on sorting implementations have painfully rediscovered, so I consider it a fair point. – Sebastian May 04 '23 at 11:02
  • I did work like this recently. I wrote a test case which validates the sorting of every possible 9 element sequence, plus cases like empty, two elements, and a few large test cases: in order, reverse order, repeating element, random. With those, I was able to massage the Hoare partitioning into working almost without having to think. There can be differences like whether your arrays are 1-based or 0-based, and when the interval being sorted is closed or half-open. – Kaz May 05 '23 at 22:03

Some comments added to Sebastian's excellent answer.

I'm going to talk about the partition rearrangement algorithm in general, and not about its particular use for Quicksort.

Stability

Lomuto's algorithm is semistable: the relative order of the elements not satisfying the predicate is preserved. Hoare's algorithm is unstable.

Element Access Pattern

Lomuto's algorithm can be used with singly linked lists or similar forward-only data structures. Hoare's algorithm needs bidirectionality.
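
For instance, here is a sketch of a Lomuto-style partition over a minimal singly linked list (the Node class and the predicate convention are only illustrative): it only ever follows next pointers and swaps values, never moving backwards.

class Node:
    # Minimal singly linked list node (illustrative only).
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def partition_list(head, pred):
    # Lomuto-style scan: 'boundary' marks the last node of the prefix whose
    # values satisfy pred; every node is visited once, moving forward only.
    # Returns the first node of the second group (or None if there is none).
    boundary = None
    node = head
    while node is not None:
        if pred(node.value):
            boundary = head if boundary is None else boundary.next
            boundary.value, node.value = node.value, boundary.value
        node = node.next
    return head if boundary is None else boundary.next

# usage: values satisfying the predicate (<= 3) end up in the prefix
head = None
for v in reversed([5, 1, 6, 2, 3, 0]):
    head = Node(v, head)
partition_list(head, lambda v: v <= 3)

vals, node = [], head
while node:
    vals.append(node.value)
    node = node.next
print(vals)   # [1, 2, 3, 0, 6, 5]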

Number of Comparisons

Lomuto's algorithm can be implemented performing $n-1$ applications of the predicate to partition a sequence of length $n$. (Hoare's too).

But in order to do this we have to sacrifice two properties:

  1. The sequence to be partitioned must not be empty.
  2. The algorithm is unable to return the partition point.

If we need either of these two properties, we will have no choice but to implement the algorithm using $n$ applications of the predicate.

  • Thanks Fernando. QQ: What exactly do you mean by "predicate" here? – Josh Apr 28 '20 at 15:52
  • @Josh A predicate is a functional procedure returning a truth value. – Fernando Pelliccioni Apr 29 '20 at 19:55
  • It is never said that the pivot value needs to be a value present in the array (though this is a quite natural choice). But it is essential that the partition does not leave one of the subarrays empty. –  Mar 22 '22 at 08:16
  • The Lomuto algorithm is only "semi stable" in the sense that the elements moved into the lower partition are in original relative order, whereas the elements swapped into the upper partition are scrambled. Needless to say, this is of no significance or use, and doesn't bring about a stable sort. A left to right scan of a sequence can partition it into two sequences in a stable way; but that is not the Lomuto algorithm. – Kaz May 04 '23 at 20:04