6

I'm currently creating a program to analyse the pathological cases of Quicksort; specifically, the transition in complexity from $O(n^2)$ to $O(n \log n)$ as a data set becomes less ordered. Since Quicksort is a comparison-based algorithm, the choice of pivot determines its efficiency. We usually (median-of-three and choosing a random element aside) select the first or last element of the array as the pivot. A good pivot is one that splits the array symmetrically into two halves, so a more disordered array makes such a pivot choice more likely to succeed, or at least reduces the likelihood of running into the pathological case.

My program gradually randomises this array by swapping some elements each time. When I started to plot the graph of this using gnuplot, I noticed the trend I expected: an exponential distribution branching off into a uniform distribution.
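The gradual-randomisation procedure described above can be sketched as follows (a minimal Python sketch; the question does not specify a language, and the function name and step parameters are illustrative):

```python
import random

def gradually_shuffle(arr, swaps_per_step, steps):
    """Yield progressively more disordered copies of arr by applying
    a few random swaps at each step, accumulating over time."""
    a = list(arr)
    for _ in range(steps):
        for _ in range(swaps_per_step):
            i = random.randrange(len(a))
            j = random.randrange(len(a))
            a[i], a[j] = a[j], a[i]  # swap two random positions
        yield list(a)  # snapshot of the current (partially shuffled) array
```

Each snapshot can then be fed to Quicksort and the resulting operation counts plotted against the step number.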

My question is: how do I quantify the level of disorder in an array? Is there a better way to gradually make an array of values more disordered?

Raphael
user103853
  • I would suggest posting this on cstheory.stackexchange.com – somebody Nov 18 '14 at 14:21
  • Note that the notion of "entropy" is heavily used in other contexts. The questions linked in rphv's answer and my comment verbalise more clearly what you want, I think. – Raphael Nov 20 '14 at 21:44
  • Furthermore, I doubt that you can analyse Quicksort by "writing a program". There is lots of literature with formal analyses around. Unfortunately, it's not as simple as you lay out; for instance, the cost of choosing a pivot and the recursion depth behave antagonistically, as do the number of comparisons and the number of swaps. – Raphael Nov 20 '14 at 21:49

2 Answers

4

Inversions are one way to measure "disorder" in a list:

Let $A[1..n]$ be an array of $n$ distinct numbers. If $i < j$ and $A[i] > A[j]$, then the pair $(i,j)$ is an inversion of $A$.
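A sorted array has zero inversions, and a reversed array has the maximum $\binom{n}{2}$. Inversions can be counted in $O(n \log n)$ by piggybacking on merge sort; here is a minimal Python sketch (function names are illustrative):

```python
def count_inversions(a):
    """Return the number of pairs (i, j) with i < j and a[i] > a[j],
    counted in O(n log n) using a merge-sort recursion."""
    def sort(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_left = sort(xs[:mid])
        right, inv_right = sort(xs[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # Every remaining element of left is greater than right[j],
                # so each forms an inversion with it.
                inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged += left[i:] + right[j:]
        return merged, inv
    return sort(list(a))[1]
```

Plotting Quicksort's operation count against the inversion count of each input would quantify the trend the question describes.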

However, it's not the only such measure. In general, this concept is formalized in the notion of presortedness - roughly:

An integer function on a permutation $\sigma$ of a totally ordered set that reflects how much $\sigma$ differs from the total order.

The survey papers by Mannila [1] and Estivill-Castro & Wood [2] might be good places to start.

The question How to measure "sortedness" is related.


[1] Heikki Mannila. "Measures of Presortedness and Optimal Sorting Algorithms." IEEE Transactions on Computers 34.4 (1985): 318-325.

[2] Vladimir Estivill-Castro and Derick Wood. "A Survey of Adaptive Sorting Algorithms." ACM Computing Surveys 24.4 (1992): 441-476.

rphv
3

There is an analytical result for something similar (but not identical) to your model of unsortedness:

Banderier et al. [1] precisely analyze the expected number of comparisons needed by Quicksort on a partial permutation of the sorted list of length $n$:

For a parameter $p\in[0,1]$ (that might depend on $n$), we select each element in the initially sorted list i.i.d. with probability $p$. Then, we randomly permute only the selected elements.

For $p=1$, you get random permutations; for $p=0$, we retain the sorted list. Quicksort then needs asymptotically $2\frac{n}{p} \ln n$ comparisons on average for such a partial permutation (remark below the proof of Theorem 1 in [1]).

As a consequence, when $p = O\bigl(\frac{\log n}{n}\bigr)$, Quicksort needs a quadratic number of comparisons, whereas for constant $p$, Quicksort is linearithmic. There is, however, no sharp transition between those two extremes: if you choose $p$ in the right way (depending on $n$), any behavior between the extremes is possible.
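The partial-permutation model from [1], as described above, is also easy to generate experimentally; a minimal Python sketch (the function name is illustrative):

```python
import random

def partial_permutation(n, p):
    """Start from the sorted list 0..n-1, select each position
    independently with probability p, then randomly permute only
    the selected elements (all other positions stay fixed)."""
    a = list(range(n))
    selected = [i for i in range(n) if random.random() < p]
    values = [a[i] for i in selected]
    random.shuffle(values)
    for i, v in zip(selected, values):
        a[i] = v
    return a
```

Sweeping $p$ from $0$ to $1$ and measuring Quicksort's comparison count on each output would let you compare your swap-based randomisation against this analytically understood model.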


[1] Cyril Banderier, René Beier, Kurt Mehlhorn "Smoothed Analysis for Three Combinatorial Problems", MFCS 2003, LNCS 2747, 2003, pp 198-207: http://link.springer.com/chapter/10.1007/978-3-540-45138-9_14

Sebastian