6

I'm currently creating a program to analyse the pathological cases of Quicksort; specifically, the transition in complexity from $O(n^2)$ to $O(n \log n)$ as a data set becomes less ordered. Since Quicksort is a comparison-based algorithm, the choice of pivot determines its efficiency. We usually (median-of-three and choosing a random element aside) select the first or last element of the array as the pivot. A good pivot is one that splits the array symmetrically into two halves, so a more disordered array makes such a pivot choice more likely to succeed, or at least reduces the likelihood of running into the pathological case.

My program gradually randomises this array by swapping some elements each time. When I started to plot the graph of this using gnuplot, I noticed the trend I expected: an exponential distribution branching off into a uniform distribution.
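The gradual-randomisation procedure described above can be sketched as follows (a minimal Python sketch; the question does not specify a language, and the function name and step parameters are illustrative):

```python
import random

def gradually_shuffle(arr, swaps_per_step, steps):
    """Yield progressively more disordered copies of arr by applying
    a few random swaps at each step, accumulating over time."""
    a = list(arr)
    for _ in range(steps):
        for _ in range(swaps_per_step):
            i = random.randrange(len(a))
            j = random.randrange(len(a))
            a[i], a[j] = a[j], a[i]  # swap two random positions
        yield list(a)  # snapshot of the current (partially shuffled) array
```

Each snapshot can then be fed to Quicksort and the resulting operation counts plotted against the step number.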

My question is: how do I quantify the level of disorder in an array? Is there a better way to gradually make an array of values more disordered?

Raphael
user103853
  • I would suggest posting this on cstheory.stackexchange.com – somebody Nov 18 '14 at 14:21
  • Note that the notion of "entropy" is heavily used in other contexts. The questions linked in rphv's answer and my comment verbalise more clearly what you want, I think. – Raphael Nov 20 '14 at 21:44
  • Furthermore, I doubt that you can analyse Quicksort by "writing a program". There is lots of literature with formal analyses around. Unfortunately, it's not as simple as you lay out; for instance, the cost of choosing a pivot and the recursion depth behave antagonistically, as do the number of comparisons and the number of swaps. – Raphael Nov 20 '14 at 21:49

2 Answers

4

Inversions are one way to measure "disorder" in a list:

Let $A[1..n]$ be an array of $n$ distinct numbers. If $i < j$ and $A[i] > A[j]$, then the pair $(i,j)$ is an inversion of $A$.
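A sorted array has zero inversions, and a reversed array has the maximum $\binom{n}{2}$. Inversions can be counted in $O(n \log n)$ by piggybacking on merge sort; here is a minimal Python sketch (function names are illustrative):

```python
def count_inversions(a):
    """Return the number of pairs (i, j) with i < j and a[i] > a[j],
    counted in O(n log n) using a merge-sort recursion."""
    def sort(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_left = sort(xs[:mid])
        right, inv_right = sort(xs[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # Every remaining element of left is greater than right[j],
                # so each forms an inversion with it.
                inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged += left[i:] + right[j:]
        return merged, inv
    return sort(list(a))[1]
```

Plotting Quicksort's operation count against the inversion count of each input would quantify the trend the question describes.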

However, it's not the only such measure. In general, this concept is formalized in the notion of presortedness - roughly:

An integer function on a permutation $\sigma$ of a totally ordered set that reflects how much $\sigma$ differs from the total order.

The survey papers by Mannila [1] and Estivill-Castro & Wood [2] might be good places to start.

The question How to measure "sortedness" is related.


[1] Heikki Mannila. "Measures of Presortedness and Optimal Sorting Algorithms." IEEE Transactions on Computers 34.4 (1985): 318-325.

[2] Vladimir Estivill-Castro and Derick Wood. "A Survey of Adaptive Sorting Algorithms." ACM Computing Surveys 24.4 (1992): 441-476.

rphv
3

There is an analytical result for something similar (but not identical) to your model of unsortedness:

Banderier et al. [1] precisely analyze the expected number of comparisons needed by Quicksort on a partial permutation of the sorted list of length $n$:

For a parameter $p\in[0,1]$ (that might depend on $n$), we select each element in the initially sorted list i.i.d. with probability $p$. Then, we randomly permute only the selected elements.

For $p=1$, you get random permutations; for $p=0$, we retain the sorted list. Quicksort then needs asymptotically $2\frac{n}{p} \ln n$ comparisons on average for such a partial permutation (remark below the proof of Theorem 1 in [1]).

As a consequence, when $p = O\bigl(\frac{\log n}{n}\bigr)$, Quicksort needs a quadratic number of comparisons, whereas for constant $p$, Quicksort is linearithmic. There is, however, no sharp transition between those two extremes: if you choose $p$ in the right way (depending on $n$), any behavior between the extremes is possible.
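The partial-permutation model from [1], as described above, is also easy to generate experimentally; a minimal Python sketch (the function name is illustrative):

```python
import random

def partial_permutation(n, p):
    """Start from the sorted list 0..n-1, select each position
    independently with probability p, then randomly permute only
    the selected elements (all other positions stay fixed)."""
    a = list(range(n))
    selected = [i for i in range(n) if random.random() < p]
    values = [a[i] for i in selected]
    random.shuffle(values)
    for i, v in zip(selected, values):
        a[i] = v
    return a
```

Sweeping $p$ from $0$ to $1$ and measuring Quicksort's comparison count on each output would let you compare your swap-based randomisation against this analytically understood model.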


[1] Cyril Banderier, René Beier, Kurt Mehlhorn "Smoothed Analysis for Three Combinatorial Problems", MFCS 2003, LNCS 2747, 2003, pp 198-207: http://link.springer.com/chapter/10.1007/978-3-540-45138-9_14

Sebastian