12

Considering this pseudo-code of a bubblesort:

FOR i := 0 TO arraylength(list) STEP 1  
    switched := false
    FOR j := 0 TO arraylength(list)-(i+1) STEP 1
        IF list[j] > list[j + 1] THEN
            switch(list,j,j+1)
            switched := true
        ENDIF
    NEXT
    IF switched = false THEN
        break
    ENDIF
NEXT
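For reference, here is a minimal runnable Python sketch of the same optimized bubble sort (a transcription, not part of the original question; the bounds are chosen so that `list[j + 1]` never reads past the end of the list):

```python
def bubble_sort(lst):
    """Optimized bubble sort: stop early once a full pass makes no swap."""
    n = len(lst)
    for i in range(n):
        switched = False
        # After i passes, the last i elements are already in place.
        for j in range(n - i - 1):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]  # switch(list, j, j+1)
                switched = True
        if not switched:  # no swap in this pass: the list is sorted
            break
    return lst
```

For example, `bubble_sort([3, 1, 2])` returns `[1, 2, 3]` after one swapping pass and one confirming pass.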

What are the basic ideas I have to keep in mind to evaluate the average time complexity? I have already calculated the worst and best cases, but I am stuck on how to evaluate the average complexity of the inner loop in order to form the equation.

The worst case equation is:

$$ \sum_{i=0}^n \left(\sum_{j=0}^{n -(i+1)}O(1) + O(1)\right) = O(\frac{n^2}{2} + \frac{n}{2}) = O(n^2) $$

in which the inner sigma represents the inner loop and the outer sigma represents the outer loop. I think I need to change both sigmas to account for the "if-then-break" clause, which may affect the outer sigma, and also for the if clause in the inner loop, which affects the work done per iteration (4 actions + 1 comparison if the condition is true, else just 1 comparison).
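As a sanity check on that closed form: treating each $O(1)$ as one unit of work, the result follows by evaluating the inner sigma first (the inner sum is empty once $i = n$, and the per-pass constant is absorbed into the big-O):

$$\sum_{i=0}^{n}\sum_{j=0}^{n-(i+1)} 1 \;=\; \sum_{i=0}^{n-1}(n-i) \;=\; \frac{n(n+1)}{2} \;=\; \frac{n^2}{2} + \frac{n}{2}$$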

For clarification on the term average time: this sorting algorithm will need different time on different lists (of the same length), as it may need more or fewer steps through/within the loops until the list is completely in order. I am trying to find a mathematical (non-statistical) way of evaluating the average number of those rounds needed.

For this I assume every ordering of the input to be equally likely.
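One way to make this notion concrete for small $n$ (an illustrative sketch, not part of the original question; the function names are my own): enumerate all $n!$ orders, run the optimized bubble sort on each while counting comparisons, and average exactly, with no sampling involved.

```python
from itertools import permutations

def comparisons(order):
    """Run the optimized bubble sort and return the number of comparisons."""
    lst = list(order)
    n = len(lst)
    count = 0
    for i in range(n):
        switched = False
        for j in range(n - i - 1):
            count += 1
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                switched = True
        if not switched:
            break
    return count

def average_comparisons(n):
    """Exact average over all n! equally likely input orders."""
    perms = list(permutations(range(n)))
    return sum(comparisons(p) for p in perms) / len(perms)
```

For $n = 3$, this gives $17/6 \approx 2.83$ comparisons: the sorted order costs 2 (one pass, then break), while the other five orders cost 3 each.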

Franck Dernoncourt
Sim
  • You first need to define what average even means. Since the algorithm is deterministic, you'd have to assume some kind of distribution over inputs. – Suresh Mar 06 '12 at 20:59
  • @Sim Can you show how you computed the worst-case time complexity? Then, we might get an idea on what you mean by average complexity in your case. – 0x0 Mar 06 '12 at 21:04
  • I mean average time in the sense of the mean time needed (in other words, the 'pure' mathematical version of the mean of all times observed in a statistical analysis). For example, quicksort has an average of $n\log n$ even though its worst case is $n^2$. – Sim Mar 06 '12 at 21:07
  • @Sim In the case of bubble sort, average case = worst case time complexity; that is, the average-case time complexity is also $n^2$. – 0x0 Mar 06 '12 at 21:20
  • There's a difference. Quicksort is averaged "over the choice of coin tosses when choosing a pivot", which has nothing to do with the data, whereas you are implying that you want to average "over all inputs", which assumes (for example) that you expect each ordering of the input to occur with the same probability. That's reasonable, but it should be stated explicitly. – Suresh Mar 06 '12 at 21:21
  • @Suresh this depends on the algorithm. If you always take the last element of a (sub)list as a pivot, the only relevant factor is the order of the elements. – Sim Mar 06 '12 at 21:30
  • @Sunil I do know the average-case time complexity, but I want to "prove" it, as I did with the worst and best case. – Sim Mar 06 '12 at 21:31
  • @Sim. It's still $n^2$. Perhaps you are conflating insertion sort with bubble sort? – Nicholas Mancuso Mar 06 '12 at 21:37
  • @NicholasMancuso Did you read the pseudo-code? It checks whether there was a switch within the inner loop; if not, the list is considered sorted and the algorithm stops. (The best case is $O(n)$.) – Sim Mar 06 '12 at 21:40
  • @Sim Right, but then the claim of "average case n log n" reverts back to data-driven assumptions. – Suresh Mar 06 '12 at 21:46
  • @Suresh yes it does, I edited it in the original post. Hope I pointed it out enough. – Sim Mar 06 '12 at 21:50
  • @Sim I think you go out of bounds in your array, in the first iteration of the outer loop and the last iteration of the inner loop. – Joe Mar 06 '12 at 22:24
  • @Joe The TO is to be interpreted as exclusive: the loop breaks as soon as the counter reaches the bound, before executing the body, just as in a normal for clause. – Sim Mar 06 '12 at 22:27
  • @Sim when i = 0, and x = n-1, you access list[x+1] = list[n] – Joe Mar 06 '12 at 22:40
  • @Joe Yes, I do see it now, but I guess this algorithm will still work as an example. – Sim Mar 06 '12 at 22:47
  • This is rather complex to analyze, see Knuth's "The Art of Computer Programming" volume 3 for the gory details. And even "optimized bubblesort" is a horribly bad sorting algorithm, don't ever let me catch you using it in anger. – vonbrand Feb 01 '13 at 13:28

4 Answers

20

Recall that a pair $(A[i], A[j])$ (resp. $(i,j)$) is inverted if $i < j$ and $A[i] > A[j]$.

Assuming your algorithm performs one swap for each inversion, the running time of your algorithm will depend on the number of inversions.

Calculating the expected number of inversions in a uniform random permutation is easy:

Let $P$ be a permutation, and let $R(P)$ be the reverse of $P$. For example, if $P = 2,1,3,4$ then $R(P) = 4,3,1,2$.

For each pair of indices $(i,j)$ there is an inversion in exactly one of either $P$ or $R(P)$.

Since the total number of pairs is $n(n-1)/2$, and each pair is inverted in exactly half of the permutations, assuming all permutations are equally likely, the expected number of inversions is:

$$\frac{n(n-1)}{4}$$
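This expectation is easy to verify by brute force for small $n$ (an illustrative sketch; the helper names are my own):

```python
from itertools import combinations, permutations

def inversions(p):
    """Count pairs (i, j) with i < j and p[i] > p[j]."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])

def expected_inversions(n):
    """Exact expectation over all n! equally likely permutations."""
    perms = list(permutations(range(n)))
    return sum(inversions(p) for p in perms) / len(perms)
```

For $n = 4$ this returns $3.0$, matching $n(n-1)/4$.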

Raphael
Joe
  • This evaluates the number of inversions, but what about the number of comparisons, which depends on when the break clause is reached? – Sim Mar 06 '12 at 22:20
  • You get one comparison per swap, and most importantly one swap can reduce the number of inversions by at most one. – jmad Mar 06 '12 at 22:24
  • Not every comparison results in a swap; if the if clause is false, no swap is done. – Sim Mar 06 '12 at 22:31
  • @rgrig If you provide a counter-example, then I will correct my answer. – Joe Mar 07 '12 at 00:18
  • @Joe: I removed my comment. It was wrong. – rgrig Mar 07 '12 at 07:06
  • @Sim indeed. So I don't know. (I guess Joe's solution is elegant but not that close to the implementation) – jmad Mar 08 '12 at 01:07
10

For lists of length $n$, average usually means that you have to start with a uniform distribution on all $n!$ permutations of [$1$, .., $n$]: those are all the lists you have to consider.

Your average complexity would then be the sum of the number of steps for all lists, divided by $n!$.

For a given list $(x_i)_i$, the number of steps of your algorithm is $nd$, where $d$ is the greatest distance between an element $x_i$ and its rightful location $i$ (but only if it has to move to the left), that is $\max_i(\max(1,i-x_i))$.

Then you do the math: for each $d$, find the number $c_d$ of lists with this particular maximal distance; then the expected value of $d$ is:

$$\frac1{n!}\ \sum_{d=0}^n{\ dc_d}$$

And those are the basic thoughts, without the hardest part, which is finding $c_d$. Maybe there is a simpler solution, though.

EDIT: added `expected'
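The hard part, finding $c_d$, can at least be tabulated by brute force for small $n$ (an illustrative sketch with names of my own; positions are 0-indexed here, which changes nothing since the shift cancels in $i - x_i$):

```python
from collections import Counter
from itertools import permutations

def max_left_distance(p):
    """d = max_i max(1, i - x_i): the farthest any element must travel left."""
    return max(max(1, i - x) for i, x in enumerate(p))

def c_counts(n):
    """Tabulate c_d: how many of the n! lists have maximal distance d."""
    return Counter(max_left_distance(p) for p in permutations(range(n)))
```

For example, `c_counts(3)` yields `{1: 4, 2: 2}`: four of the six permutations need at most one leftward step, and two (those with the smallest element last) need two.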

jmad
  • If you consider a normal distribution, is there a way to approximate $c_d$? – Sim Mar 06 '12 at 21:48
  • You can say $c_d \ge (n+1-d)(d-1)!$, because you can mingle the permutations of [$2$, .., $d$] anywhere and append $1$ at the end, but that's too small to prove $n^2$ on average. – jmad Mar 06 '12 at 22:15
2

Number of swaps $\le$ number of iterations (in both the optimized and the simple bubble sort).

Number of inversions = number of swaps.

Therefore, the expected number of iterations $\ge \frac{n(n-1)}{4}$.

Thus, the average-case complexity is $\Omega(n^2)$. But since the average case cannot exceed the worst case, it is also $O(n^2)$.

This gives us: average time $= \Theta(n^2)$.

(Time complexity is proportional to the number of iterations, and the number of iterations is at least the number of swaps.)
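This chain of (in)equalities can be checked by brute force for small $n$ (an illustrative sketch; the helper names are mine): an instrumented version of the optimized loop confirms that swaps equal inversions and that comparisons never fall below swaps.

```python
from itertools import combinations, permutations

def comparisons_and_swaps(order):
    """Instrumented optimized bubble sort: return (comparisons, swaps)."""
    lst = list(order)
    n = len(lst)
    comps = swaps = 0
    for i in range(n):
        switched = False
        for j in range(n - i - 1):
            comps += 1
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swaps += 1
                switched = True
        if not switched:
            break
    return comps, swaps

def inversion_count(p):
    """Pairs (i, j) with i < j and p[i] > p[j]."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])
```

Each adjacent swap removes exactly one inversion and the loop stops only when no inversions remain, which is why the swap count equals the initial inversion count.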

kushj
0

In this document, the average time complexity of bubble sort reaches $O(n\log(n))$! http://algo.inria.fr/flajolet/Publications/ViFl90.pdf

  • That's not true. They prove a result of Knuth showing that the expected number of comparisons is roughly $n^2/2$. – Yuval Filmus Feb 19 '18 at 17:01