
What follows is my algorithm for doing this in what I believe to be $O(n)$ time, along with my proof of that bound. My professor disagrees that it runs in $O(n)$ time and instead thinks that it runs in $\Omega(n^2)$ time. Any comments regarding the proof itself, or its style (i.e., my ideas may be clear but the presentation not), would be appreciated.

The original question:

Given $n$ numbers, find the $m \leq n$ largest among them in time $o(n \log n)$. You may not assume anything else about $m$.

My answer:

  1. Sort the first $m$ elements of the array. This takes $O(1)$ time, as the cost depends only on $m$, not $n$.
  2. Store them in a linked list, maintaining the sorted order. This also takes $O(1)$ time, for the same reason as above.
  3. Test every other element of the array against the least element of the linked list. This takes $O(n)$ time, as at most $n$ comparisons must be done.
  4. If the element is in fact greater, delete the first element of the linked list (the least one) and insert the new element at the position that keeps the list sorted. This takes $O(1)$ time because it is bounded above by a constant ($m$), as the list does not grow.
  5. Therefore, the total complexity for the algorithm is $O(n)$.
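The steps above can be sketched in Python (a sketch of my reading of the algorithm; a sorted Python list with `bisect.insort` stands in for the sorted linked list):

```python
import bisect

def m_largest(arr, m):
    """Keep the m largest elements seen so far in a sorted list.

    Steps 1-2: sort the first m elements (cost depends only on m).
    Steps 3-4: for each remaining element, compare it against the
    current minimum and, if larger, replace the minimum, re-inserting
    in sorted order.
    """
    window = sorted(arr[:m])          # steps 1-2
    for x in arr[m:]:                 # step 3: n - m comparisons
        if x > window[0]:             # window[0] is the least element
            window.pop(0)             # step 4: drop the least ...
            bisect.insort(window, x)  # ... and insert in sorted order
    return window

# m_largest([3, 1, 4, 1, 5, 9, 2, 6], 3) -> [5, 6, 9]
```

Note that `pop(0)` and `bisect.insort` each cost $O(m)$ per update, which matters for the complexity debate below when $m$ is not treated as a constant.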

I am aware that using a red-black tree instead of a linked list is more efficient in the constant terms (each update costs $O(\log_2 m)$ rather than $O(m)$), and the problem of keeping a pointer to the least element of the tree (to facilitate the comparisons) is eminently doable; it just didn't occur to me at the time.

What is my proof missing? Is there a more standard way of presenting it (even if it is incorrect)?

Raphael
soandos
  • "better than $O(n\log n)$" -- that is rough. I assume you mean $\Theta(n \log n)$? Another problem with the question is that it is not clear whether $m$ is fixed. – Raphael Apr 24 '12 at 18:38
  • @Raphael, I am sending you what I have. No idea what he meant. Assume that it really is $O(n \cdot \log_2(n))$. With regard to whether or not $m$ is fixed, I have no idea. When I calculated the complexity, I assumed that it was, but there was no basis for that assumption. – soandos Apr 24 '12 at 19:03
  • There are no $n$'s or $m$'s in the original problem. Based on context, I'm assuming you want to find the largest $m$ numbers in an array of $n$ numbers. EDIT: nevermind, it's in the question title. – Joe Apr 24 '12 at 19:56
  • @soandos: Then ask your professor; neither you nor we can be expected to work with half a question. By the way, the actual wording you quote has another problem: every algorithm is $O(1)$ if you fix your input size to 10 billion. – Raphael Apr 24 '12 at 20:22
  • @Raphael exactly. if all you're given is the quote in the original question, you have to clarify the question just to determine what $n$ is! – Joe Apr 24 '12 at 20:31
  • @Raphael, and others, see update – soandos Apr 24 '12 at 20:36
  • @soandos: I incorporated your update into the "quote". The answers by Louis and Joe address your problem sufficiently (imho). And unforgiven solves the exercise, of course. – Raphael Apr 24 '12 at 20:41
  • Agreed, thanks for helping with the clarity – soandos Apr 24 '12 at 20:43
  • @soandos If what you wanted was a solution to the exercise (thanks, unforgiven), then you should have just asked for it from the start. Instead, you asked something like: "is my algorithm $O(n)$ and how can I improve my proof?" – Joe Apr 25 '12 at 21:25

3 Answers


Here is an $O(n)$ algorithm solving the problem.

  1. Use a worst-case $O(n)$ selection algorithm to determine the $(n-m+1)$-th order statistic. Let $k$ be this number; it is the smallest of the $m$ largest numbers we are trying to determine.

  2. Now partition the array around the pivot $k$ using the QuickSort partition function. This step takes $O(n)$ time too.

  3. Output the $m$ largest numbers: these are $k$ and all of the numbers in the upper subarray produced by the partition in step 2.
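A sketch of these steps in Python. Caveat: for brevity this uses randomized quickselect, which is expected rather than worst-case $O(n)$; the answer's worst-case bound needs a deterministic selection algorithm such as median-of-medians. The list comprehension stands in for the in-place QuickSort partition.

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element (1-indexed) of arr.

    Randomized pivoting gives expected O(n) time; swapping in
    median-of-medians pivot selection would give worst-case O(n).
    """
    pivot = random.choice(arr)
    lo = [x for x in arr if x < pivot]
    hi = [x for x in arr if x > pivot]
    eq = len(arr) - len(lo) - len(hi)   # count of elements equal to pivot
    if k <= len(lo):
        return quickselect(lo, k)
    if k <= len(lo) + eq:
        return pivot
    return quickselect(hi, k - len(lo) - eq)

def m_largest_select(arr, m):
    n = len(arr)
    # Step 1: k is the (n - m + 1)-th order statistic.
    k = quickselect(arr, n - m + 1)
    # Step 2: partition around k (a linear scan here).
    larger = [x for x in arr if x > k]
    # Step 3: k (repeated if there are ties) plus the upper part
    # gives the m largest numbers.
    return larger + [k] * (m - len(larger))
```

Both phases are linear scans, so the whole thing is $O(n)$ regardless of $m$.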

Massimo Cafaro

Your algorithm takes $\Theta(n + mn)$ time. I suspect that your professor is looking for something that takes $O(n+ n\log m)$ time, which should be possible, maybe by using a heap...

The source of your disagreement with the professor is that he or she doesn't appear to think $m$ is a constant, despite how the question is worded. If it's not, then $\Theta(m)$ is a lot worse than $\Theta(\log m)$.
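A possible heap-based sketch of the $O(n + n \log m)$ approach hinted at above, assuming Python's `heapq` (a min-heap) in place of the linked list; each of the $n - m$ updates then costs $O(\log m)$ instead of $O(m)$:

```python
import heapq

def m_largest_heap(arr, m):
    """Return the m largest elements in O(m + n log m) time.

    The min-heap's root is the smallest of the m current candidates,
    so the comparison in the original algorithm becomes a peek at
    heap[0], and the delete-then-insert becomes heapreplace.
    """
    heap = arr[:m]
    heapq.heapify(heap)                 # O(m)
    for x in arr[m:]:                   # n - m iterations ...
        if x > heap[0]:                 # ... each O(1) to compare
            heapq.heapreplace(heap, x)  # ... and O(log m) to update
    return heap                         # the m largest, in heap order
```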

Louis

Correctness
Missing from your presentation is the loop invariant to establish correctness: you maintain the largest $m$ elements encountered so far in a linked list. Thus, by the end of your algorithm, you have tested all the elements, and so you have the largest $m$ elements in the array. Your algorithm is still correct, but stating the purpose of the linked list at the beginning of the description makes its correctness more explicit.

Running Time
$\log_{10}(10 \mbox{ billion}) = 10$ (with base 2, it's about 33), which is a heck of a lot smaller than 10 million. In the example given, $m$ is many times larger than $\log n$, so I don't think you can safely assume that $m$ is constant.

You should provide an algorithm that is $o(n \log n)$ as long as $m = o(n)$. Replacing your linked list and linear search with a balanced binary search tree or a min-heap achieves this running time: $O(m \log m + n \log m) = O(n \log m) = o(n \log n)$ (assuming $m = o(n)$; otherwise your running time is $\Theta(n \log n)$).
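As an aside (not part of the answer): Python's standard library already packages this heap-based strategy. `heapq.nlargest` maintains a size-$m$ min-heap internally and runs in $O(n \log m)$ time:

```python
import heapq

# heapq.nlargest keeps a min-heap of the m largest elements seen so
# far, in O(n log m) time, returning them in descending order.
print(heapq.nlargest(3, [3, 1, 4, 1, 5, 9, 2, 6]))  # [9, 6, 5]
```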

In case you're not familiar with the notation, the intuition behind $o(n \log n)$ is $< O(n \log n)$, but see the corresponding question on cs.SE for details of the notation.

Joe