
I've implemented an algorithm that, according to my analysis, should run in $O(n \log n)$ time.

However, when I plot the computation time against the cardinality of the input set, it looks roughly linear, and computing $R^2$ more or less confirms this. To sanity-check myself I then plotted $n$ on the $x$-axis against $n \log_2 n$ on the $y$-axis with Python, and that also looked linear. Computing $R^2$ (scipy.stats.linregress) confuses me further, as I get $R^2 = 0.9995811978450471$ when my $x$ and $y$ data are created like this:

import math

x, y = [], []
for n in range(2, 10000000):
    x.append(n)
    y.append(n * math.log2(n))
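The $R^2$ value above then comes from a call along these lines (a sketch; I square the r value that linregress returns):

from scipy import stats

# x, y as built above
result = stats.linregress(x, y)
print(result.rvalue ** 2)   # ~0.9996, i.e. a near-perfect linear fit to n*log2(n)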

Am I missing something fundamental? Am I using too few iterations for it to matter? When looking at the graph at http://bigocheatsheet.com/ it does not seem linear at all.

Andreas V.

1 Answer


Just some general observations.

  • $O(n \log n)$ is only an upper bound. If it's not tight, that's your explanation right there.
  • A $\Theta(n \log n)$ running time can have many different components, for instance

    $\qquad\displaystyle a \cdot n\log n + b \cdot n \log \log n + c \cdot \sqrt n + d \cdot n + e \cdot \log n + f$

    While technically the linearithmic term dominates, if $a$ is small compared to the other coefficients you will have a hard time detecting it.

  • Measuring wall-clock running time is endlessly noisy, in particular because the coefficients mentioned above get skewed by platform details. Try investigating counts instead, for instance of a dominant operation or block (see the sketches after this list).
  • Linear regression always "works". Since the "difference" between $n \log n$ and $n$ is rather small (also considering the point above), it's not surprising that you get a high $R^2$. Run linearithmic regression and compare (again, see the sketches after this list)!
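
To make the counting suggestion concrete, here is a minimal sketch; merge sort is only a stand-in for whatever algorithm you implemented, and the comparison count plays the role of the "dominant operation":

import random

def merge_sort(a, counter):
    # Sorts a copy of `a`, incrementing counter[0] once per element comparison.
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid], counter)
    right = merge_sort(a[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                  # the dominant operation we count
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

for n in [2 ** k for k in range(10, 18)]:
    counter = [0]
    merge_sort([random.random() for _ in range(n)], counter)
    print(n, counter[0])                 # grows like n*log2(n), with no timing noise

Unlike wall-clock times, these counts are deterministic up to the random input, so the growth rate is much easier to read off.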
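And to make the regression suggestion concrete, a sketch that fits both a linear and a linearithmic model to the same data (synthetic here, standing in for your measurements) and compares the two fits via least squares:

import numpy as np

n = np.arange(2.0, 100000.0)
y = n * np.log2(n)                       # stand-in for measured times or counts

# Model 1: y ~ a*n + b                (linear)
# Model 2: y ~ a*n*log2(n) + b*n + c  (linearithmic)
A_lin = np.column_stack([n, np.ones_like(n)])
A_loglin = np.column_stack([n * np.log2(n), n, np.ones_like(n)])

for name, A in [("linear", A_lin), ("linearithmic", A_loglin)]:
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coeffs
    ss_res = np.sum(resid ** 2)
    r2 = 1 - ss_res / np.sum((y - y.mean()) ** 2)
    print(name, "R^2:", r2, "residual sum of squares:", ss_res)

Both $R^2$ values come out close to 1, which is exactly the trap described above; the residuals, however, differ by many orders of magnitude in favour of the linearithmic model.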
Raphael
  • Good point on suggesting to measure counts, or some other property that depends only on your algorithm and not on the implementation. – Discrete lizard Apr 25 '19 at 06:54
  • @Discretelizard I think it's fair to count implementation specifics; my point was to remove the machine (metal, I/O, OS, competing tasks, ...) from the measurement. – Raphael Apr 25 '19 at 08:07
  • Well, if you want to compare with the analysis of your algorithm, I'd say you should try to eliminate the rest. Measuring performance of implementations is a reasonable thing to do in general, but not necessarily what would be best here. – Discrete lizard Apr 25 '19 at 08:12
  • @Discretelizard The OP will have to clarify what they want. Imho, if you want to compare (the performance of) your implementation to (the analysis of) the abstract algorithm -- to check against performance bugs, say, or validate the model you used for analysis -- then counting only those things that appear identically in both implementation and algorithm is rather meaningless. – Raphael Apr 25 '19 at 08:14
  • Yes, the OP should clarify that more. However, what I propose is not necessarily meaningless. There are many cases in which the theoretical analysis isn't (or cannot be) as tight or as precise as you would expect the algorithm to behave on 'usual instances' (and where modelling these instances is also out of the question). For example, I recently implemented a distributed protocol for which the best theoretical result was a bound on the number of rounds 'with high probability'. This number of rounds is not an implementation detail, and it is not a priori clear how such a bound behaves in practice. – Discrete lizard Apr 25 '19 at 08:28