
Let us say, for instance, that I am doing string processing that requires some analysis of two strings. I have no information about what their lengths might be, so the two lengths are independent parameters. Would it be acceptable to call the complexity of an algorithm $O(n \cdot m)$ or $O(n + m)$ (depending on whether we use a naive or an optimized algorithm)?
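For concreteness, here is the kind of thing I mean (a toy example, not my actual algorithm): checking whether two strings share any character can be done naively in $O(n \cdot m)$, or in $O(n + m)$ with a set.

```python
def share_char_naive(s, t):
    # Compare every character of s against every character of t:
    # O(n * m), where n = len(s) and m = len(t).
    for a in s:
        for b in t:
            if a == b:
                return True
    return False

def share_char_fast(s, t):
    # Build a set from s in O(n), then scan t with O(1) average-case
    # membership tests: O(n + m) overall.
    chars = set(s)
    return any(b in chars for b in t)
```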

In a similar vein, let us presume the algorithm we choose actually requires two stages: a setup phase on the first string, after which we can process any number of other strings without incurring that initial cost. Would it be considered appropriate to say it has $O(n)$ construction followed by any number of $O(m)$ calculations?
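As a sketch of the two-stage shape I have in mind (again a hypothetical example, not the real algorithm):

```python
class CharIndex:
    """Pay a one-time setup cost on the first string, then answer
    queries against other strings without repeating that work."""

    def __init__(self, s):
        # One-time O(n) setup, where n = len(s).
        self.chars = set(s)

    def shares_char_with(self, t):
        # Each query is O(m), where m = len(t); the setup cost is
        # not incurred again.
        return any(c in self.chars for c in t)
```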

Would it be appropriate to just call them $O(n)$ because both calculations are linear?

FrankW
corsiKa
  • See the comments on this answer for a little background - my respect to @corsiKa for so bravely asking such a contentious question. – OldCurmudgeon Sep 15 '14 at 23:25
  • @OldCurmudgeon, I see. I would hate to wade into that comment thread. OldCurmudgeon, are you arguing over big-O notation without understanding big-O notation? Awkward indeed. Also, you and corsiKa are arguing over running time without defining the parameters $n$ and $m$ -- a recipe for miscommunication. Hint: one common convention when dealing with strings is to use $m$ for the length of one string and $n$ for the length of the other -- but ideally it's best to make this explicit, because otherwise it can cause confusion (as illustrated here). – D.W. Sep 15 '14 at 23:40
  • @D.W. It's possible that OldCurmudgeon simply learned a different definition in school... as I point out in a comment below, it's possible to eschew multiple variables, though I've never really thought about doing it until now. Maybe this - or something like it - used to be standard? – Patrick87 Sep 16 '14 at 06:34
  • 2
    I think this has sufficient answers here and here. – Raphael Sep 16 '14 at 08:22
  • related: https://cs.stackexchange.com/questions/105280/order-mistake-definition-in-clrs and https://math.stackexchange.com/questions/353461/big-mathcalo-notation-for-multiple-parameters – Neal Young Feb 23 '22 at 17:30

1 Answer


Yes, of course. This is fine and perfectly acceptable. It is common and standard to see algorithms whose running time depends upon two parameters.

For instance, you will often see the running time of depth-first search expressed as $O(n+m)$, where $n$ is the number of vertices and $m$ is the number of edges in the graph. This is perfectly valid. It means that there exist a constant $c$ and thresholds $n_0,m_0$ such that the running time of the algorithm is at most $c \cdot (n+m)$ for all $n>n_0$ and $m>m_0$. In other words, if the exact running time is $f(n,m)$, we say that $f(n,m) = O(n+m)$ if there exist $c,n_0,m_0$ such that $n>n_0$ and $m>m_0$ implies $f(n,m) \le c \cdot (n+m)$.
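As a sketch (one common iterative formulation, assuming an adjacency-list representation; details vary by textbook), DFS pushes each vertex at most once and scans each adjacency list once, giving $O(n+m)$:

```python
def dfs(adj, start):
    """Iterative depth-first search over an adjacency list
    (e.g. adj = {0: [1, 2], 1: [0], 2: [0]}).

    Each vertex enters the stack at most once, and each adjacency list
    is scanned once when its vertex is popped, so the total work is
    O(n + m) -- n vertices plus m (directed) edge entries."""
    visited = {start}
    stack = [start]
    order = []
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                stack.append(w)
    return order
```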

Yes, it is perfectly appropriate and acceptable to say that the first stage takes $O(n)$ time and the second stage takes $O(m)$ time.

Important: make sure you define what $n$ and $m$ are. You can't say "this is an $O(n)$ time algorithm" without specifying what $n$ is. If $n$ isn't specified in the problem statement, you need to specify it. For instance, see graph algorithms, where we typically define $n = $ # of vertices and $m = $ # of edges.

As far as whether you can call them $O(n)$ time, no, of course not -- unless you somehow know that $m = O(n)$. Of course, if you know that $m = O(n)$, then it follows that $m+n = O(n)$, so an $O(m+n)$ time algorithm is also an $O(n)$ time algorithm. But if there is no guarantee that $m = O(n)$, then you cannot call it an $O(n)$ time algorithm.
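Spelling out that last implication:

```latex
m \le c'\,n \text{ for all large } n
  \;\Longrightarrow\;
f(n,m) \le c\,(m+n) \le c\,(c'+1)\,n
  \;\Longrightarrow\;
f(n,m) = O(n).
```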

This is basic stuff. You'll find it all over algorithms textbooks.

D.W.
  • With the greatest respect could you please cite some source for this? I realise that my CS degree happened 35+ years ago but I am certain at that time there was no other linear complexity than O(n). I will be happy to change my beliefs and I have deep respect for a 14.8k rep but I hope you can forgive an Old Curmudgeon. – OldCurmudgeon Sep 15 '14 at 23:38
  • 1
    @OldCurmudgeon, odds are that you'll find examples of this in many standard algorithms textbooks. What ones have you looked at? Have you tried looking at the chapter on depth-first search (the example that I mentioned in my answer)? – D.W. Sep 15 '14 at 23:41
  • I am looking primarily at Time Complexity where Linear Time is described as O(n). There is no discussion about setup time or O(n + m + o + p ...) complexity - there is just n. – OldCurmudgeon Sep 15 '14 at 23:47
  • To me - complexity metrics describe the complexity of the algorithm - if the algorithm takes 1,000,000 iterations to set up and then 1,000,000,000,000 or 1,000 iterations to function it doesn't matter, the algorithm is still O(n) - I am yet to be convinced that there is any linear-complexity algorithm that is not O(n). – OldCurmudgeon Sep 15 '14 at 23:53
  • 2
    @OldCurmudgeon In my edition of CLRS exercise 3.1-8 presents exactly this definition of the $O$-notation for functions of many variables. And its upper bound on the running time of dfs is $O(V+E)$ for a graph $(V,E)$. – Kirill Sep 16 '14 at 06:07
  • @Kirill I could see, maybe, at some point in the past, the standard usage being to take $n$ equal to the sum total size of all input taken together. Note that DFS apparently remains linear (with respect to $n$) even under this definition, I believe, for sane implementations of the algorithm. – Patrick87 Sep 16 '14 at 06:29
  • @Patrick87 An algorithm's running time can be bounded above by functions of different characteristics of the input; the length of the input is only one such characteristic, it doesn't even have to be called $n$. For example, one could say that the time complexity of linear search is $O(\min(m,n))$ where $m$ is the location of the first matching element, and $n$ the total length. – Kirill Sep 16 '14 at 06:42
  • 2
    @Kirill My point was that it's conceivable, at some point in the past, it was considered customary to only consider the total aggregate length, to the extent that doing otherwise might have been considered an error. If you we're grading a student's exam and that student used total input length $n$ as the variable for time complexity of DFS, would you consider it an error not to consider two dimensions (V and E)? What's true, and what people are willing to concede, are not always one and the same. – Patrick87 Sep 16 '14 at 07:09
  • 1
    I agree insofar as everybody uses Landau notation this way, but almost nobody knows what it actually means (unless you connect the parameters functionally). See also the article linked in A. Schulz's answer here which starts off by stating that the "basic" and "common" use is wrong. – Raphael Sep 16 '14 at 08:18
  • @Raphael One example that paper gives is the function $g(0,n) = 2^n$, $g(m>0,n)=mn$ that is technically $O(mn)$ under the above definition, but doesn't behave well at all. (Just to demystify why the definition is somewhat wrong.) – Kirill Sep 16 '14 at 17:59
  • 1
    @Patrick87 Complexity theory uses, by virtue of the definition of many well-known classes, mostly input length (with notable exceptions). Algorithm analysis is -- when done seriously -- interested in learning something about actual resource usage (as far as the model allows) so other parameters become more interesting as to paint the whole picture (more accurately). – Raphael Sep 17 '14 at 09:04
  • 1
    @OldCurmudgeon It depends on what $n$ refers to in your case. If $n$ is the size of the input, then definitely $O(n)$ is what is called linear time. However, if $n$ is the number of vertices and $m$ is the number of edges in the graph, then to even read the input, you need $O(n + m)$ time. Notice that there are cases where $O(n)$ time doesn't suffice and cases where $O(m)$ time doesn't suffice (exercise). Maybe it's just a confusion of what the $n$ in $O(n)$ refers to. – Pål GD Sep 22 '14 at 20:32