2

Summary

I saw a solution to a problem described as having O(c) time complexity, where c is the number of unique items among the n input items. I don't understand how we can say the complexity is O(c) despite looping through all n items.

Example

func foo(items: [Int]) {
    var uniqueItems: Set<Int> = [] // ...

    for i in items {
        if uniqueItems.contains(i) {
            continue // does this reduce time complexity?
        }

        // O(1) Time operations...
    }
}

3366784
  • 131

3 Answers

6

Big-O notation always states the behaviour of an algorithm in the limit, i.e. as n increases without bound.

Obviously, in the real world, checking whether i occurs costs a certain amount X of computation and actually doing the domain action takes another amount Y. You'd expect X to be a lot smaller than Y, so that for reasonable problem sizes, the value of c is much more important than the value of n. Therefore it is tempting to say that the time is proportional to c.

But in the limit, the value of n will always be more important. Imagine having a collection of a billion items, of which only 3 are distinct. Eventually, all those failed membership checks will take up more time than the actual computation, and from then on the value of n will be more important than the value of c. Assuming that both the membership check and the payload computation are O(1), the overall complexity is indeed O(n) and not O(c).
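To make that concrete, here is a minimal Swift sketch (my own illustration, not code from the answer; countWork, seen, checks and payloadRuns are made-up names) that counts the two kinds of work separately:

func countWork(items: [Int]) -> (checks: Int, payloadRuns: Int) {
    var seen: Set<Int> = []
    var checks = 0
    var payloadRuns = 0

    for i in items {
        checks += 1                      // the membership check runs once per item: n times in total
        if seen.contains(i) { continue }
        seen.insert(i)
        payloadRuns += 1                 // the O(1) "domain action" runs once per distinct value: c times
    }

    return (checks, payloadRuns)
}

With a billion items of which only 3 are distinct, checks ends up at a billion while payloadRuns stays at 3, which is exactly why the loop as a whole is O(n) and not O(c).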

Kilian Foth
  • 109,273
  • When the check is trivial in comparison to the operation it's tempting to claim O(c). But you must remember that in big O c is trivially smaller than n. Assume c is 2 and n is 2 trillion and you'll start to see why O(c) is meaningless. Whatever the cost ratio between the work and the check, the ratio between n and c can be larger. – candied_orange Aug 23 '23 at 22:30
3

I can't write an answer better than Kilian Foth's, but I do want to point out here that Big O notation is not about infinitesimally precise measurements; it's about a very generalized definition of an O function, whose meaning is essentially "if input size increases by N, we expect f(input) effort to increase by O(N)".

The ultimate goal here is to get an intuitive feel for the linearity of an algorithm (or lack thereof) relative to its input size. It's perfectly okay to consider some non-zero things to be negligible if they don't meaningfully contribute to our understanding of its growth.

In this case, it is assumed that checking for uniqueness is a negligible price to pay which will help you cut out any redundant work, and the latter is being considered as the only non-negligible work here.
You are correct that when N significantly eclipses C (i.e. you have a lot of redundant values to sift through), this causes the uniqueness check to become non-negligible in the grand scheme of things.

However, you still need to consider the work itself (i.e. what you do when the item is unique). The more effort this work requires, the more "allowance" you have to consider your uniqueness check as a negligible price to pay.

So whether it is correct to consider O(c) hinges on C being reasonably close to N, weighed against how much redundant work the uniqueness check saves you from doing. If that assumption is reasonable, so is the conclusion of focusing on O(c).
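As a rough back-of-the-envelope illustration of that trade-off (the cost figures below are assumptions picked for the example, not measurements), the total cost is roughly n * checkCost + c * workCost, and the question is which term dominates:

let checkCost = 1.0        // assumed relative cost of one uniqueness check
let workCost  = 10_000.0   // assumed relative cost of the real work per unique item
let n = 1_000_000.0        // total items
let c =   900_000.0        // unique items, close to n in this scenario

let totalCheckCost = n * checkCost   // 1,000,000
let totalWorkCost  = c * workCost    // 9,000,000,000
print(totalCheckCost, totalWorkCost)

With these numbers the real work dwarfs the checks, so summarising the cost as O(c) is a reasonable simplification; set c to 3 instead and the n checks dominate, which is Kilian Foth's point.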

Flater
  • 49,580
  • "you still need to consider the work itself (i.e. what you do when the item is unique). The more effort this work requires, the more "allowance" you have to consider your uniqueness check as a negligible price to pay."

    This is probably the piece that I was missing.

    – 3366784 Aug 23 '23 at 23:23
  • 2
    "to increase by no more than O(N)" remember, Big O is the worst case limit. It is not an estimate. It in no way considers possible optimizations or average performance. – candied_orange Aug 24 '23 at 01:37
  • @candied_orange: I slightly disagree. The current topic is exactly one of considering the added work of the uniqueness check to be negligible, which technically underestimates the big O. So to be clear, do you consider the proposed solution of O(c) to therefore be incorrect because it's no longer a worst case scenario? – Flater Aug 24 '23 at 01:59
  • 1
    @Flater yes. Because that's not big O. That's small o. You can make the topic whatever you like but don't call big O what it's not. – candied_orange Aug 24 '23 at 02:13
  • @candied_orange: Fair enough. To some degree, I'm going to mention that the catch-all name for time complexity evaluations has eponymously become "big O notation" when it apparently shouldn't have because of the nuanced differences that you pointed out. – Flater Aug 24 '23 at 02:53
  • Much more interesting is the generalisation where you do O(k) work for each unique item, because then you need context to know if the dominating term is O(n) or O(ck), which is what I think this answer is getting at. – Caleth Aug 24 '23 at 09:40
  • @Caleth Yes. Based on OP's reference material talking about O(c), it is inferred that k (the task post validation) can be assumed to approximate the whole task. If that inference is not correct, you're more correct than the OP's source material. – Flater Aug 24 '23 at 09:43
  • Very much so. Big O is always a worst case scenario, which is, of course, an edge case, but the exact WORST POSSIBLE edge case that we must account for in our calculations. – Satanicpuppy Aug 24 '23 at 20:34
-1

Checking whether an item is in a set is not free. Your algorithm does that n times, so it is O(n).

Sets may have an implementation that lets you iterate through the elements of the set in time proportional to the number of elements, and that might be faster.
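A hedged sketch of what that could look like (my illustration, not code from the answer): once the unique values are in a set, you can loop over the set itself rather than over items, although building the set from the input still costs O(n):

func fooOverUniques(items: [Int]) {
    let uniqueItems = Set(items)   // building the set still walks all n items: O(n)

    for u in uniqueItems {         // this loop body runs c times, once per distinct value
        // O(1) Time operations...
        _ = u
    }
}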

gnasher729
  • 44,814