
Determining whether or not an array has duplicate entries has two straightforward solutions:

  • Build a hashset of entries, then search for elements in this hashset. This takes $\mathcal O(n)$ time and $\mathcal O(n)$ extra space.
  • Sort the array, then search for consecutive elements that are the same. This takes $\mathcal O(n \log n)$ time and $\mathcal O(1)$ extra space.

Is there an algorithm that can solve this problem with the best of both approaches, using only $\mathcal O(n)$ time and $\mathcal O(1)$ extra space?
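
For reference, here is a minimal sketch of both baselines in Python (the names are mine; note that Python's built-in sort is Timsort and may use $\mathcal O(n)$ auxiliary space, so for the $\mathcal O(1)$-space bound imagine an in-place sort such as heapsort):

```python
def has_duplicate_hashset(a):
    """O(n) expected time, O(n) extra space: remember every value seen so far."""
    seen = set()
    for x in a:
        if x in seen:
            return True
        seen.add(x)
    return False


def has_duplicate_sort(a):
    """O(n log n) time, O(1) extra space (given an in-place sort):
    after sorting, any duplicate values sit next to each other."""
    a.sort()  # modifies the input array, which the question allows
    return any(a[i] == a[i + 1] for i in range(len(a) - 1))
```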

The question "Finding duplicate in immutable array in linear time and constant space" is similar, but the solution to that question only works when the values come from the set $\{1, ..., n\}$; my values are large integers. Also, unlike that question, I allow the input array to be modified and used as a workspace.

One strategy might be to turn the array itself into a kind of hash table. However, this seems like it won't work because there is no empty space, which makes it hard both to move elements into place and to get $\mathcal O(1)$ lookups.

I suspect that this cannot actually be done, but I'm not sure how to go about proving it.

Curtis F
  • The hashset method takes $\Theta(n \log n)$ time. Hash table operations have $\log n$ cost, even if the constant is low in practice and they're often approximated as $O(1)$. – Gilles 'SO- stop being evil' Jul 25 '19 at 06:24
  • You can radix-sort the array in place using an MSD sort (see the sketch after these comments) – Bulat Jul 25 '19 at 06:35
  • And if you need fast practical approach, look at https://cs.stackexchange.com/questions/93563/fast-stable-almost-in-place-radix-and-merge-sorts – Bulat Jul 25 '19 at 06:38
  • @Bulat A radix sort is still a sort, and takes $n \log n$ time in the worst case. A radix sort can take $O(n)$ time if the elements are small integers (with a bound that doesn't depend on the array size), but the question explicitly rules this out (“my values are large integers”). – Gilles 'SO- stop being evil' Jul 25 '19 at 06:43
  • This "log n" is actually a log of value range. It's up to you how to count it, but at least I find your measurement non-standard and deletion of my answer unreasonable. If you don't agree with Wikipedia, use comments rather than moderation tools to promote your opinion – Bulat Jul 25 '19 at 06:50
  • @Bulat It's not an opinion, it's a fact. Radix sort is only O(1) on a bounded range, and the question explicitly states that the range is not bounded. But regardless of the facts, your post did not answer the question, this is why I deleted it. If the question states “X does not solve the problem because …” and you want to answer “X actually does solve the problem because …”, you cannot leave out the “because …” part, it's absolutely essential. – Gilles 'SO- stop being evil' Jul 25 '19 at 11:15
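
For concreteness, the in-place MSD radix sort mentioned in the comments might look roughly like this (my own sketch, assuming nonnegative integer keys of at most `key_bits` bits):

```python
def has_duplicate_radix(a, key_bits=64):
    """Sort in place with a binary MSD radix sort (partition on each bit,
    most significant first), then scan adjacent elements for equality.
    Time is O(n * key_bits); the recursion stack is O(key_bits) deep."""
    def sort(lo, hi, bit):
        if hi - lo < 2 or bit < 0:
            return
        mask = 1 << bit
        i, j = lo, hi - 1
        while i <= j:  # partition: current bit 0 on the left, bit 1 on the right
            if a[i] & mask == 0:
                i += 1
            else:
                a[i], a[j] = a[j], a[i]
                j -= 1
        sort(lo, i, bit - 1)  # recurse into the 0-bit block
        sort(i, hi, bit - 1)  # recurse into the 1-bit block

    sort(0, len(a), key_bits - 1)
    return any(a[k] == a[k + 1] for k in range(len(a) - 1))
```

Whether this counts as "linear time" is exactly the point of contention here: `key_bits` has to grow with the logarithm of the value range, so for values drawn from a domain of size $n^2$ the total work is $\Theta(n \log n)$.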

1 Answer


Suppose that your elements come from a domain of size $n^2$. Any algorithm using time $T$ and space $S$ corresponds to a branching program having depth $T$ and containing at most $T \cdot 2^S$ nodes. Theorem 6.13 of “Time-space tradeoff lower bounds for randomized computation of decision problems” shows that $$ T = \Omega\left(n \sqrt{\log \tfrac{n}{S + \log T}/\log\log \tfrac{n}{S + \log T}}\right). $$ In particular, if $S$ is $O(\log n)$ (which corresponds to your $O(1)$ extra space, assuming you cannot modify the input) then $T = \Omega(n\sqrt{\log n/\log\log n})$.
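
Filling in that last step (my own sketch of the instantiation): we may assume $T \le n^{O(1)}$, since otherwise the claimed bound already holds, and then $S + \log T = O(\log n)$, so $$ \log \tfrac{n}{S + \log T} = \Theta(\log n) \quad\text{and}\quad \log\log \tfrac{n}{S + \log T} = \Theta(\log\log n). $$ Substituting these into the theorem gives $T = \Omega\left(n\sqrt{\log n/\log\log n}\right)$.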

Yuval Filmus
  • For elements from a domain of size P(n), hashing will take O(log(n)) time, which contradicts what he said. No, it's time to delete your answer ;) – Bulat Jul 25 '19 at 08:04
  • Hashing takes linear time. I don't follow. – Yuval Filmus Jul 25 '19 at 08:18
  • If the domain size is n^2, then an element contains 2*log(n) bits, so hashing it requires O(log(n)) time. Where am I wrong? – Bulat Jul 25 '19 at 08:24
  • In the branching program model, querying an element is an atomic operation. It's an $n^2$-way branching program. – Yuval Filmus Jul 25 '19 at 08:25
  • I.e., the program size depends on the input size? In order to generate such a program, you need O(n^2) time. If we can use O(n^3) branching as a single operation, we can choose the proper branch based on the entire array contents, thus "solving" the problem in O(1) time. – Bulat Jul 25 '19 at 08:32
  • Actually, I find it a very strange idea to manipulate the well-known measures of O(1) for element size and O(n) for hashing and radix sorting. The idea of reading log(n) bits in a single operation by declaring it an n-way branch is priceless! – Bulat Jul 25 '19 at 08:36
  • You have a different program for every $n$. It's a non-uniform model. – Yuval Filmus Jul 25 '19 at 08:37
  • I think it's going too far. Removing an answer that relied on the standard fixed-size-element assumption, while keeping answers with arbitrary non-standard assumptions, makes the entire topic meaningless. With only O(1) extra space, you can't even store one temporary element of your size! – Bulat Jul 25 '19 at 08:41
  • If the elements have fixed size, then the problem becomes very easy. In other words, keeping the elements fixed-size trivializes the problem and makes the entire topic meaningless. – Yuval Filmus Jul 25 '19 at 08:42
  • My best guess is that he doesn't know about in-place MSD sort. – Bulat Jul 25 '19 at 08:47