Determining the particular number in $O(n)$ time and space (worst case)

Question

$\newcommand\ldotd{\mathinner{..}}$Let $A[1\ldotd n]$ be an array of integers with $0\le A[k]\le m$ for all $1\le k\le n$. Every value in $A[1\ldotd n]$ occurs an odd number of times, except for one particular value, which occurs an even number of times. Find that value.

There is a $\Theta(n\log n)$ algorithm: sort $A[1\ldotd n]$ into $B[1\ldotd n]$ and split $B[1\ldotd n]$ into maximal runs of equal values; the length of each run is the number of occurrences of that value.
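
A minimal sketch of this sort-and-scan baseline, assuming the input is a Python list of non-negative integers; the function name `find_even_by_sorting` is made up for illustration.

```python
from itertools import groupby


def find_even_by_sorting(a):
    """Return the value occurring an even number of times, or None if there is none."""
    # sorted() puts equal values next to each other; groupby() then yields
    # one (value, run) pair per maximal run of equal values.
    for value, run in groupby(sorted(a)):
        run_length = sum(1 for _ in run)
        if run_length % 2 == 0:
            return value
    return None
```

The sort dominates the running time, so this only serves as the comparison point that a linear-time algorithm would have to beat.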

I want to find an algorithm that runs in worst-case $O(n)$ time and uses $O(n)$ space.

Suppose that $m=\Omega(n^{1+\epsilon})$ for some $\epsilon>0$, so radix sort is not acceptable. $\DeclareMathOperator{\xor}{xor}$ Binary bitwise operations are acceptable, for example $A[1]\xor A[2]$.

Aryabhata's answer below shows that the general case is not good, but perhaps you have further restrictions available? A simple (but big) restriction would be to enforce that all the entries in the array are $O(n)$ in size. This would give a pretty trivial linear algorithm. — Luke Mathieson, Jul 23 '12 at 04:02
@LukeMathieson: I deleted that answer, as I am not yet convinced that the paper I cited will work without any modification, and besides, OP seems to be interested only in the uniform cost RAM model. — Aryabhata, Jul 23 '12 at 04:28
@Aryabhata: hehe, well the answer that's not there then! Out of interest, and perhaps useful for Frank, what did you think was the problem with adapting the result in the paper? A quick skim suggested it applied, but I obviously didn't read into it. – Luke Mathieson, Jul 23 '12 at 04:51
@LukeMathieson: The fact that the other elements need to appear an odd number of times in the current problem. Since, I skimmed over the proof too... — Aryabhata, Jul 23 '12 at 05:36
It would be interesting to know whether you are interested in theoretical results or in practical solutions. From the theory point of view, my first quick response is that you can sort a list of integers faster than $O(n\log n)$. There is a deterministic algorithm by Han that runs in $O(n\log\log n)$ time. For randomized algorithms, even better results are known, e.g. Han and Thorup have found an $O(n \sqrt{\log\log n})$ expected time algorithm. However, I think that your problem shouldn't require sorting. – A.Schulz, Jul 23 '12 at 07:01
What's funny is, that if you look at the opposite setting (find the only odd occurrence), then you get the result by XORing all entries of $A$. — A.Schulz, Jul 23 '12 at 09:47
@A.Schulz For that (opposite setting), I mentioned $\xor$ in my text. — Yai0Phah, Jul 23 '12 at 11:24
Are the numbers themselves odd, or just the number of occurrences? – Joe, Jul 23 '12 at 17:51
@Aryabhata: thank you very much for that citation. I don't think it can help at all in this case, because we are not looking for a majority element (the element in question could appear 2 times only, for instance), but it is interesting because the algorithm described in that paper is an (uncredited) ancestor of a generalization by Schenker et al. which is very well known. Thank you! — Jérémie, Jul 24 '12 at 13:11
@Jérémie: Glad to have helped! btw, that paper deals with an element repeating more than $n/k$ times, including the case $k=n$. So it might be relevant, but might need some modification. — Aryabhata, Jul 24 '12 at 17:18
@FrankScience: Are your numbers necessarily integers? Or could they be rationals? Or reals? Or elements of an abstract totally ordered set that supports $O(1)$-time comparisons? (If you do mean "integers", please change the question to read "integers" instead of "numbers".) — JeffE, Jul 25 '12 at 01:03
@JeffE Well, integers, but any data structure whose length is invariable in computer could be considered as some integers. — Yai0Phah, Jul 25 '12 at 03:16
@FrankScience: Sure, but if $A[i]$ is a rational number (for example), the inequality $0\le A[i] \le m$ doesn't tell you much about the number of bits in $A[i]$. — JeffE, Jul 25 '12 at 23:06

Raphael · Accepted Answer · 2012-07-23T15:42:56.153

Here is an idea for a simple algorithm; just count all occurrences!

1. Find $m = \max A$. -- time $\Theta(n)$
2. "Allocate" array $C[0..m]$. -- time $O(1)$¹
3. Iterate over $A$ and increase $C[x]$ by one whenever you find $A[\_]=x$. If $C[x]$ was $0$, add $x$ to a linear list $L$. -- time $\Theta(n)$
4. Iterate over $L$ and find the element $x_e$ with $C[x_e]$ even. -- time $O(n)$
5. Return $x_e$.
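
The steps above can be sketched as follows. This is only an illustration, with the assumption that `a` is a non-empty Python list of non-negative integers and with a made-up function name. Note one deliberate mismatch: Python zero-fills the list it allocates, which costs $\Theta(m)$ time, so the sketch mirrors the logic of the algorithm but not the $O(1)$ allocation it assumes.

```python
def find_even_by_counting(a):
    """Return the value occurring an even number of times, or None if there is none."""
    m = max(a)                # largest value in the input
    counts = [0] * (m + 1)    # C[0..m]; zero-filling takes Theta(m) time in Python
    seen = []                 # the list L of distinct values, in order of first appearance

    for x in a:
        if counts[x] == 0:    # first time we meet x
            seen.append(x)
        counts[x] += 1

    # Only the distinct values are inspected, never the whole of counts.
    for x in seen:
        if counts[x] % 2 == 0:
            return x
    return None
```

Keeping `seen` is what makes the final pass cost $O(n)$ instead of $O(m)$: the large array is only ever touched at positions that actually occur in the input.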

All in all, this gives you a linear-time algorithm which may use (in the sense of allocating) lots of memory. Note that being able to random-access $C$ in constant time independently of $m$ is crucial here.

An additional $O(n)$ bound on space is more difficult with this approach; I don't know of any dictionary data structure that offers $O(1)$ time lookup. You can use hash tables, for which there are implementations with $O(1 + k/n)$ expected lookup time ($n$ the table's size, $k$ the number of stored elements), so you can get arbitrarily good with linear space, but only in expectation. If all values in $A$ map to the same hash value, you are screwed.

¹ On a RAM, this is implicitly done; all we need is the start position and maybe the end position.

score 0 · Answer 2 · answered Jul 23 '12 at 09:58

An almost trivial solution, which however uses $\Theta(n)$ space, is to use a hash map. Recall that a hash map has amortized runtime $\mathcal{O}(1)$ for adding and looking up elements.

Hence, we can use the following algorithm:

1. Allocate a hash map $H$. Iterate over $A$. For each element $i \in A$, increase the number of occurrences seen, i.e. $H(i)++$.
2. Iterate through the key set of the hash map, and check which of the keys has an even count of occurrences.
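
A minimal sketch of this hash-map variant, assuming Python's built-in `dict` plays the role of $H$; the function name is made up for illustration. The linear running time holds in expectation only, not in the worst case.

```python
def find_even_by_hashing(a):
    """Return the value occurring an even number of times, or None if there is none."""
    occurrences = {}  # plays the role of the hash map H
    for x in a:
        occurrences[x] = occurrences.get(x, 0) + 1

    # Scan the key set for the one key with an even count.
    for x, count in occurrences.items():
        if count % 2 == 0:
            return x
    return None
```

The map holds one entry per distinct value, so it uses $O(n)$ space regardless of how large $m$ is.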

Now this is a simple algorithm which doesn't really use any big trick, but sometimes even this suffices. If not, you might want to specify what space restrictions you impose.

HdM


I would still like to know whether there is a non-randomized $O(n)$ time algorithm using polynomial space. In particular, is there any theoretical evidence that finding the only even-occurring item is harder than finding the only odd-occurring item? – A.Schulz Jul 23 '12 at 11:01
@A.Schulz I think that would be the $O(n)$-expected-time algorithm using a hash table. I remember that somebody told me about an $O(n)$ algorithm (or one for some special case, say, odd=1 and even=2), maybe using a stack, but I cannot recall it. – Yai0Phah Jul 23 '12 at 12:11
Not every hashtable implementation has this property; usually, lookup is not $O(1)$, not even amortized (afaik). In fact, a prior discussion has not yielded any implementation that has constant time lookup. Can you be more specific? – Raphael Jul 23 '12 at 15:21
