
$\newcommand\ldotd{\mathinner{..}}$Let $A[1\ldotd n]$ be an array of integers with $0\le A[k]\le m$ for all $1\le k\le n$. Every number that occurs in $A[1\ldotd n]$ occurs an odd number of times, except for one particular number, which occurs an even number of times. The task is to find that number.

There is a $\Theta(n\log n)$ algorithm: sort $A[1\ldotd n]$ into $B[1\ldotd n]$ and split $B[1\ldotd n]$ into maximal runs of equal elements. The length of each run is the number of occurrences of that element, so we can check which count is even.
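The sort-and-split idea can be sketched in Python as follows. This is only an illustration under the question's assumptions; the function name `find_even_by_sorting` is made up for this sketch, and it returns `None` if no value has an even count.

```python
from itertools import groupby


def find_even_by_sorting(a):
    """Return a value that occurs an even number of times in a, or None."""
    # Sorting brings equal values next to each other (comparison sort).
    b = sorted(a)
    # groupby splits the sorted list into maximal runs of equal values.
    for value, run in groupby(b):
        run_length = sum(1 for _ in run)
        if run_length % 2 == 0:
            return value
    return None
```

For example, `find_even_by_sorting([5, 3, 5, 7, 7, 7])` returns `5`, since 5 occurs twice while 3 and 7 occur an odd number of times.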

I want to find an algorithm that runs in worst-case $O(n)$ time and uses $O(n)$ space.

Suppose that $m=\Omega(n^{1+\epsilon})$ for some $\epsilon>0$, so radix sort is not acceptable. $\DeclareMathOperator{\xor}{xor}$ Binary bitwise operations are acceptable, for example, $A[1]\xor A[2]$.

Yai0Phah
  • Aryabhata's answer below shows that the general case is not good, but perhaps you have further restrictions available? A simple (but big) restriction would be to enforce that all the entries in the array are $O(n)$ in size. This would give a pretty trivial linear algorithm. – Luke Mathieson Jul 23 '12 at 04:02
  • @LukeMathieson: I deleted that answer, as I am not yet convinced that the paper I cited will work without any modification, and besides, OP seems to be interested only in the uniform cost RAM model. – Aryabhata Jul 23 '12 at 04:28
  • @Aryabhata: hehe, well the answer that's not there then! Out of interest, and perhaps useful for Frank, what did you think was the problem with adapting the result in the paper? A quick skim suggested it applied, but I obviously didn't read into it. – Luke Mathieson Jul 23 '12 at 04:51
  • @LukeMathieson: The fact that the other elements need to appear an odd number of times in the current problem. Since, I skimmed over the proof too... – Aryabhata Jul 23 '12 at 05:36
  • It would be interesting to know whether you are interested in theoretical results or in practical solutions. From the theory point of view, my first quick response is that you can sort a list of integers faster than $O(n\log n)$. There is a deterministic algorithm by Han that runs in $O(n\log\log n)$ time. For randomized algorithms, even better results are known, e.g. Han and Thorup have found an $O(n \sqrt{\log\log n})$ expected-time algorithm. However, I think that your problem shouldn't require sorting. – A.Schulz Jul 23 '12 at 07:01
  • What's funny is, that if you look at the opposite setting (find the only odd occurrence), then you get the result by XORing all entries of $A$. – A.Schulz Jul 23 '12 at 09:47
  • @A.Schulz For that (opposite setting), I mentioned $\xor$ in my text. – Yai0Phah Jul 23 '12 at 11:24
  • Are the numbers themselves odd, or just the number of occurrences? – Joe Jul 23 '12 at 17:51
  • @Joe They are arbitrary. – Yai0Phah Jul 24 '12 at 00:38
  • @Aryabhata: thank you very much for that citation. I don't think it can help at all in this case, because we are not looking for a majority element (the element in question could appear 2 times only, for instance), but it is interesting because the algorithm described in that paper is an (uncredited) ancestor of a generalization by Schenker et al. which is very well known. Thank you! – Jérémie Jul 24 '12 at 13:11
  • @Jérémie: Glad to have helped! btw, that paper deals with an element repeating more than $n/k$ times, including the case $k=n$. So it might be relevant, but might need some modification. – Aryabhata Jul 24 '12 at 17:18
  • @FrankScience: Are your numbers necessarily integers? Or could they be rationals? Or reals? Or elements of an abstract totally ordered set that supports $O(1)$-time comparisons? (If you do mean "integers", please change the question to read "integers" instead of "numbers".) – JeffE Jul 25 '12 at 01:03
  • @JeffE Well, integers, but any fixed-length data structure in a computer could be regarded as an integer. – Yai0Phah Jul 25 '12 at 03:16
  • @FrankScience: Sure, but if $A[i]$ is a rational number (for example), the inequality $0\le A[i] \le m$ doesn't tell you much about the number of bits in $A[i]$. – JeffE Jul 25 '12 at 23:06
  • @JeffE Oh, yes, therefore I edited my post. – Yai0Phah Jul 26 '12 at 02:46
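
The opposite setting raised in the comments (exactly one value occurs an odd number of times, every other value an even number of times) can be sketched as a single XOR fold. The function name `find_odd_by_xor` is only illustrative. In the setting of the question, the same fold instead yields the XOR of all distinct values other than the sought one, which is why it does not settle the problem on its own.

```python
from functools import reduce
from operator import xor


def find_odd_by_xor(a):
    """XOR all entries; pairs of equal values cancel, the odd one remains."""
    # x ^ x == 0 and x ^ 0 == x, so every value with an even count vanishes.
    return reduce(xor, a, 0)
```

For example, `find_odd_by_xor([6, 2, 6, 9, 2])` returns `9`.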

2 Answers


Here is an idea for a simple algorithm; just count all occurrences!

  1. Find $m = \max A$. -- time $\Theta(n)$
  2. "Allocate" array $C[0..m]$. -- time $O(1)$¹
  3. Iterate over $A$ and increase $C[x]$ by one whenever you find $A[\_]=x$. If $C[x]$ was $0$, add $x$ to a linear list $L$. -- time $\Theta(n)$
  4. Iterate over $L$ and find the element $x_e$ with $C[x_e]$ even. -- time $O(n)$.
  5. Return $x_e$.
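
The five steps above can be sketched in Python as follows. This is an illustration, not a faithful cost model: the function name `find_even_by_direct_counting` is made up, the input is assumed to be a non-empty list of non-negative integers, and Python zero-fills the list `c`, which costs $\Theta(m)$ time and space rather than the $O(1)$ "allocation" assumed in step 2.

```python
def find_even_by_direct_counting(a):
    """Count occurrences in a direct-address table indexed by value."""
    m = max(a)             # step 1: largest value in a
    c = [0] * (m + 1)      # step 2: table C[0..m] (zero-filled here, Theta(m))
    seen = []              # the linear list L of distinct values
    for x in a:            # step 3: count, remembering first occurrences
        if c[x] == 0:
            seen.append(x)
        c[x] += 1
    for x in seen:         # step 4: scan only values that actually occur
        if c[x] % 2 == 0:
            return x       # step 5
    return None
```

For example, `find_even_by_direct_counting([4, 1, 4, 2, 2, 2])` returns `4`. Keeping the list `seen` means step 4 touches at most $n$ entries of the table instead of all $m+1$.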

All in all, this gives you a linear-time algorithm which may use (in the sense of allocating) lots of memory. Note that being able to random-access $C$ in constant time independently of $m$ is crucial here.

An additional $O(n)$ bound on space is more difficult with this approach; I don't know of any dictionary data structure that offers $O(1)$ time lookup. You can use hash tables, for which there are implementations with $O(1 + k/n)$ expected lookup time ($n$ the table's size, $k$ the number of stored elements), so you can get arbitrarily good with linear space, but only in expectation. If all values in $A$ map to the same hash value, you are screwed.


  1. On a RAM, this is implicitly done; all we need is the start position and maybe the end position.
Raphael

An almost trivial solution, which however uses $\Theta(n)$ space, is to use a hash map. Recall that a hash map supports adding and looking up elements in expected amortized $\mathcal{O}(1)$ time, assuming a suitable hash function.

Hence, we can use the following algorithm:

  1. Allocate a hash map $H$. Iterate over $A$. For each element $i \in A$, increase the number of occurrences seen, i.e. $H(i)++$.

  2. Iterate through the key set of the hash map, and check which of the keys has an even count of occurrences.
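
A minimal sketch of these two steps, using Python's built-in hash-based `collections.Counter` as the hash map $H$. The function name `find_even_by_hashing` is only illustrative, and the linear running time holds in expectation, not in the worst case.

```python
from collections import Counter


def find_even_by_hashing(a):
    """Count occurrences in a hash map, then look for an even count."""
    counts = Counter(a)                # step 1: H(i)++ for every element i
    for value, count in counts.items():
        if count % 2 == 0:             # step 2: check each key's count
            return value
    return None
```

For example, `find_even_by_hashing([10**12, 3, 10**12, 9])` returns `10**12`. The space used depends on the number of distinct values, not on the magnitude $m$ of the entries.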

Now this is a simple algorithm which doesn't really use any big trick, but sometimes even this suffices. If not, you might want to specify what space restrictions you impose.

HdM
  • I would still like to know whether there is a non-randomized $O(n)$ time algorithm using polynomial space. In particular, is there any theoretical evidence that finding the only even-occurring item is harder than finding the only odd-occurring item? – A.Schulz Jul 23 '12 at 11:01
  • @A.Schulz I think the hash-table approach gives an $O(n)$-expected-time algorithm. I remember that somebody told me about an $O(n)$ algorithm (or one for some special case, say, odd=1 and even=2), maybe using a stack, but I cannot recall it. – Yai0Phah Jul 23 '12 at 12:11
  • Not every hashtable implementation has this property; usually, lookup is not $O(1)$, not even amortized (afaik). In fact, a prior discussion has not yielded any implementation that has constant time lookup. Can you be more specific? – Raphael Jul 23 '12 at 15:21