Limiting memory usage while keeping score

Question

For an assignment we are asked the following (simplified):

A function F(a) has a non-negative integer as its result. This result is fed back into the function itself, as follows, where the first a is given:

new_a = (c1 * (previous_a or first_a) + c2) mod c3

Write, in C#, a sufficiently fast and memory-efficient algorithm that tells how many iterations of the function it takes before new_a is within a specified range of any previous value.

I have already made a "naive" solution as follows: I create an array of size c3, and for every new_a I check whether its corresponding index is set to true. If so, the number of iterations is returned. If not, the index and the given range around it are set to true. This goes on until there is a hit.
This solution is very fast, but the downside is that for large enough values of c3 there is just not enough memory available.
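For reference, the naive approach described above could be sketched as follows. This is Python rather than C#, purely for brevity, and the function name and conventions are assumptions: "within range" is taken to mean an absolute difference of at most `r` with no wrap-around modulo `c3`, `first_a` counts as a previous value, and `0 <= first_a < c3`.

```python
def iterations_until_hit_naive(first_a, c1, c2, c3, r):
    """Count iterations until a new value lies within r of an earlier one."""
    # covered[v] == 1 means v is within r of some value seen so far.
    # Assumption: one byte per possible value, so memory grows with c3.
    covered = bytearray(c3)

    def mark(a):
        lo, hi = max(0, a - r), min(c3 - 1, a + r)
        covered[lo:hi + 1] = b"\x01" * (hi - lo + 1)

    a = first_a
    mark(a)
    iterations = 0
    while True:
        a = (c1 * a + c2) % c3
        iterations += 1
        if covered[a]:
            return iterations
        mark(a)
```

The loop always terminates, because the sequence can take only `c3` different values and must eventually repeat one exactly.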

I have been thinking for about a week now about how to solve the memory problem, but I am stuck. The only thing I can think of is that, given the function, there is some kind of assumption I should or can make about the distribution of its outcomes.

Any ideas on the direction I should take?

If we ignore the mention of C#, this question seems to be asking for an algorithm as much as a program, so I don't think it's off-topic here. — David Richerby, May 03 '15 at 15:15
To clarify: I am not asking for a complete algorithm, just an insight on how to tackle such a problem in general. I really have the feeling I am missing something obvious. — g_uint, May 03 '15 at 15:19
This exercise seems oddly familiar... However, you've simplified it too much and left out a very important detail. — Tom van der Zanden, May 03 '15 at 17:50
@TomvanderZanden I have no familiarity with this, never saw it before, and I wonder what you are hinting at. — babou, May 03 '15 at 19:40
@babou Your answer is pretty much spot on. The assignment specifies an upper bound on $c_3/r$. — Tom van der Zanden, May 03 '15 at 20:09
@TomvanderZanden you seem pretty knowledgeable, not just on the technique, but also on the academic context :) Actually, I have little background in algorithmics (except for very specific areas) and I found it amusing, I mean using a binary tree. — babou, May 03 '15 at 20:31
@TomvanderZanden The reason I simplified it so much is that I think this equates to "having a general discussion" about a problem in the assignments. Do you think the way the question is stated constitutes improper behaviour from me as a student? Me leaving out a very important detail is a combination of me not getting the assignment (which is why I asked the question in the first place) and not wanting to give too much detail because I am not into the fraud thing. — g_uint, May 03 '15 at 21:50
Full disclosure, I'm a (graduate) student at a university where a problem similar to this is used as an (undergraduate) programming assignment. @g_m I don't think it's improper, but I'm not the one to judge. But please note that I'm just being a bit cheeky and teasing you, I'm not part of the course organization and it's not my (or anyone's) job to figure out where students are getting help. In fact I think it's great that you've come to CS.SE to discuss this assignment! So long as you write all the code on your own and understand the concepts involved, it seems fair game. — Tom van der Zanden, May 03 '15 at 22:07
Thanks Tom, maybe my comment seemed a bit defensive, but somehow the smiley at the end fell off! I think we are on the same page here, but to be sure I will check with the teacher to see if this is ok or not. — g_uint, May 03 '15 at 22:13
@g_m It is a bit difficult for us too. We try not to spoil the teaching work of colleagues. If that can make you feel better, one reason I helped is that I trusted you, because you said up front it was an assignment. It is a lot better than the white lies we are being served all the time. I had the impression you were being serious and did need help. Style matters a lot. — babou, May 03 '15 at 22:23
@babou Totally get that, and thanks. I do really want to "get" it. I'm following this course as part of a CS minor and it is so much fun, I sometimes question myself for not doing a CS major... — g_uint, May 04 '15 at 08:29

babou · Accepted Answer · 2015-05-03T19:53:35.507

Your idea of an array of size $c_3$ is a good one. But you can do better. First, since you only need to be within a range $r$, you can use an array of size $c_3/r$, rounded upward, and just mark the entry corresponding to the integer segment you fall in (there may be some details to work out).
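One possible way to work out those details is sketched below, in Python for brevity. It is an assumption-laden variant, not necessarily what the assignment intends: segments have width $r+1$ (so the array has about $c_3/(r+1)$ entries), and each entry stores the value that fell into the segment instead of a plain mark, so that neighbouring segments can be checked exactly. Distances are taken without wrap-around, `first_a` counts as a previous value, and all names are hypothetical.

```python
from array import array


def iterations_until_hit_buckets(first_a, c1, c2, c3, r):
    """Count iterations until a new value lies within r of an earlier one."""
    width = r + 1                          # two values in one segment differ by <= r
    n_buckets = (c3 + width - 1) // width  # ceil(c3 / width)
    EMPTY = -1
    # bucket[i] holds the single value seen in segment i, or EMPTY.
    # One value per segment suffices: a second value in the same segment is a hit.
    bucket = array("q", [EMPTY]) * n_buckets

    def near_previous(a):
        i = a // width
        # A value within r of a can only sit in segment i-1, i or i+1.
        for j in (i - 1, i, i + 1):
            if 0 <= j < n_buckets and bucket[j] != EMPTY and abs(bucket[j] - a) <= r:
                return True
        return False

    a = first_a
    bucket[a // width] = a
    iterations = 0
    while True:
        a = (c1 * a + c2) % c3
        iterations += 1
        if near_previous(a):
            return iterations
        bucket[a // width] = a
```

Storing the value costs more per entry than a single bit, but it avoids both false hits and missed hits at segment boundaries.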

Then, rather than using an array, if it is still too large, you can build a binary structure, so that you create only the array entries you need. Each node of the binary structure corresponds to a range of consecutive entries.

Then, you can reduce the memory requirement further by merging contiguous ranges where all entries have been hit, and rebalancing the tree.
Of course, there is a $\log_2(c_3/r)$ price in access time. But this cost is limited by the tree shrinking as it gets full.
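The merging idea can be illustrated with a simplified sketch in Python. Note the substitutions: instead of a balanced binary tree over array entries, it keeps a sorted list of disjoint covered intervals of values (found by binary search), which makes insertion linear in the number of intervals in the worst case. Distances are taken without wrap-around, `first_a` counts as a previous value, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def iterations_until_hit_intervals(first_a, c1, c2, c3, r):
    """Count iterations until a new value lies within r of an earlier one."""
    starts, ends = [], []   # disjoint, sorted intervals [starts[i], ends[i]]

    def covered(a):
        i = bisect_right(starts, a) - 1   # last interval starting at or before a
        return i >= 0 and ends[i] >= a

    def add(a):
        lo, hi = max(0, a - r), min(c3 - 1, a + r)
        i = bisect_left(ends, lo - 1)      # first interval touching [lo, hi]
        j = bisect_right(starts, hi + 1)   # one past the last such interval
        if i < j:                          # merge with everything it touches
            lo = min(lo, starts[i])
            hi = max(hi, ends[j - 1])
        starts[i:j] = [lo]
        ends[i:j] = [hi]

    a = first_a
    add(a)
    iterations = 0
    while True:
        a = (c1 * a + c2) % c3
        iterations += 1
        if covered(a):
            return iterations
        add(a)
```

As neighbouring ranges fill in they collapse into a single interval, so the structure shrinks as coverage grows, which is the effect described above.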

score 1 · Answer 2 · answered May 04 '15 at 08:37

While babou offers one possible solution, that one depends on $c_3/r$ not being too large. If it is large, then you might approach the problem like this:

A data structure for solving this problem would need to provide us with 3 (or 5) operations: we need to be able to add numbers and, given a number, to find its predecessor (the next smaller number) and its successor (the next larger number).

One possible data structure that supports these operations is a binary search tree, and this gives a quite straightforward algorithm: generate the numbers $\{a_n\}$ one by one, insert each into the tree, and after each insertion query for the successor and predecessor of the newly inserted number to see if they are in the specified range. If you use a balanced binary search tree (such as an AVL tree or red-black tree), each operation takes $O(\log n)$ and you obtain an $O(n \log n)$ algorithm, where $n$ is the number of iterations before $\{a_n\}$ comes into the specified range. Using a van Emde Boas tree you could improve this to $O(n \log \log c_3)$, since its operations take time doubly logarithmic in the size of the key universe, although a plain van Emde Boas tree needs space proportional to $c_3$. The balanced-tree approach uses only $O(n)$ memory, so it is much more efficient than babou's approach if $c_3/r$ is large.
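A minimal sketch of the tree-based approach, in Python for brevity. To stay short it uses a plain, unbalanced binary search tree, so the $O(\log n)$ bound only holds in expectation for well-mixed sequences; a real implementation would use a balanced tree. The predecessor and successor are collected during the insertion descent. Distances are taken without wrap-around, `first_a` counts as a previous value, and all names are hypothetical.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert_with_neighbours(root, key):
    """Insert key; return (predecessor, successor) among the earlier keys.

    Either may be None. If key is already present, returns (key, key).
    """
    pred = succ = None
    node = root
    while True:
        if key == node.key:
            return key, key
        if key < node.key:
            succ = node.key          # last node where we turned left
            if node.left is None:
                node.left = Node(key)
                return pred, succ
            node = node.left
        else:
            pred = node.key          # last node where we turned right
            if node.right is None:
                node.right = Node(key)
                return pred, succ
            node = node.right


def iterations_until_hit_bst(first_a, c1, c2, c3, r):
    """Count iterations until a new value lies within r of an earlier one."""
    root = Node(first_a)
    a, iterations = first_a, 0
    while True:
        a = (c1 * a + c2) % c3
        iterations += 1
        pred, succ = insert_with_neighbours(root, a)
        if pred is not None and a - pred <= r:
            return iterations
        if succ is not None and succ - a <= r:
            return iterations
```

Only the nearest smaller and nearest larger earlier values need checking: if neither is within `r`, no other earlier value can be.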

Another approach that is remarkably simple and might work very well in practice is based on insertion sort: if you could keep the elements of $\{a_n\}$ generated so far in a sorted array, then given a new value $a_n$ you could find its position in the array using binary search and find the next larger and smaller values just by looking one position to the left and right. The problem with this approach is of course inserting new elements, since that requires expanding the array and moving all of the elements larger than the newly generated one a spot to the right.
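In its plain form, the sorted-array idea might look like the following Python sketch, which uses the standard `bisect` module for the binary search. The `list.insert` call is exactly the expensive shifting step mentioned above. Distances are taken without wrap-around, `first_a` counts as a previous value, and the function name is hypothetical.

```python
from bisect import bisect_left


def iterations_until_hit_sorted(first_a, c1, c2, c3, r):
    """Count iterations until a new value lies within r of an earlier one."""
    seen = [first_a]                     # all earlier values, kept sorted
    a, iterations = first_a, 0
    while True:
        a = (c1 * a + c2) % c3
        iterations += 1
        pos = bisect_left(seen, a)       # index of the first element >= a
        if pos < len(seen) and seen[pos] - a <= r:    # next larger (or equal)
            return iterations
        if pos > 0 and a - seen[pos - 1] <= r:        # next smaller
            return iterations
        seen.insert(pos, a)              # O(n): shifts the larger elements right
```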

However, you could store $n$ elements in an array of size $2n$, doubling up every element. You could still use binary search to find a new element's position, but inserting it becomes possible without extra work: you can simply have it take the place of a duplicated element. Of course, as the array gets fuller (with fewer duplicates), collisions might happen where you need to insert an element at a position where there are no more duplicates. This can be resolved in a similar manner to insertion sort: you can push elements to the right to make space for the newly generated element. Unlike insertion sort, this pushing stops much earlier: as soon as a duplicate element is encountered.

If the elements of $\{a_n\}$ are distributed somewhat evenly (which they are, by virtue of the way they are generated), then the chances of needing a long (and expensive) move step are small, and most insertions will be dealt with quickly. As the array becomes too full, such collisions will become more frequent. When you get to the point where (say) the number of distinct elements stored in the array of size $2n$ reaches $1.5n$, you might double the size of the array to $4n$. Even though this is a very expensive step (it requires moving all of the elements to a new array), it happens only very infrequently: since you double the size of the array each time, you do this expensive step only $O(\log n)$ times.
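The duplicated-element scheme could be sketched as below, in Python for brevity. Several details are assumptions of this sketch and not prescribed by the description above: the nearest spare copy is searched in both directions (the text only pushes right), the array grows when more than three quarters of the slots would hold distinct values, and values are respread evenly on growth. Distances are taken without wrap-around, `first_a` counts as a previous value, and all names are hypothetical.

```python
from bisect import bisect_left


def _spread(distinct, capacity):
    """Lay out sorted distinct values evenly, each appearing at least once."""
    m = len(distinct)
    return [distinct[i * m // capacity] for i in range(capacity)]


def _insert(arr, pos, x):
    """Place x at sorted position pos by consuming the nearest spare copy."""
    cap = len(arr)
    for d in range(cap):
        k = pos + d                          # spare copy at or right of pos?
        if k + 1 < cap and arr[k] == arr[k + 1]:
            arr[pos + 1:k + 1] = arr[pos:k]  # push right, stopping at the copy
            arr[pos] = x
            return
        k = pos - 1 - d                      # spare copy left of pos?
        if k >= 1 and arr[k] == arr[k - 1]:
            arr[k:pos - 1] = arr[k + 1:pos]  # push left, stopping at the copy
            arr[pos - 1] = x
            return
    raise AssertionError("no spare copy left; the array should have grown")


def iterations_until_hit_gapped(first_a, c1, c2, c3, r, capacity=8):
    """Count iterations until a new value lies within r of an earlier one."""
    arr = [first_a] * capacity               # every slot is a copy of first_a
    distinct = 1
    a, iterations = first_a, 0
    while True:
        a = (c1 * a + c2) % c3
        iterations += 1
        pos = bisect_left(arr, a)
        if pos < len(arr) and arr[pos] - a <= r:
            return iterations
        if pos > 0 and a - arr[pos - 1] <= r:
            return iterations
        if 4 * (distinct + 1) > 3 * len(arr):    # over 3/4 distinct: grow
            values = [v for i, v in enumerate(arr) if i == 0 or v != arr[i - 1]]
            arr = _spread(values, 2 * len(arr))
            pos = bisect_left(arr, a)
        _insert(arr, pos, a)
        distinct += 1
```

Because a spare copy is only ever overwritten while another copy of the same value remains next to it, no stored value is lost and the array stays sorted.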

As I mentioned before, babou has already provided a hint in the right direction of how you can make the array smaller. This answer is just to offer a more complete solution to the problem (in case somebody ever comes along and needs guidance on how to solve this problem in a more practical setting without bounds on $c_3/r$). It's not the one you are intended to use, though if you managed to implement either one you would probably really impress the course staff and really get me in trouble :-)

This is also very helpful. I wish I could mark both as answers :) The part where you explicate thinking about the type of operations needed is exactly the kind of clue I am looking for. I'll dig into Cormen et al. now... — g_uint, May 04 '15 at 10:25
