
An elegant program for a bitstring is the shortest program on a universal Turing machine that outputs this bitstring. According to Kolmogorov complexity, the length of the elegant program is independent of the Turing machine implementation.
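The true shortest program is uncomputable in general, but the idea can be sketched with a hypothetical toy machine (an assumption for illustration, not a real universal Turing machine) whose programs are simply binary numerals:

```python
from itertools import product

def toy_run(prog):
    # Toy "machine" standing in for a universal Turing machine:
    # read the program as a binary numeral n and output n ones.
    return "1" * int(prog, 2)

def elegant_program(target, max_len=12):
    # Brute-force search: try every program from shortest to longest
    # and return the first one whose output matches the target.
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            prog = "".join(bits)
            if toy_run(prog) == target:
                return prog
    return None  # no program of length <= max_len produces the target

print(elegant_program("1" * 20))  # → 10100 (binary numeral for 20)
```

For this toy machine the elegant program for a run of n ones is just the binary numeral for n, about log2(n) bits long.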

Solomonoff induction uses the elegant program for a bitstring to predict the next digit. This is a universal prior, given the minimal assumption that the output is generated by a Turing machine.
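A minimal sketch of the prediction step, again under a hypothetical toy machine (not a real universal Turing machine) whose programs are binary numerals outputting runs of ones: each program of length k gets prior weight 2^-k, and the learner credits that weight to whichever bit the program's output continues the observed prefix with:

```python
from itertools import product

def toy_run(prog):
    # Toy "machine" standing in for a universal Turing machine:
    # read the program as a binary numeral n and output n ones.
    return "1" * int(prog, 2)

def predict_next(prefix, max_len=10):
    # Universal-prior idea: weight each program by 2**-len(program),
    # keep those whose output strictly extends the observed prefix,
    # and credit the weight to the next bit of that output.
    weight = {"0": 0.0, "1": 0.0}
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            out = toy_run("".join(bits))
            if out.startswith(prefix) and len(out) > len(prefix):
                weight[out[len(prefix)]] += 2.0 ** -length
    return max(weight, key=weight.get)

print(predict_next("11111"))  # → 1
```

Since every output of this toy machine is a run of ones, all the prior mass consistent with the prefix falls on 1, so the sketch predicts 1.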

We can use this insight to build a machine learning algorithm called a finite Solomonoff learner. The difference between a finite Solomonoff learner and the original Solomonoff induction algorithm is that the finite learner does not have access to all elegant programs. There is no algorithm that can generate all elegant programs, so the elegant programs the finite learner uses must be stored in memory. With a finite amount of memory, there is a limit on the elegant programs that can be stored and consequently used by the finite learner for prediction.

The limit exists because there are only finitely many elegant programs of length L or shorter. Once the bitstring of 1s becomes long enough, no program of length L or shorter can be its elegant program. If L is the amount of memory available, then eventually every elegant program will be longer than L, and none will fit in memory.
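The counting step can be made concrete: the number of nonempty bitstring programs of length at most L is 2^(L+1) − 2, so at most that many strings can have an elegant program fitting in L bits:

```python
L = 7
# Each program is a bitstring; there are 2**k programs of length k,
# so sum_{k=1}^{L} 2**k = 2**(L+1) - 2 programs of length at most L.
count = sum(2**k for k in range(1, L + 1))
print(count)  # → 254  (= 2**8 - 2)
```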

Now let's assume we have a very long string of 1s, and we remove one digit to make a prediction problem.

As a concrete example, our bitstring is:

11111111111111111111

We remove a 1 at random:

111111111111111_1111

The learner must figure out the most likely digit to go in the empty spot.

For a given amount of memory we can make the bitstring of 1s long enough that its elegant program cannot fit in memory. In this case, the finite Solomonoff learner will not be able to access the elegant program for the bitstring, and will thus be incapable of predicting the digit that goes in the empty slot.

To continue the example, assume the elegant program that generates the 1s is:

10001011

Furthermore, the amount of memory available is 7 bits. Consequently, the 8-bit elegant program cannot be stored in memory, and the finite learner cannot figure out what goes in the empty slot.
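Under a hypothetical toy machine whose elegant program for n ones is just the binary numeral for n (an assumption for illustration, not the machine in the example above), the required program length grows like log2(n), so any fixed memory limit is eventually exceeded:

```python
MEMORY_BITS = 7

def program_length(n):
    # Toy assumption: the elegant program for n ones is the binary
    # numeral for n, so it needs n.bit_length() bits.
    return n.bit_length()

# Keep doubling the run of ones until its toy elegant program
# no longer fits in the available memory.
n = 1
while program_length(n) <= MEMORY_BITS:
    n *= 2
print(n)  # → 128: its toy elegant program needs 8 bits, one too many
```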

On the other hand, regardless of how long the bitstring of 1s becomes, a human will have no problem identifying the missing digit. A human has finite memory and cannot access all elegant programs to make predictions. Despite having the same handicap as a finite learner, the human can outperform the finite learner infinitely often.

Does this demonstrate Solomonoff learning is less powerful than human learning?

yters
  • If you mean that a human can decide whether the string has been shortened, I think there are limits too. The string can be so long that at some step you will simply forget how large the number is. – rus9384 Aug 11 '17 at 21:59
  • 1. "the elegant program for a bitstring" - What does that mean? Can you define your terms? 2. "For a given amount of memory we can make the bitstring of 1s long enough that its elegant program cannot fit in memory" - That doesn't sound right. Do you have a justification for that claim? – D.W. Aug 12 '17 at 02:24
  • Human memory also needs to be defined. Technically it is limited, but it also extends long-term memory on demand, and one can train very hard for a very long time to extend short-term memory. So either it is restricted to the memorized part in the articulatory loop, or someone starts memorizing to put things into long-term memory - that one is vast and extendible (hard to give a limit). Also, people use heuristics; I hope that doesn't count as an elegant program, otherwise this is not really possible to attempt. Maybe it could be settled, but people lose focus (say 48h of processing is the crucible) and 45 min tops before they – Evil Aug 12 '17 at 04:35
  • start making errors. Some PDF for errors should be given. Of course this doesn't really take into consideration pauses (sleep, eat, drink), which simply destroy short-term memory, and long-term memory is not accessible for the next 24h, which is also prone to errors. Psychology-wise, a human will infinitely often make errors after exhausting focus, so here the answer is no. It is not possible to prevent subconscious processing, so the human will cheat the assumptions, even when trying not to. – Evil Aug 12 '17 at 04:40
  • Maybe this is not even in the right ballpark, but if the machine gets 7 bits of memory and the human is not restricted (so all my previous comments are irrelevant), isn't it wrong simply because of the different limits? Isn't the assumption that the human always makes a perfect prediction too far-fetched (or let me ignore that; the human simply stands for a more powerful machine)? Because in such a setting the inference is flawed. – Evil Aug 12 '17 at 05:21
  • @Evil The 7 bit example is just to illustrate my meaning. In reality, we can set the memory limit to anything we want, and the finite learner will still be unable to match human performance with the bitstring of 1s. – yters Aug 12 '17 at 13:50
  • "According to Kolmogorov complexity, the length of the elegant program is independent of the Turing machine implementation." - That's not correct. You might want to study Kolmogorov complexity a bit more. What is true is that changing the universal Turing machine will change the length by only a constant... but if you care about constants, that might be important. – D.W. Aug 12 '17 at 17:37
  • @D.W. Yes, that's true. But I don't think that impacts this argument. Since the change is constant, then for long enough elegant programs the constant becomes negligible. – yters Aug 12 '17 at 18:20
  • Here is a guessing procedure that should be able to predict correctly in your concrete example. (0) Substitute 0 for the missing bit, and compress the resulting string, (1) Substitute 1 for the missing bit, and compress the resulting string, (2) Pick the bit which results in a smaller compressed string. – Yuval Filmus Aug 12 '17 at 18:44
  • @YuvalFilmus, yes or even a procedure that always predicts 1. But neither are Solomonoff induction. – yters Aug 12 '17 at 20:13
  • Neither is your algorithm. I'm suggesting an (apparently) better way to implement the key idea behind Solomonoff induction to the problem of prediction. – Yuval Filmus Aug 12 '17 at 20:15
  • @YuvalFilmus It's the most direct implementation of SI, though an analogous approach such as yours may work better. So, the argument shows that at least insofar as we try to directly duplicate SI with a finite system, human learning seems superior. Human learning may be equivalent to what you suggest, or your idea may be better, but human learning is at least better than directly implementing SI in a finite setting. – yters Aug 12 '17 at 20:47
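The compression-based guessing procedure Yuval Filmus describes in the comments can be sketched with a toy complexity measure: the length of the shortest program on a hypothetical machine whose programs are binary numerals for run lengths (a stand-in for a real compressor, since true Kolmogorov complexity is uncomputable):

```python
import math
from itertools import product

def toy_run(prog):
    # Toy "machine": read the program as a binary numeral n, output n ones.
    return "1" * int(prog, 2)

def toy_complexity(target, max_len=12):
    # Length of the shortest toy program producing target; infinite if
    # nothing up to max_len works (e.g. the string is not all ones).
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            if toy_run("".join(bits)) == target:
                return length
    return math.inf

def guess_missing_bit(s):
    # (0) substitute 0 and measure, (1) substitute 1 and measure,
    # (2) pick the bit giving the smaller description.
    return min("01", key=lambda b: toy_complexity(s.replace("_", b)))

print(guess_missing_bit("111111111111111_1111"))  # → 1
```

Filling the gap with 1 yields twenty ones, described by the 5-bit numeral for 20; filling it with 0 yields a string this toy machine cannot produce at all, so the procedure picks 1.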