
An elegant program for a bitstring is the shortest program on a universal Turing machine that outputs this bitstring. According to Kolmogorov complexity, the length of the elegant program is independent of the Turing machine implementation.
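The true shortest program is uncomputable in general, but the idea can be sketched with a hypothetical toy machine (an assumption for illustration, not a real universal Turing machine) whose programs are simply binary numerals:

```python
from itertools import product

def toy_run(prog):
    # Toy "machine" standing in for a universal Turing machine:
    # read the program as a binary numeral n and output n ones.
    return "1" * int(prog, 2)

def elegant_program(target, max_len=12):
    # Brute-force search: try every program from shortest to longest
    # and return the first one whose output matches the target.
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            prog = "".join(bits)
            if toy_run(prog) == target:
                return prog
    return None  # no program of length <= max_len produces the target

print(elegant_program("1" * 20))  # → 10100 (binary numeral for 20)
```

For this toy machine the elegant program for a run of n ones is just the binary numeral for n, about log2(n) bits long.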

Solomonoff induction uses the elegant program for a bitstring to predict the next digit. This is a universal prior, given the minimal assumption that the output is generated by a Turing machine.
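A minimal sketch of the prediction step, again under a hypothetical toy machine (not a real universal Turing machine) whose programs are binary numerals outputting runs of ones: each program of length k gets prior weight 2^-k, and the learner credits that weight to whichever bit the program's output continues the observed prefix with:

```python
from itertools import product

def toy_run(prog):
    # Toy "machine" standing in for a universal Turing machine:
    # read the program as a binary numeral n and output n ones.
    return "1" * int(prog, 2)

def predict_next(prefix, max_len=10):
    # Universal-prior idea: weight each program by 2**-len(program),
    # keep those whose output strictly extends the observed prefix,
    # and credit the weight to the next bit of that output.
    weight = {"0": 0.0, "1": 0.0}
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            out = toy_run("".join(bits))
            if out.startswith(prefix) and len(out) > len(prefix):
                weight[out[len(prefix)]] += 2.0 ** -length
    return max(weight, key=weight.get)

print(predict_next("11111"))  # → 1
```

Since every output of this toy machine is a run of ones, all the prior mass consistent with the prefix falls on 1, so the sketch predicts 1.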

We can use this insight to build a machine learning algorithm called a finite Solomonoff learner. The difference between a finite Solomonoff learner and the original Solomonoff induction algorithm is that the finite learner does not have access to all elegant programs. There is no algorithm that can generate all elegant programs, so the elegant programs the finite learner uses must be stored in memory. With a finite amount of memory, there is a limit on the elegant programs that can be stored and consequently used by the finite learner for prediction.

The limit exists because there are only finitely many elegant programs of length L or shorter. Once the bitstring of 1s becomes long enough, no program of length L or shorter can be its elegant program. If L is the amount of memory available, then eventually every elegant program will be longer than L, and none will fit in memory.
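The counting step can be made concrete: the number of nonempty bitstring programs of length at most L is 2^(L+1) − 2, so at most that many strings can have an elegant program fitting in L bits:

```python
L = 7
# Each program is a bitstring; there are 2**k programs of length k,
# so sum_{k=1}^{L} 2**k = 2**(L+1) - 2 programs of length at most L.
count = sum(2**k for k in range(1, L + 1))
print(count)  # → 254  (= 2**8 - 2)
```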

Now let's assume we have a very long string of 1s, and we remove one digit to make a prediction problem.

As a concrete example, our bitstring is:

11111111111111111111

We remove a 1 at random:

111111111111111_1111

The learner must figure out the most likely digit to go in the empty spot.

For a given amount of memory we can make the bitstring of 1s long enough that its elegant program cannot fit in memory. In this case, the finite Solomonoff learner will not be able to access the elegant program for the bitstring, and will thus be incapable of predicting the digit that goes in the empty slot.

To continue the example, assume the elegant program that generates the 1s is:

10001011

Furthermore, the amount of memory available is 7 bits. Consequently, the 8-bit elegant program cannot be stored in memory, and the finite learner cannot figure out what goes in the empty slot.
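Under a hypothetical toy machine whose elegant program for n ones is just the binary numeral for n (an assumption for illustration, not the machine in the example above), the required program length grows like log2(n), so any fixed memory limit is eventually exceeded:

```python
MEMORY_BITS = 7

def program_length(n):
    # Toy assumption: the elegant program for n ones is the binary
    # numeral for n, so it needs n.bit_length() bits.
    return n.bit_length()

# Keep doubling the run of ones until its toy elegant program
# no longer fits in the available memory.
n = 1
while program_length(n) <= MEMORY_BITS:
    n *= 2
print(n)  # → 128: its toy elegant program needs 8 bits, one too many
```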

On the other hand, regardless of how long the bitstring of 1s becomes, a human will have no problem identifying the missing digit. A human has finite memory and cannot access all elegant programs to make predictions. Despite having the same handicap as a finite learner, the human can outperform the finite learner infinitely often.

Does this demonstrate Solomonoff learning is less powerful than human learning?

yters
  • If you mean that a human can decide whether the string has been shortened, I think there are limits too. The string can be so long that at some step you will simply forget how large the number is. – rus9384 Aug 11 '17 at 21:59
  • 1. "the elegant program for a bitstring" - What does that mean? Can you define your terms? 2. "For a given amount of memory we can make the bitstring of 1s long enough that its elegant program cannot fit in memory" - That doesn't sound right. Do you have a justification for that claim? – D.W. Aug 12 '17 at 02:24
  • Human memory also needs to be defined. Technically it is limited, but it also extends long-term memory on demand, and one can train very hard for a very long time to extend short-term memory. So either it is restricted to the memorized part in the articulatory loop, or someone starts memorizing to put things into long-term memory - that one is vast and extendible (hard to give a limit). Also, people use heuristics; I hope that doesn't count as an elegant program, otherwise this is not really possible to attempt. Maybe it could be settled, but people lose focus (say 48h of processing is the crucible) and 45 min tops before they – Evil Aug 12 '17 at 04:35
  • start making errors. Some PDF for errors should be given. Of course this doesn't really take into consideration pauses (sleep, eat, drink), which simply destroy short-term memory, and long-term memory is not accessible for the next 24h, which is also prone to errors. Psychology-wise, a human will infinitely often make errors after exhausting focus, so here the answer is no. It is not possible to prevent subconscious processing, so the human will cheat the assumptions, even when trying not to. – Evil Aug 12 '17 at 04:40
  • Maybe this is not even in the right ballpark, but if the machine gets 7 bits of memory and the human is not restricted (so all my previous comments are irrelevant), isn't it wrong simply because of the different limits? Isn't the assumption that the human always makes a perfect prediction too far-fetched (or let me ignore that; the human simply stands for a more powerful machine)? Because in such a setting the inference is flawed. – Evil Aug 12 '17 at 05:21
  • @Evil The 7 bit example is just to illustrate my meaning. In reality, we can set the memory limit to anything we want, and the finite learner will still be unable to match human performance with the bitstring of 1s. – yters Aug 12 '17 at 13:50
  • "According to Kolmogorov complexity, the length of the elegant program is independent of the Turing machine implementation." - That's not correct. You might want to study Kolmogorov complexity a bit more. What is true is that changing the universal Turing machine will change the length by only a constant... but if you care about constants, that might be important. – D.W. Aug 12 '17 at 17:37
  • @D.W. Yes, that's true. But I don't think that impacts this argument. Since the change is constant, then for long enough elegant programs the constant becomes negligible. – yters Aug 12 '17 at 18:20
  • Here is a guessing procedure that should be able to predict correctly in your concrete example. (0) Substitute 0 for the missing bit, and compress the resulting string, (1) Substitute 1 for the missing bit, and compress the resulting string, (2) Pick the bit which results in a smaller compressed string. – Yuval Filmus Aug 12 '17 at 18:44
  • @YuvalFilmus, yes or even a procedure that always predicts 1. But neither are Solomonoff induction. – yters Aug 12 '17 at 20:13
  • Neither is your algorithm. I'm suggesting an (apparently) better way to implement the key idea behind Solomonoff induction to the problem of prediction. – Yuval Filmus Aug 12 '17 at 20:15
  • @YuvalFilmus It's the most direct implementation of SI, though an analogous approach such as yours may work better. So, the argument shows that at least insofar as we try to directly duplicate SI with a finite system, human learning seems superior. Human learning may be equivalent to what you suggest, or your idea may be better, but human learning is at least better than directly implementing SI in a finite setting. – yters Aug 12 '17 at 20:47
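The compression-based guessing procedure Yuval Filmus describes in the comments can be sketched with a toy complexity measure: the length of the shortest program on a hypothetical machine whose programs are binary numerals for run lengths (a stand-in for a real compressor, since true Kolmogorov complexity is uncomputable):

```python
import math
from itertools import product

def toy_run(prog):
    # Toy "machine": read the program as a binary numeral n, output n ones.
    return "1" * int(prog, 2)

def toy_complexity(target, max_len=12):
    # Length of the shortest toy program producing target; infinite if
    # nothing up to max_len works (e.g. the string is not all ones).
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            if toy_run("".join(bits)) == target:
                return length
    return math.inf

def guess_missing_bit(s):
    # (0) substitute 0 and measure, (1) substitute 1 and measure,
    # (2) pick the bit giving the smaller description.
    return min("01", key=lambda b: toy_complexity(s.replace("_", b)))

print(guess_missing_bit("111111111111111_1111"))  # → 1
```

Filling the gap with 1 yields twenty ones, described by the 5-bit numeral for 20; filling it with 0 yields a string this toy machine cannot produce at all, so the procedure picks 1.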