
I'm basically fishing for suggestions here... there is an AI contest to figure out a person's rating from a single game. The contest's description mentions Ken Regan's approach that makes use of engine evaluations of players' moves (and the contest's accompanying game data is pre-equipped with engine evaluations for such use). What other kinds of factors do you believe would be useful for such a task?

I could imagine things like:

  1. Ability to follow recent opening theory (for strong players)
  2. Total number of blunders (for weak players)
  3. At what point in the game did resignation occur? (Strong players tend to resign while the evaluation is still relatively close to a draw, whereas weak players often play on until the position is completely lost.)
  4. Average depth of tactical traps avoided

Other ideas? (A rough sketch of what I mean by 2 and 3 is just below.)
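To be concrete about 2 and 3, here is roughly how I imagine they could be computed, assuming the per-move engine evaluations come as a list of centipawn scores from White's point of view (the helper names and the 200 cp blunder threshold are just placeholders, not anything from the contest data):

```python
def count_blunders(evals_cp, threshold=200):
    """Idea 2: count moves that cost the mover more than `threshold` centipawns.
    evals_cp[0] is the starting position, evals_cp[i] the evaluation after the
    i-th half-move, all from White's point of view."""
    blunders = 0
    for i in range(1, len(evals_cp)):
        swing = evals_cp[i] - evals_cp[i - 1]    # change from White's point of view
        white_moved = (i % 2 == 1)               # half-moves 1, 3, 5, ... are White's
        loss = -swing if white_moved else swing  # evaluation lost by the side that moved
        if loss > threshold:
            blunders += 1
    return blunders


def final_eval_at_resignation(evals_cp):
    """Idea 3: how lopsided the position was when the game ended. Values close
    to zero suggest the loser resigned early, as strong players tend to."""
    return abs(evals_cp[-1]) if evals_cp else 0
```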

asked by tbischel (edited by ETD)

  • Short answer: You can't. – Keba Jan 07 '15 at 00:36
  • So I assume there would be a human with a known rating that is kept secret. The AIs would play a game then estimate the human's rating? Or, two players with secret ratings would play, and the AIs would attempt to estimate each?? – Tony Ennis Jan 07 '15 at 00:57
  • @DagOskarMadsen, while I agree that this question (especially with its title) looks very much like that earlier one in many respects, I don't believe it is a duplicate. In particular, the earlier question asked for software that could do such a thing, and its accepted answer was about Ken Regan's algorithm based on engine evaluations. Here, tbischel's question links to a kaggle competition whose description already mentions Regan's approach and whose dataset comes equipped with engine evaluations for all moves. (cont'd) – ETD Jan 07 '15 at 02:22
  • (cont'd) I see the present question as essentially asking how one can go beyond Regan's work, e.g. by complementing its approach with additional considerations, in order to do a better job of predicting Elo ratings. Perhaps it could use some rewording, but at least as I read it, it isn't a duplicate. (@tbischel, I have edited the question along the lines I described, but if I have altered your intent at all, please roll it back.) – ETD Jan 07 '15 at 02:24
  • @ETD I see the previous question as essentially asking how you can do a good job of predicting Elo ratings, of which this is a duplicate. If the answer to the earlier question is not good enough, is it a good enough reason to ask the same question again? – JiK Jan 07 '15 at 09:21
  • I'd also vote to close this question because it is primarily opinion based and subjective. If there is a good objective answer to the question, it has to have the required background research and cite the essential sources. But then that method is already going to be used in the Kaggle competition, and thus it is not an answer to this question. – JiK Jan 07 '15 at 09:26
  • @JiK "If there is a good objective answer to the question, it has to have the required background research and cite the essential sources." Yes, or it could be an original contribution of such work. "But then that method is already going to be used in the Kaggle competition, and thus it is not an answer to this question." I don't see why something being able to be used in the competition would make it not an answer here. It's only because Regan's approach is already mentioned in the posing of the referenced competition that I see this Q as asking for approaches beyond that. – ETD Jan 12 '15 at 23:00

1 Answer


I'm also trying to take part in that competition, although I'm not sure I'll be able to spare the time and computational effort to actually make a submission.

Unfortunately, I think the Kaggle data set of engine evaluations isn't suitable for getting the best possible result. I think it is crucial to have not only the evaluation of the best move, but also of the inferior choices. The reason is that you need some measure of the complexity of each position in order to weight the accuracy shown by the players.

In short: It is hard to be accurate in a complex position and easy to be accurate in a simple position.
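Just to sketch what I mean by a complexity number (the formula is only an illustration, assuming evaluations of the top few moves, i.e. multi-PV output, are available):

```python
def complexity(multipv_evals_cp):
    """Rough complexity measure for one position: the average drop-off of the
    alternative moves relative to the engine's best move. multipv_evals_cp is
    a list of centipawn scores for the top N lines, best first, all from the
    point of view of the side to move."""
    if len(multipv_evals_cp) < 2:
        return 0.0
    best = multipv_evals_cp[0]
    gaps = [best - e for e in multipv_evals_cp[1:]]
    return sum(gaps) / len(gaps)
```

The bigger the average gap, the fewer moves hold the position, so accuracy there says more about the player.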

Another problem is that at a runtime of one second per move, Stockfish probably isn't much stronger than the average player in the dataset (which would be around 2200 Elo) …

So to create a decent data set, one would have to invest a lot of CPU-hours: 50,000 games × 40 moves × 1 second (or more) is about 2,000,000 seconds, i.e. at least 555 CPU-hours.
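If one did re-run the analysis, something along these lines would produce multi-PV evaluations for a game (python-chess driving a local Stockfish binary; the path, the one-second limit and the number of lines are placeholders, and the time limit is exactly the knob that blows up the CPU budget above):

```python
import chess.engine
import chess.pgn

# Sketch: multi-PV evaluations for one game, side-to-move point of view.
engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")

with open("games.pgn") as pgn:
    game = chess.pgn.read_game(pgn)

board = game.board()
evals = []  # one list of top-3 centipawn scores per position
for move in game.mainline_moves():
    infos = engine.analyse(board, chess.engine.Limit(time=1.0), multipv=3)
    evals.append([info["score"].relative.score(mate_score=100000) for info in infos])
    board.push(move)

engine.quit()
```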

Another idea, to get an initial guess of the players' strength, is to build an opening tree that shows when the players deviate from previously played games and how strong the players of those games were.
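A minimal version of such a tree is just a dictionary keyed by the opening moves, remembering the ratings of the reference games that reached each node; everything below (depth cut-off, field layout) is only a sketch:

```python
from collections import defaultdict

# Opening "tree": map each opening prefix seen in the reference games to the
# ratings of the players who reached it.
tree = defaultdict(list)  # tuple of UCI moves -> list of (white_elo, black_elo)

def add_game(tree, moves, white_elo, black_elo, max_depth=20):
    """Register one reference game's opening (moves as a list of UCI strings)."""
    for depth in range(1, min(len(moves), max_depth) + 1):
        tree[tuple(moves[:depth])].append((white_elo, black_elo))

def opening_prior(tree, moves, max_depth=20):
    """Return (deviation depth, average rating at the deepest known node)."""
    known, depth = None, 0
    for d in range(1, min(len(moves), max_depth) + 1):
        node = tuple(moves[:d])
        if node not in tree:
            break
        known, depth = tree[node], d
    if not known:
        return 0, None
    avg = sum(w + b for w, b in known) / (2 * len(known))
    return depth, avg
```

The depth at which a new game drops out of the tree, plus the average rating at the last known node, then serve as the initial guess.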

So my rough outline is: build an opening tree to get an initial guess of the strength, ignore the moves played in the opening, weight the accuracy of the moves after the opening phase by the complexity of the position, and run some kind of linear regression. One should probably also take care not to overvalue time-trouble blunders.
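Put together, the last step might look roughly like this (scikit-learn; the feature columns and file names are placeholders for the quantities discussed above, not anything fixed by the competition):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One feature row per training game; the columns stand for the quantities
# discussed above (opening-tree prior, deviation depth, complexity-weighted
# accuracy after the opening, blunder count). The .npy files are hypothetical
# outputs of the feature-extraction steps sketched earlier.
X_train = np.load("train_features.npy")   # shape (n_games, 4)
y_train = np.load("train_elo.npy")        # known ratings, shape (n_games,)

model = LinearRegression()
model.fit(X_train, y_train)

X_test = np.load("test_features.npy")     # same columns for the unrated games
predicted_elo = model.predict(X_test)
```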

I think the key is to get a good "complexity number". As for the other possibilities you mention: "number of blunders" just reduces "accuracy of play" to something similar with less information, and "depth of tactical traps avoided" is tricky and may be even more expensive computationally.

BlindKungFuMaster