Database: Chess Compiled 2019-2022, 2500+ Elo
Methodology:
- I first passed it through kentdjb's pgn-extract to remove erroneous games from the database (you could also restrict it to specific openings at this stage):
pgn-extract -o valid.pgn merged_2019_2022_25.pgn
- I then wrote a Python script (using python-chess) that records, for every move of a game, which piece moved and the move itself, collects them in a pandas DataFrame and writes that to a CSV file:
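import pandas as pd

# .... (game, board and the moves list are set up here)
for move in game.mainline_moves():
    piece = board.piece_at(move.from_square)  # piece on the origin square, or None
    move_info = {
        # piece letter (P, N, B, R, Q, K), upper-cased so both colours are counted together
        'piece': piece.symbol().upper() if piece is not None else None,
        # SAN must be computed before the move is pushed; upper-casing it
        # merges e.g. the pawn capture bxc6 with the bishop capture Bxc6
        'move': board.san(move).upper(),
    }
    moves.append(move_info)
    board.push(move)  # play the move so the next one is read from the right position
df = pd.DataFrame(moves)
df.to_csv("moves.csv", index=False)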
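- Loading the CSV back into a DataFrame and counting:
df = pd.read_csv("./moves.csv")
print(df['piece'].value_counts())  # number of moves made by each piece type
print(df['move'].value_counts())   # number of times each (upper-cased) SAN move was played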
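As a cross-check of the piece column that needs no board, the moving piece can also be read straight off the original, case-sensitive SAN: piece moves start with K, Q, R, B or N, castling (O-O, O-O-O) is a king move, and everything else is a pawn move. This is a minimal sketch under those assumptions; `piece_from_san` is a hypothetical helper, not part of python-chess. It only works before upper-casing, because bxc6 (a pawn capture) and Bxc6 (a bishop capture) both become BXC6.

```python
def piece_from_san(san: str) -> str:
    """Return the letter of the piece that moves in a SAN string (hypothetical helper).

    Assumes standard, case-sensitive SAN such as the output of board.san(move).
    """
    if san.startswith("O-O"):  # O-O and O-O-O (with or without +/#) are king moves
        return "K"
    if san[0] in "KQRBN":  # piece moves start with an upper-case piece letter
        return san[0]
    return "P"  # pawn moves start with a lower-case file letter (a-h)


if __name__ == "__main__":
    for san in ["Nf3", "O-O-O", "bxc6", "Bxc6", "e8=N+"]:
        print(san, piece_from_san(san))
    # Upper-casing destroys the distinction this relies on
    print("bxc6".upper() == "Bxc6".upper())
```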
Results
Whole database:
Piece | N moves | Average per game
P | 2072118 | 23.058
R | 1512889 | 16.835
N | 1403186 | 15.615
B | 1253473 | 13.949
K | 1109101 | 12.342
Q | 963072 | 10.717
Total | 8313839 |
Number of games | 89864 |
Average number of half-moves in a game | 92.52 |
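The Average column is each piece's move count divided by the number of games, and the half-moves-per-game figure is the total divided by the number of games (the six averages add up to it, up to rounding). A quick arithmetic check in Python, using only the numbers from the table above:

```python
# Move counts per piece and number of games, copied from the table above
counts = {
    "P": 2072118,
    "R": 1512889,
    "N": 1403186,
    "B": 1253473,
    "K": 1109101,
    "Q": 963072,
}
games = 89864

total = sum(counts.values())
# Average number of moves each piece type makes per game, to 3 decimals
averages = {piece: round(n / games, 3) for piece, n in counts.items()}
# Average game length in half-moves, to 2 decimals
half_moves_per_game = round(total / games, 2)

if __name__ == "__main__":
    for piece, avg in averages.items():
        print(piece, avg)
    print("Total", total)  # 8313839
    print("Half-moves per game", half_moves_per_game)  # 92.52
```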
Next, the moves themselves (5731 distinct entries; moves disambiguated because several pieces could reach the same square (e.g. QF6F5), checks/checkmates, and promotions are each counted as their own entries):
Move | Count
O-O | 144493
NF6 | 101235
NF3 | 100404
D4 | 89831
D5 | 80752
E4 | 80105
NC3 | 77145
C4 | 72839
E5 | 72662
C5 | 71730
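The distinct-entry figure and a table like this one come from counting every string in the move column and sorting by count. A minimal standard-library sketch of that idea; the move list below is made up for illustration and is not taken from the database:

```python
from collections import Counter

# Hypothetical sample of upper-cased SAN moves, standing in for the 'move' column
moves = ["O-O", "NF3", "O-O", "D4", "NF6", "O-O",
         "NF3", "E4", "D4", "NF3", "C5", "O-O"]

counts = Counter(moves)

if __name__ == "__main__":
    print("distinct entries:", len(counts))
    # most_common(n) returns (move, count) pairs, highest count first
    for move, n in counts.most_common(3):
        print(move, n)
```

With pandas, `df['move'].value_counts()` gives the same ranking, and its length is the number of distinct entries.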
The most played promotion is A8=Q, the 931st most played move overall (783 times), and the most played underpromotion is actually E8=N+ (2667th overall, played 13 times).
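A sketch of how promotion figures like these can be read off the ranked counts: a promotion contains "=" followed by the new piece, an underpromotion is one to N, B or R, and a move's overall rank is its 1-based position in the count-sorted list. The counts are invented for illustration, `promoted_piece`, `is_underpromotion` and `rank_of` are hypothetical helpers, and ties are broken by first appearance, which is an assumption since the post doesn't say how ties were ranked.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical counts of upper-cased SAN moves (not the real database figures)
counts = Counter({"O-O": 50, "NF3": 40, "A8=Q": 7, "E8=N+": 2, "H1=Q+": 5, "B1=R": 1})

# Promotion SAN contains "=" plus the new piece, e.g. A8=Q or E8=N+
PROMOTION = re.compile(r"=([QRBN])")


def promoted_piece(san: str) -> Optional[str]:
    """Return the piece promoted to, or None if the move is not a promotion."""
    match = PROMOTION.search(san)
    return match.group(1) if match else None


def is_underpromotion(san: str) -> bool:
    """True for promotions to anything other than a queen."""
    return promoted_piece(san) in ("R", "B", "N")


def rank_of(move: str, counts: Counter) -> int:
    """1-based position of move when all moves are sorted by count, highest first."""
    ranking = [m for m, _ in counts.most_common()]
    return ranking.index(move) + 1


if __name__ == "__main__":
    promotions = [m for m, _ in counts.most_common() if promoted_piece(m)]
    under = [m for m in promotions if is_underpromotion(m)]
    print("most played promotion:", promotions[0], "rank", rank_of(promotions[0], counts))
    print("most played underpromotion:", under[0], "rank", rank_of(under[0], counts))
```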