If you look at the algorithm description, you will see that, at a high level, the encryption algorithm looks like this:
addRoundKey(0);
for (int i = 1; i < rounds; i++) {
    subBytes();
    shiftRows();
    mixColumns();
    addRoundKey(i);
}
subBytes();
shiftRows();
addRoundKey(rounds);
The decryption algorithm will then look like this:
addRoundKey(rounds);
for (int i = 1; i < rounds; i++) {
    invShiftRows();
    invSubBytes();
    addRoundKey(rounds - i);
    invMixColumns();
}
invShiftRows();
invSubBytes();
addRoundKey(0);
Now, if you view the 16-byte state as a vector in the space (Z_2)^128, then the ShiftRows() and MixColumns() transforms are linear, and adding a round key (addRoundKey()) is just vector addition in that space, i.e. a bitwise XOR. Moreover, ShiftRows() only moves whole bytes around, so it commutes with SubBytes() (which works on each state byte independently). Therefore, it is customary for AES implementations to combine SubBytes(), ShiftRows() and MixColumns() into a single step that uses lookup tables mapping 8 bits to 32 bits (you need four of them, with 256 entries of 32 bits each, for a total of 4 kB of tables). This optimization is what the Wikipedia page alludes to.
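To make the table trick concrete, here is a minimal sketch of how such tables could be generated. The names (T, build_tables, round_column) and the big-endian packing of a column into a 32-bit word are my own assumptions for illustration; real implementations usually ship precomputed constants and may pack bytes differently.

```c
#include <stdint.h>

/* Four tables of 256 32-bit words: 4 * 256 * 4 bytes = 4 kB. */
static uint32_t T[4][256];

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-box from its definition: field inverse (0 maps to 0), then the
 * affine transform. The brute-force inverse is slow but only runs
 * while the tables are being built. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int c = 1; c < 256 && x != 0; c++) {
        if (gmul(x, (uint8_t)c) == 1) {
            inv = (uint8_t)c;
            break;
        }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                         ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* T[0][x] is SubBytes(x) multiplied by the first MixColumns matrix
 * column (2, 1, 1, 3), packed most significant byte first (assumed
 * convention). The other tables are byte rotations of T[0]. */
static void build_tables(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t s = sbox((uint8_t)x);
        uint32_t t = ((uint32_t)gmul(s, 2) << 24) | ((uint32_t)s << 16)
                   | ((uint32_t)s << 8) | (uint32_t)gmul(s, 3);
        T[0][x] = t;
        T[1][x] = (t >> 8) | (t << 24);
        T[2][x] = (t >> 16) | (t << 16);
        T[3][x] = (t >> 24) | (t << 8);
    }
}

/* One output column of a full round. s0..s3 are the four state bytes
 * that ShiftRows() brings into this column (rows 0..3); rk is the
 * matching round key word. */
static uint32_t round_column(uint8_t s0, uint8_t s1, uint8_t s2,
                             uint8_t s3, uint32_t rk)
{
    return T[0][s0] ^ T[1][s1] ^ T[2][s2] ^ T[3][s3] ^ rk;
}
```

With tables like these, ShiftRows() costs nothing: it is just a matter of which state bytes you pass to round_column(). A full round is then four calls, one per output column.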
If you try to do that with decryption, in the loop above, you will see that the addRoundKey() step happens between invSubBytes() and invMixColumns(), which is inconvenient. The solution is to move the addRoundKey() one step down, but then the round key must be modified to account for the move: if you add the round key after invMixColumns(), then you must add a round key that has already been "invMixColumnsed". It works because invMixColumns() is linear: invMixColumns(x XOR k) = invMixColumns(x) XOR invMixColumns(k).
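The linearity argument can be checked on a single column. The sketch below is only an illustration; the function names (inv_mix_column, key_then_mix, mix_then_key) are hypothetical and not taken from any particular library.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return r;
}

/* InvMixColumns on one 4-byte column: multiplication by the circulant
 * matrix whose first row is (0E, 0B, 0D, 09). */
static void inv_mix_column(const uint8_t in[4], uint8_t out[4])
{
    for (int r = 0; r < 4; r++) {
        out[r] = (uint8_t)(gmul(0x0E, in[r])
                         ^ gmul(0x0B, in[(r + 1) & 3])
                         ^ gmul(0x0D, in[(r + 2) & 3])
                         ^ gmul(0x09, in[(r + 3) & 3]));
    }
}

/* Original order: add the round key, then apply InvMixColumns. */
static void key_then_mix(const uint8_t state[4], const uint8_t key[4],
                         uint8_t out[4])
{
    uint8_t t[4];
    for (int i = 0; i < 4; i++)
        t[i] = state[i] ^ key[i];
    inv_mix_column(t, out);
}

/* Reordered: apply InvMixColumns, then add a key that went through
 * InvMixColumns itself. */
static void mix_then_key(const uint8_t state[4], const uint8_t key[4],
                         uint8_t out[4])
{
    uint8_t mixed_key[4];
    inv_mix_column(key, mixed_key);
    inv_mix_column(state, out);
    for (int i = 0; i < 4; i++)
        out[i] ^= mixed_key[i];
}
```

Both functions produce the same output for any state and key column, which is exactly why the round key can be moved past invMixColumns() as long as it is transformed first.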
In that case, decryption becomes:
addRoundKey(rounds);
for (int i = 1; i < rounds; i++) {
    invShiftRows();
    invSubBytes();
    invMixColumns();
    addRoundKeyAlreadyInvMixed(rounds - i);
}
invShiftRows();
invSubBytes();
addRoundKey(0);
With that new algorithm layout, the invShiftRows(), invSubBytes() and invMixColumns() steps are again together, and can be optimized into a single set of lookup tables. However, this requires the "inner" subkeys (all but the first and last one) to be preprocessed with invMixColumns() so that the computations still work.
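The subkey preprocessing could look roughly like the following sketch. It assumes the expanded key is stored as rounds + 1 subkeys of 16 bytes each, with each column in 4 consecutive bytes; that layout and the name prepare_decryption_keys are assumptions for illustration, so the code you are inspecting may organize things differently.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return r;
}

/* InvMixColumns over a 16-byte block, in place. Each column occupies
 * 4 consecutive bytes (assumed layout). */
static void inv_mix_columns(uint8_t block[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *col = block + 4 * c;
        uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        col[0] = (uint8_t)(gmul(0x0E, a0) ^ gmul(0x0B, a1)
                         ^ gmul(0x0D, a2) ^ gmul(0x09, a3));
        col[1] = (uint8_t)(gmul(0x09, a0) ^ gmul(0x0E, a1)
                         ^ gmul(0x0B, a2) ^ gmul(0x0D, a3));
        col[2] = (uint8_t)(gmul(0x0D, a0) ^ gmul(0x09, a1)
                         ^ gmul(0x0E, a2) ^ gmul(0x0B, a3));
        col[3] = (uint8_t)(gmul(0x0B, a0) ^ gmul(0x0D, a1)
                         ^ gmul(0x09, a2) ^ gmul(0x0E, a3));
    }
}

/* Transform only the "inner" subkeys 1 .. rounds-1. Subkeys 0 and
 * rounds are left alone, since they are added outside the steps that
 * get merged into the lookup tables. */
static void prepare_decryption_keys(uint8_t round_keys[][16], int rounds)
{
    for (int i = 1; i < rounds; i++)
        inv_mix_columns(round_keys[i]);
}
```

This runs once per key, after the normal key expansion, so its cost is negligible compared to the per-block savings from the table-based rounds.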
I believe this is what you observe in the implementation you are inspecting.