On page 39 of The Design of Rijndael, the authors describe the design criteria for the MixColumns step:
- Dimensions. The transformation is a bricklayer transformation operating on 4-byte columns.
- Linearity. The transformation is preferably linear over GF(2).
- Diffusion. The transformation has to have relevant diffusion power.
- Performance on 8-bit processors. The performance of the transformation on 8-bit processors has to be high.
However, the book does not say why reduction modulo $x^4+1$ was chosen for this step. My guess is that $x^4+1$ is the simplest modulus other than $x^4$, and that reducing modulo $x^4$ would introduce multiple zeros. I would be glad if anyone could tell me whether my guess is wrong.
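
To make the comparison in my guess concrete, here is a minimal Python sketch (my own, not taken from the book) that builds the 4x4 matrix for multiplication by the MixColumns polynomial $c(x) = 03\,x^3 + 01\,x^2 + 01\,x + 02$, once modulo $x^4+1$ and once modulo $x^4$. The helper names `gf_mul`, `mul_matrix` and `apply_matrix` are made up for illustration, and coefficient vectors are assumed to be stored lowest degree first.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce back to 8 bits
        b >>= 1
    return result


def mul_matrix(c, wrap):
    """Matrix of the map a(x) -> c(x) * a(x) on 4 coefficients.

    wrap=True  reduces modulo x^4 + 1 (x^4 is replaced by 1).
    wrap=False reduces modulo x^4     (x^4 and higher powers are dropped).
    """
    m = [[0] * 4 for _ in range(4)]
    for i in range(4):          # power of x taken from c(x)
        for j in range(4):      # power of x taken from a(x)
            k = i + j
            if k >= 4:
                if not wrap:
                    continue    # term vanishes modulo x^4
                k -= 4          # term wraps around modulo x^4 + 1
            m[k][j] ^= c[i]
    return m


def apply_matrix(m, column):
    """Multiply a 4-byte column by the matrix over GF(2^8)."""
    out = []
    for row in m:
        acc = 0
        for coef, byte in zip(row, column):
            acc ^= gf_mul(coef, byte)
        out.append(acc)
    return out


C = [0x02, 0x01, 0x01, 0x03]        # c(x), lowest degree first
circulant = mul_matrix(C, wrap=True)
triangular = mul_matrix(C, wrap=False)

for name, m in (("mod x^4+1", circulant), ("mod x^4", triangular)):
    print(name)
    for row in m:
        print("  " + " ".join(f"{v:02x}" for v in row))
```

Modulo $x^4+1$ the matrix is circulant with no zero entries, so every output byte depends on every input byte. Modulo $x^4$ it is lower triangular with six zeros, so for example the last input byte only affects the last output byte. This is just what the arithmetic shows; I do not know whether it is the authors' actual reason.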