I have an application that erasure-codes data in chunks. Let's say there are 3 data chunks and 2 parity chunks (it could also be 6-3, 10-4, etc.). Each chunk is 1 MB of data and has a CRC32 checksum associated with it.
D1-CRC1 D2-CRC2 D3-CRC3 P1-CRC4 P2-CRC5
To recover lost data, I need any 3 of the above chunks; from those I can rebuild the 1 or 2 missing pieces.
I am wondering if there is some way I can combine the 5 checksums in a space-efficient way, so that given a recovered element:
D1'-CRC1'
and the previously combined checksum, I can know for sure that CRC1 == CRC1', even though I no longer have access to CRC1 because it was lost?
I would then be able to store this "combined checksum" with each data element, apply some function to it along with the recovered checksum, and know that the recovery is correct.
I thought XORing all the checksums together would help, but after thinking it through I don't believe it would.
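To make the XOR idea concrete, here is a minimal Python sketch. The chunk contents are short placeholders rather than real 1 MB chunks, the parity chunks are not real erasure-coded output, and `xor_of_missing` is a name made up for illustration. It suggests that the XOR pins down a single lost CRC, but with two lost chunks it only yields the XOR of the two missing CRCs, not each one individually.

```python
import zlib
from functools import reduce
from operator import xor

# Placeholder chunks (assumption: real chunks are 1 MB and P1/P2 come from the erasure coder).
chunks = {
    "D1": b"data chunk one",
    "D2": b"data chunk two",
    "D3": b"data chunk three",
    "P1": b"parity chunk one",
    "P2": b"parity chunk two",
}
crcs = {name: zlib.crc32(data) for name, data in chunks.items()}

# The "combined checksum" that would be stored alongside every chunk.
combined = reduce(xor, crcs.values())

def xor_of_missing(combined, surviving_crcs):
    """XOR out the surviving CRCs; what remains is the XOR of the lost ones."""
    return reduce(xor, surviving_crcs, combined)

# One chunk lost (D1): the remainder is exactly CRC1.
one_lost = xor_of_missing(combined, [crcs[n] for n in ("D2", "D3", "P1", "P2")])
assert one_lost == crcs["D1"]

# Two chunks lost (D1 and D2): the remainder is only CRC1 ^ CRC2,
# so neither CRC1 nor CRC2 can be checked on its own.
two_lost = xor_of_missing(combined, [crcs[n] for n in ("D3", "P1", "P2")])
assert two_lost == crcs["D1"] ^ crcs["D2"]
```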
The naive solution would be to store the checksums for all elements with each data element, but that gives a significant space overhead. I feel there is something better, but it's beyond my knowledge!
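For reference, the naive approach might look like the sketch below. The function names are hypothetical, and it assumes each chunk carries a table holding one 4-byte CRC32 per chunk in the stripe.

```python
import zlib

CRC_BYTES = 4  # a CRC32 is 4 bytes

def checksum_bytes_per_chunk(data_chunks, parity_chunks):
    """Checksum metadata stored with each chunk when every chunk carries all CRCs."""
    return CRC_BYTES * (data_chunks + parity_chunks)

def verify_recovered(name, recovered, crc_table):
    """Check a rebuilt chunk against the CRC table copied from any surviving chunk."""
    return zlib.crc32(recovered) == crc_table[name]

# Checksum metadata per chunk for the layouts mentioned above.
for k, m in [(3, 2), (6, 3), (10, 4)]:
    total = checksum_bytes_per_chunk(k, m)
    print(f"{k}-{m}: {total} bytes of checksums per chunk instead of {CRC_BYTES}")
```

With this layout any surviving chunk can verify any recovered chunk, at the cost of the checksum metadata growing linearly with the number of chunks in the stripe.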