I have an application that erasure codes data in chunks. Let's say there are 3 data chunks and 2 parity chunks (it could also be 6-3, 10-4, etc.). Each chunk is 1MB of data and has a CRC32 checksum associated with it.

D1-CRC1 D2-CRC2 D3-CRC3 P1-CRC4 P2-CRC5

To recover lost data, I need any 3 of the above chunks to recover 1 or 2 missing pieces.

I am wondering if there is some way I can combine the 5 checksums in a space-efficient way, so that given a recovered element:

D1'-CRC1'

and the previously combined checksum, I can know for sure that CRC1 == CRC1', even though CRC1 itself was lost and is no longer available?

I would then be able to store this "combined checksum" with each data element and, after a recovery, apply some function to it along with the recovered checksum to know that the recovery is correct.

I thought XORing all the checksums together would help, but after thinking it through I don't believe it would.
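To make the XOR concern concrete, here is a minimal sketch of what I believe goes wrong. It assumes Python's `zlib.crc32` as the CRC32 and uses random bytes as stand-ins for the five chunks (no real erasure coding is done); the helper name `xor_of_missing` is made up for illustration. With one chunk lost, the XOR of all checksums does give back the missing CRC exactly, but with two lost it only pins down the XOR of the two missing CRCs, so a pair of wrong values can still pass.

```python
import os
import zlib
from functools import reduce


def crc32(chunk: bytes) -> int:
    """CRC32 of a chunk as an unsigned 32-bit integer."""
    return zlib.crc32(chunk) & 0xFFFFFFFF


def xor_of_missing(combined: int, surviving_crcs: list) -> int:
    """XOR the surviving CRCs out of the combined value.

    What is left is the XOR of the CRCs of all missing chunks.
    """
    return reduce(lambda a, b: a ^ b, surviving_crcs, combined)


# Stand-ins for D1, D2, D3, P1, P2 (assumption: random bytes, small size).
chunks = [os.urandom(1024) for _ in range(5)]
crcs = [crc32(c) for c in chunks]
combined = reduce(lambda a, b: a ^ b, crcs)

# One chunk lost (D1): the missing CRC is recovered exactly.
one_missing = xor_of_missing(combined, crcs[1:])
print(one_missing == crcs[0])

# Two chunks lost (D1, D2): only CRC1 ^ CRC2 is known.
two_missing = xor_of_missing(combined, crcs[2:])
print(two_missing == crcs[0] ^ crcs[1])

# Any pair of wrong CRCs that differ from the originals by the same
# 32-bit delta has the same XOR, so the combined check cannot tell.
delta = 0xDEADBEEF
wrong1, wrong2 = crcs[0] ^ delta, crcs[1] ^ delta
print((wrong1 ^ wrong2) == two_missing)
```

All three comparisons hold, which is why a single XOR seems enough for one lost chunk but not for two.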

The naive solution would be to store the checksums for all elements with each data element, but that gives a significant space overhead. I feel there is something better, but it's beyond my knowledge!
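For reference, the naive approach could look roughly like the sketch below. The names `StoredChunk`, `build_stripe`, `verify_recovered` and `checksum_overhead_bytes` are hypothetical and only for illustration, and `zlib.crc32` is again assumed as the CRC32. Every stored chunk carries the CRC32 of every chunk in its stripe, so the checksum cost per chunk grows linearly with the stripe width.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoredChunk:
    index: int                    # position of this chunk in the stripe
    data: bytes                   # chunk payload (data or parity)
    stripe_crcs: Tuple[int, ...]  # CRC32 of every chunk in the stripe


def build_stripe(chunks: List[bytes]) -> List[StoredChunk]:
    """Attach the full list of stripe checksums to every chunk."""
    crcs = tuple(zlib.crc32(c) for c in chunks)
    return [StoredChunk(i, c, crcs) for i, c in enumerate(chunks)]


def verify_recovered(index: int, recovered: bytes, survivor: StoredChunk) -> bool:
    """Check a recovered chunk against the CRC kept by any surviving chunk."""
    return zlib.crc32(recovered) == survivor.stripe_crcs[index]


def checksum_overhead_bytes(n_chunks: int) -> int:
    """Checksum bytes stored per chunk: one 4-byte CRC32 per stripe member."""
    return 4 * n_chunks
```

For example, with `stripe = build_stripe([b"D1", b"D2", b"D3", b"P1", b"P2"])`, calling `verify_recovered(0, b"D1", stripe[2])` checks a recovered D1 against the copy of CRC1 held by D3. What I am hoping for is something that gives the same guarantee without keeping one checksum per stripe member in every chunk.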

  • So you are looking for some way to know for sure that when you only have three chunks remaining that any of the two you recover are correct? – James Sep 10 '22 at 03:25
  • This doesn't seem to be about cryptography / cryptographic checksums. That said, have you looked at PAR2 and can you indicate what you are missing from that algorithm / protocol? If that's not sufficient, please let me know by posting a comment, as I might need to migrate this to [cs.se]. – Maarten Bodewes Sep 10 '22 at 16:58
  • @james yea, if one or two are missing, I'd like to be able to validate the checksum of the recovered pieces matches the original checksum. – Stephen ODonnell Sep 12 '22 at 09:39
  • @MaartenBodewes - I wondered if there may be a crypto solution to this, but it may be more applicable to computer science. – Stephen ODonnell Sep 12 '22 at 09:39
