
I'm encrypting with an FPE algorithm in C#. What I did was convert the date I want to encrypt into a day-of-year value and a year value, like so:

using System;
using System.Globalization;
using System.Text;

string input = "1901-01-01";
DateTimeOffset parsedDateOffset;
bool parsed = DateTimeOffset.TryParseExact(input, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.AllowWhiteSpaces, out parsedDateOffset);

if (parsed)
{
    int dayOfYear = parsedDateOffset.DayOfYear;
    int yyyy = parsedDateOffset.Year;

    // Then I encrypt the day of year separately from the year
    int encodedDay = FE11.Encrypt(365, dayOfYear, Encoding.UTF8.GetBytes(input.ToString()), Encoding.UTF8.GetBytes("0"));
    int encodedYear = FE11.Encrypt(2015 - 1901, yyyy, Encoding.UTF8.GetBytes(input.ToString()), Encoding.UTF8.GetBytes("0"));

    // Then combine them together again, taking leap years into account
    DateTime encodedDate = new DateTime(encodedYear, 1, 1).AddDays(encodedDay - 1);
    if (DateTime.IsLeapYear(encodedYear))
    {
        // DateTime is immutable: AddDays returns a new value, so reassign it
        encodedDate = encodedDate.AddDays(1);
    }
}

However, I'm encountering duplicate dates (i.e. two plaintext dates result in the same encrypted date). With format-preserving encryption this should not be the case, and I know it is because of the way I am encrypting the year separately from the day of year.

Does anyone have any suggestion to overcome this? I have tried introducing a tweak but the problem persists.

Ideally I would like to encrypt only one value. But encrypting the day of year alone still requires me to place it in the context of a year when presenting the final encrypted result.

Are there other date representations besides day of year that might be useful?

EDIT: The FPE library I am using at this time is called DotFPE, located here: https://dotfpe.codeplex.com/ (it is supposedly a port of the format-preserving encryption module of Botan).

erotavlas
  • What FPE library are you using? Can you give us a link to it? – mikeazo Aug 06 '15 at 17:11
  • @mikeazo ok, added that detail at the end of my original post – erotavlas Aug 06 '15 at 17:24
  • Can you provide some sample outputs? Are the values duplicate right after the call to Encrypt or not until you call AddDays? – mikeazo Aug 06 '15 at 17:29
  • Also, when I look at the code, Encrypt takes 4 parameters, not 3 like you show in your example code. What am I missing? – mikeazo Aug 06 '15 at 17:32
  • @mikeazo sorry it was the key, I updated the post. BTW I'm using the original date as the key using Encoding.UTF8.GetBytes(input.ToString()) since I only care to encrypt one way, no need to decrypt. – erotavlas Aug 06 '15 at 17:47

2 Answers


So, in the comments, you state that

I'm using the original date as the key

This is the reason for the duplicate dates. For a fixed key (and no tweak, or a fixed tweak), FPE is a permutation of the domain, so distinct inputs always give distinct outputs. Once the key changes with every input, that guarantee is gone: each input is mapped by a different permutation. So it is entirely plausible that encrypting the number 215 with the key 215 results in the same ciphertext as encrypting the number 10 with the key 10. Encrypting the numbers 215 and 10 with the same key, say 111, is however guaranteed to give different ciphertexts (assuming you are not using the tweak, or are using a fixed tweak).

With a fixed, random key you should get distinct ciphertexts. Note that a random, per-message tweak would bring the duplicates back, since each tweak value also selects a different permutation.
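To illustrate the difference between one key for everything and a key derived from each input, here is a small C# sketch. It does not use DotFPE: `ToyPermutation` is a hypothetical, deliberately insecure stand-in that builds a keyed permutation of `[0, n)` with a Fisher-Yates shuffle seeded by `System.Random(key)`. It only models the property that matters here, namely that each key selects one permutation of the domain.

```csharp
using System;
using System.Linq;

// Toy stand-in for an FPE cipher (NOT secure, illustration only):
// each integer key selects one pseudorandom permutation of [0, n).
static class ToyPermutation
{
    public static int[] Build(int n, int key)
    {
        int[] perm = Enumerable.Range(0, n).ToArray();
        var rng = new Random(key);
        // Fisher-Yates shuffle: only swaps, so the result is still a permutation.
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (perm[i], perm[j]) = (perm[j], perm[i]);
        }
        return perm;
    }

    // Same key for every input: outputs form a permutation, so no duplicates.
    public static int DistinctOutputsFixedKey(int n, int key)
    {
        int[] perm = Build(n, key);
        return Enumerable.Range(0, n).Select(x => perm[x]).Distinct().Count();
    }

    // Key derived from the input itself (as in the question): every input is
    // mapped by a different permutation, so outputs can collide.
    public static int DistinctOutputsPerInputKey(int n)
    {
        return Enumerable.Range(0, n).Select(x => Build(n, x)[x]).Distinct().Count();
    }
}
```

With `n = 365`, expect `DistinctOutputsFixedKey(365, 111)` to report all 365 outputs as distinct, while `DistinctOutputsPerInputKey(365)` reports noticeably fewer (if the outputs behave like random draws, roughly 365(1 - 1/e), i.e. about 231).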

So you can fix this "problem" by just using the same key everywhere and no tweak. This does, however, scare me. You are using deterministic encryption in an application with a fairly small domain, which could make you vulnerable to frequency attacks or other attacks, depending on the exact application.

mikeazo
  • I probably wouldn't use a single key for everything, since my goal in changing the key for each input was to make it difficult to decipher. I think I'm going to try poncho's suggestion and use a TimeSpan in days since a start date; that way I encrypt only one value instead of two separate ones. – erotavlas Aug 06 '15 at 18:05
  • @erotavlas I really like his idea too and would recommend that. If you use the date as the key, however, you will still have a chance of getting duplicates. Depending on the application, that may not be a bad thing, however. – mikeazo Aug 06 '15 at 18:07
  • I'm getting almost exactly the same number of duplicates (over 15000) using a TimeSpan in days as with my initial method, when I cycle through every day from 1901-01-01 until 2015-12-31. Does this sound right? I expected every input value to result in a unique output value. My key is Encoding.UTF8.GetBytes(elapsedDays.ToString()), my tweak is Encoding.UTF8.GetBytes("0") and my modulus is the number of days since 1901-01-01. – erotavlas Aug 06 '15 at 18:55
  • If you use Encoding.UTF8.GetBytes("0") as the key, do you get any duplicates? I'm not sure if 15000 is the expected number. I'd have to work out the math. – mikeazo Aug 06 '15 at 19:07
  • 1
    I don't think that over 15000 duplicates would be out of the ordinary. Here is some python code I wrote that generates 42003 random numbers between 0 and 42003 (inclusive) and counts how many duplicates there are. Running it, I was getting around 15500 duplicates. – mikeazo Aug 06 '15 at 19:17
  • @erotavlas: actually, there's no need to change the key for every date; that's what the tweak is for. – poncho Aug 06 '15 at 19:59
  • @poncho, my guess is erotavlas is trying to do this. – mikeazo Aug 06 '15 at 20:14
  • 1
    @erotavlas, I asked this question to get a better answer for you. – mikeazo Aug 06 '15 at 20:55
  • @mikeazo yes, your guess is correct; I was aiming for something along the lines of a cryptographic hash that can preserve length and format. I think I understand your explanation of why I get duplicates. I'm just not sure what the correct way to solve the problem is while also producing a result that is cryptographically secure. – erotavlas Aug 06 '15 at 22:29
  • @erotavlas this site would be a good place to ask. Make sure you ask about the higher level problem you are facing. Don't start with a possible solution. – mikeazo Aug 06 '15 at 22:51
  • @mikeazo I realized something else: I'm using this FPE to encrypt words and names according to this post here, but I'm doing the same thing with the key, using each plaintext input as a new key. I haven't run a test yet, but I'm guessing there would be collisions as well. However, since the range of possible values is much larger and input strings vary in length (the modulus changes for words of different lengths), I'm guessing duplicates wouldn't occur as frequently as in this case with the dates? Any thoughts? – erotavlas Aug 06 '15 at 23:02
  • 1
    @erotavias, the longer the input the better. – mikeazo Aug 06 '15 at 23:10
  • @mikeazo sorry one more question - What part of the algorithm inside the FPE results in this behaviour of collisions when you use different key each time? (if the answer would be too long, I can post a new question about it tomorrow) – erotavlas Aug 06 '15 at 23:15
  • 1
    @erotavlas it has little to do with FPE and everything to do with the output space. It is the birthday problem – mikeazo Aug 07 '15 at 00:24
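The "over 15000" figure is what the birthday problem predicts. If each of n inputs lands on an independent, uniformly random value in a domain of size n, the expected number of distinct outputs is n(1 - (1 - 1/n)^n), so the expected number of inputs that repeat an earlier output is n(1 - 1/n)^n, roughly n/e. For n = 42003 that is about 15,450. Below is a C# sketch of that estimate plus a Monte Carlo check; `BirthdayEstimate` is a hypothetical helper written as an independent stand-in for the Python experiment mentioned in the comments, which is not reproduced here.

```csharp
using System;
using System.Collections.Generic;

static class BirthdayEstimate
{
    // Expected number of inputs whose output repeats an earlier output when
    // n values are mapped independently and uniformly into a domain of size n.
    public static double ExpectedDuplicates(int n) => n * Math.Pow(1.0 - 1.0 / n, n);

    // Monte Carlo check: draw n uniform values from [0, n) and count repeats.
    public static int SimulateDuplicates(int n, int seed)
    {
        var rng = new Random(seed);
        var seen = new HashSet<int>();
        int duplicates = 0;
        for (int i = 0; i < n; i++)
        {
            // HashSet.Add returns false if the value was already present.
            if (!seen.Add(rng.Next(n))) duplicates++;
        }
        return duplicates;
    }
}
```

The simulated count should land close to the analytic estimate; the spread between runs is only a few dozen for n of this size.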

One potential reason you might be running into duplicates is if you encrypt 'December 31' of a leap year; that makes it day 366, and that's out of range for your FPE method.
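`DateTime.DayOfYear` is 1-based and reaches 366 on December 31 of a leap year, so even after shifting it to a zero-based index, one date per leap year does not fit a modulus-365 domain. As a quick sanity check, the sketch below (using a hypothetical helper, `DayOfYearRange`) counts the dates between 1901-01-01 and 2015-12-31 whose zero-based day-of-year index falls outside `[0, modulus)`. With modulus 365 that is one date per leap year, 28 in that range.

```csharp
using System;

static class DayOfYearRange
{
    // Counts dates in [first, last] whose zero-based day-of-year index does not
    // fit in [0, modulus), i.e. values an FPE with that modulus cannot accept.
    public static int CountOutOfRange(DateTime first, DateTime last, int modulus)
    {
        int count = 0;
        for (DateTime d = first.Date; d <= last.Date; d = d.AddDays(1))
        {
            int zeroBased = d.DayOfYear - 1; // DayOfYear runs 1..365 or 1..366
            if (zeroBased >= modulus) count++;
        }
        return count;
    }
}
```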

The obvious way to address this (and to encrypt days and years together) would be to convert the value into "days since January 1, 1901", encrypt that as a value between 0 and 42002 (there are 42003 days from January 1, 1901 through December 31, 2015), and then convert the result back into the standard format.
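A minimal C# sketch of this encoding, assuming the 1901-01-01 to 2015-12-31 range from the question. `DayOffsetCodec` is a hypothetical helper, and the `fpe` parameter is a hypothetical `(modulus, value) -> ciphertext` wrapper around whatever FPE library is used, with a fixed key and tweak. It does not show DotFPE's actual API.

```csharp
using System;

// Encode a date as a single day offset, permute the offset, decode it back.
static class DayOffsetCodec
{
    public static readonly DateTime Epoch = new DateTime(1901, 1, 1);
    public static readonly DateTime Last = new DateTime(2015, 12, 31);

    // Number of representable dates: 42003 for 1901-01-01 through 2015-12-31.
    public static int Modulus => (Last - Epoch).Days + 1;

    public static int Encode(DateTime date)
    {
        DateTime d = date.Date;
        if (d < Epoch || d > Last)
            throw new ArgumentOutOfRangeException(nameof(date));
        return (d - Epoch).Days; // 0 .. Modulus - 1
    }

    public static DateTime Decode(int offset)
    {
        if (offset < 0 || offset >= Modulus)
            throw new ArgumentOutOfRangeException(nameof(offset));
        return Epoch.AddDays(offset);
    }

    // `fpe` is a hypothetical wrapper: (modulus, value) -> value in [0, modulus).
    public static DateTime EncryptDate(DateTime date, Func<int, int, int> fpe)
        => Decode(fpe(Modulus, Encode(date)));
}
```

Because `Encode` and `Decode` form a bijection between the dates in range and `[0, 42003)`, any permutation of that integer range under a fixed key and tweak yields a permutation of the dates, with no leap-year special cases. In the question's setting, `fpe` might wrap `FE11.Encrypt`; its exact signature and return type should be checked against DotFPE.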

mikeazo
poncho
  • Actually I corrected for leap years. My modulus is set to 365 to yield an encrypted value between 0 and 364 (or 1 and 365). Once I obtain the day value, if the year value I use is a leap year according to the DateTime.IsLeapYear() method, I add an extra day to the result. Not sure if that is the correct way, but it did seem to work. – erotavlas Aug 06 '15 at 17:52
  • 1
    @erotavlas, if you are always adding 1, doesn't that make January 1 of leap years impossible? – mikeazo Aug 06 '15 at 18:04
  • Oops, yes, that's right. I'd need to change the modulus instead, with one encryption for leap years and another for regular years. – erotavlas Aug 06 '15 at 18:22
  • @erotavlas the nice thing about using this encoding is that you don't have to work around leap year corner cases. – mikeazo Aug 06 '15 at 19:33
  • 1
    And, it doesn't leak if two encrypted dates are from the same year; if the attacker sees 'Jan 23, 1932' and 'November 19, 1932', with your encoding, they don't know the year, but they can deduce both were from the same one. – poncho Aug 06 '15 at 19:38