Slow one-way pseudo-random permutation?

Question

I'm looking for a slow one-way pseudo-random permutation; or in other words a block cipher $E_K: P\in\{0,1\}^b\mapsto C\in\{0,1\}^b$ with moderate block size $b\approx 64$ bits, wide key $K$, indistinguishable from a random permutation for one not knowing $K$, and additional characteristics:

encryption $E_K$ is controllably slow according to a parameter $c$, similar to the iteration count parameter in PBKDF2; an additional memory size parameter $m$ as in Scrypt would be nice;
there is no efficient decryption method even knowing $K$, ideally so that the least costly method to decipher ciphertext blocks $C_j$ would be to try and (slowly) encipher candidate plaintext until finding a match. Update: Otherwise said, there should be a security gap as wide as possible between decryption and encryption, expressed as a ratio of work; and that should remain sizable when the cost of encryption is raised using parameter $c$.

Is there some established (or arguably secure) construct for that? One idea is outlined in this interesting answer to a related question (but unfortunately, the one-wayness obtained using a permutation polynomial seems dubious).

Update: I'd also be satisfied with a slightly expanding injective function, e.g. $E_K: P\in\{0,1\}^b\mapsto C\in\{0,1\}^{b+1}$.

Updated example application: 2-D barcode cards (or other kind of write-once memory cards) are issued to individuals with data including a unique serial number $S$ of $s=32$ bits (sequentially assigned, thus largely guessable from another $S$), and other data $Q_S$ (which we assume has sizable entropy). The serial number is used for some initial purpose, and (interpretations of) regulatory requirements protecting individuals' privacy prevent storing that identifier for purposes unrelated to the initial purpose. Nevertheless, merchants would like to lawfully reuse the same cards in an existing loyalty application, identifying cards with an identifier of small size, and able to work off-line (at least, as a backup). We thus want to transform $S$ and $Q_S$ into a short digest (practically public), computable from $S$, $Q_S$, and a key $K$, that remains unique to a given card (as $S$ is); the digest should leak no information about $S$ and $Q_S$ without access to $K$; and, inasmuch as possible, the confidentiality of $S$ and $Q_S$ should be preserved from an attacker knowing digest and key (thus easing the requirements on secure storage and use of the key). Notice that a truncated MAC would not match the "remains unique to a given card" requirement.

With a function as in the question, we could use as digest $E_K(S||R_S)$ where $R_S$ has $b-s$ bits, and is obtained, say using Scrypt, from $Q_S$, $S$, and $K$. Notice that encryption slowness in the forward/encryption direction is necessary to prevent checking guesses of $S$ and enumerated $R_S$; and even more slowness (ideally, next to $2^b$ times as much slowness) is required in the backward/decryption direction, in order to prevent deciphering of the digest. Notice that either attack would reveal $S$, and also allow forged cards misappropriating the same $S$ and digest as the original, if not the same $Q_S$.

@Dmitry Khovratovich: solving this reformulated, simpler question solves the former question, with perfect match of former (4), and the novelty that I explicitly want tunable encryption effort, and would like a memory size parameter in that. I'd also like assurance that the trapdoor has some argument/proof/prior research. — fgrieu, Dec 06 '13 at 10:18
Don't you have the option of explicitly testing your function for collisions once the key is fixed? Say, take $N$ iterations of HMAC, check for collisions for the given key, and if any are found, increase $N$ until the function is collision-free? You would need $\approx 2^{32}$ function calls and memory for each key. — Dmitry Khovratovich, Dec 06 '13 at 10:24
@Dmitry Khovratovich: Several problems with that latter option. (A) Some $S||R_S$ (or $Q_S$) will be dynamically added during the life of the system, after the various $K$ and $N$ have been chosen. (B) If encryption is slow, then $2^{32}$ encryptions to choose $K$ and $N$ is impractical. (C) When $s$ is above $b/2$ (former $d/2$), working $K$ and $N$ become a rarity. — fgrieu, Dec 06 '13 at 10:38
Why do you need the output to be the same length as the input? If you're willing to have (say) a 256-bit output, your goals are easy to achieve; just use the SHA256 hash. If the output is going to be stored in a database but not stored on the card, that'd suffice. Also, why do you need it to be bijective (i.e., a permutation)? If collisions are non-trivial to find, is that good enough? — D.W., Dec 06 '13 at 18:09
Also, why does your application require such a complex construct? Why not just store a unique random serial number in the 2-D barcode, and have a back-end database that maps the serial number to whatever data is associated to that card (of course, the data can be encrypted under one merchant's key if you wish). Perhaps I haven't quite understood the application yet. — D.W., Dec 06 '13 at 18:11
@D.W.: The output has to be both short, so as to be human-enterable (as a backup) after being computed by a mobile app; and collision free with high likelihood for many inputs; see the original goals which I now hope to fulfill using the method in this question. Some of the problem is that the cards have already been issued with a certain serial number, but regulatory reasons prevent storing or reusing that serial number for purposes other than the original one, and I need to transform it into another identifier, with the transformation one-way. — fgrieu, Dec 06 '13 at 18:48
I still don't think I understand the requirements, but I suspect if you took a step back and allowed yourself to admit other solutions you could find better, simpler solutions. For instance, something using a trusted server or HSM. P.S. I think there's a solution to the mobile app issue you mentioned. Suppose there's a 256-bit SHA256 hash. You can ask the user to enter the first 64 bits of the hash, then look it up in the database of items and see if that's enough to uniquely identify the item; if it isn't, have them enter the rest. It will be extremely rare that a user needs to enter more. — D.W., Dec 06 '13 at 19:29

score 3 · Answer 1 · edited Apr 13 '17 at 12:48

The main difficulty in designing what I asked for is coming up with a permutation over a relatively small set that is easy to compute in the forward direction, but convincingly much more difficult to invert. The least unsatisfactory I have so far is based on the discrete logarithm problem in $\mathbb Z_p$. If $p$ and $(p-1)/2$ are primes, and $g\in\{2\dots p-2\}$ with $g^{(p-1)/2}\not\equiv1\pmod p$, then $$P:\begin{cases} x\mapsto g^x\bmod p& \text{if } x\in\{1\dots p-1\}\\ 0\mapsto0 \end{cases}$$ is a permutation of $\{0\dots p-1\}$ which is easy to compute, but markedly harder to invert. At least, the problem is well-studied, with the best known algorithms being the index calculus algorithm and some specialized GNFS variants. The bad news is that the problem is workable for 530-bit $p$, and here I'll be stuck with $p$ of about 80 bits (for $b=64$ and other realistic parameters).
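To make the definition of $P$ concrete, here is a minimal Python sketch with toy numbers. The values $p=23$ and $g=5$ are my own illustrative choices (not from any real parameter set), and the helper names `is_prime` and `make_P` are hypothetical. It only checks that $P$ is a permutation of $\{0\dots p-1\}$; it says nothing about one-wayness at realistic sizes.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate only for toy sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def make_P(p: int, g: int):
    """Return P: x -> g^x mod p for x in 1..p-1, and 0 -> 0."""
    q = (p - 1) // 2
    if not (is_prime(p) and is_prime(q)):
        raise ValueError("p and (p-1)/2 must both be prime")
    if not (2 <= g <= p - 2) or pow(g, q, p) == 1:
        raise ValueError("g must lie in 2..p-2 with g^((p-1)/2) != 1 mod p")

    def P(x: int) -> int:
        return 0 if x == 0 else pow(g, x, p)

    return P


# Toy parameters (assumption): 23 = 2*11 + 1 is a safe prime, 5 generates Z_23^*.
P = make_P(23, 5)
image = [P(x) for x in range(23)]
print(sorted(image) == list(range(23)))  # every value 0..22 is hit exactly once
```

The two conditions on $g$ force its order to be $p-1$ (it divides $2\cdot\frac{p-1}2$, is neither 1 nor 2, and is not $\frac{p-1}2$), which is why the exponents $1\dots p-1$ give distinct results.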

It seems possible to build a slow one-way pseudorandom permutation over $\{0,1\}^b$ from such $P$, while keeping about the same security gap (ratio of work in backward to forward direction) as in $P$. I sketch such a construction below. Again this is not fully satisfactory, for the above (family of) $P$ does not have a high-enough security gap, given the order of magnitude I can use for $p$. I badly need a better $P$ !!

Update: The best attacks on $P$ require a heavy pre-computation, then can go from $P(x)$ to $x$ at comparably low cost. I have tuned the following to account for that fact.

The construction requires an expected $c_0\cdot c_1$ evaluations of the above $P$ with $p\approx c_0\cdot2^b$, and forces an adversary trying to brute-force the forward direction to mobilize about $m\cdot2^m$ bits of memory for each few instances run in parallel. Realistic parameters could be $b=64$, $c_0=2^{12}$, $c_1=2^{12}$, $m=20$.

Derive from $K$ a pseudo-random $g$ in $\{2\dots c_0\cdot2^{b-1}\}$ ;
Derive from $K$ an arbitrary permutation $M$ of $\{0,1\}^m$, stored as $2^m$ words of $m$ bits of memory;
Repeat for $c_1$ rounds (identified by round counter $r$) the following keyed permutation of the current $b$-bit state $s$:
- derive from $K$ and $r$ a pseudo-random prime $p$ in range $\big[{3\over4}\cdot c_0\cdot2^b\dots{5\over4}\cdot c_0\cdot2^b\big]$, such that $(p-1)/2$ is prime and $(g^{(p-1)/2}\bmod p)=p-1$ (note: it is advantageous to perform the latter test right after ensuring that $p$ and $(p-1)/2$ have no small divisors);
- Repeat
  - If $s<\big\lfloor p/2^m\big\rfloor\cdot2^m$ (which is, most of the time)
    - Replace the low $m$ bits of $s$ with their image through $M$;
  - If $s\ne0$ (which is, most of the time)
    - Replace $s$ with $g^s\bmod p$;
- Until $s<2^b$.

Note: $s$ temporarily grows to at most $b+\big\lceil\log_2({5\over4}\cdot c_0)\big\rceil$ bits.

For each round, the repeat..until iteration is expected to be executed about $c_0$ times, with an exponential distribution. Increasing the number of rounds $c_1$ makes the runtime more predictable, and less likely that pre-computations for inverting $P$ can help an attack. Each iteration requires one memory lookup, and $\epsilon\cdot\log_2 p$ multiplications $\bmod p$ (we can have $\epsilon\ll1$ with some pre-computation).
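As a rough sanity check of these figures (a back-of-the-envelope estimate using only the parameters quoted above, and assuming the iterated map behaves like a random permutation of $\{0\dots p-1\}$):

$$\begin{aligned}
\text{iterations per round} &\approx \frac{p}{2^b}\in\left[\tfrac34\,c_0,\ \tfrac54\,c_0\right],\\
\text{evaluations of }P\text{ in total} &\approx c_0\cdot c_1 = 2^{12}\cdot2^{12}=2^{24},\\
\text{size of }M &= m\cdot2^m = 20\cdot2^{20}\text{ bits}\approx 2.5\text{ MiB}.
\end{aligned}$$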

(Optionally, the construction could end (or/and start) with some conventional $b$-bit block cipher using key $K_d$ derived from key $K$, perhaps TDES for $b=64$ (with additionally one bit of $K_d$ devoted to making the resulting permutation odd or even, by conditionally swapping output values $0$ and $1$). This, combined with use in other steps of permutations not chosen from $K_d$, convincingly ensures that the overall construction is indistinguishable from a random permutation for one not knowing $K$).

Why do you need to store the permutation as $2^m$ words of $m$ bits, rather than just using any short-block cipher on $m$-bit blocks? Also: what's the security gap you expect from this? (i.e., the ratio in workfactor to break vs the workfactor for the legitimate parties to compute this function.) My rough back-of-the-envelope estimate suggests you should expect a very small security gap. If we precompute the discrete log of all primes up to $2^{21}$ (about $2^{17}$) of them, the time to compute a single discrete log is about 200 smoothness tests (sieving + ECM on a $\le 84$-bit number). — D.W., Dec 09 '13 at 17:22
Indeed, the security gap that I can get with my $P$ is minimal (@D.W. has a better ballpark figure for it than I dare conjecture); the only satisfying aspect in my construction is that said security gap remains nearly stable with the number of rounds $c_1$, and decreases less than proportionally to $c_0$. If only we had an Elliptic Curve analog.. [justification for $m$ now in answer; improved the remainder of the present comment]. — fgrieu, Dec 11 '13 at 20:48

D.W. · Accepted Answer · 2013-12-07T20:42:44.680

2

Here's the best I can do. Let $\mathbb{G}$ be an elliptic curve group over a 64-bit prime. Define $f:\{0,1,2,\dots,2^{64}-1\} \to \mathbb{G}$ by $f(n) = ng$, where $g \in \mathbb{G}$ is a generator of order at least $2^{64}$. Notice that you can represent any group element in 65 bits, using point compression. Also, notice that $f$ is injective.

This gives you a one-way injective function $f$ that takes a few group operations to compute in the forward direction and about $2^{32}$ group operations to compute in the reverse direction. The function takes a 64-bit input and produces a 65-bit output, so it is not a one-way permutation, but it is close enough for your application (in your application, you just need the output of the function to be not much larger than the input and the function to be injective).
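A toy sketch of such an injective map, in Python, may help. Everything concrete in it is an assumption of mine for illustration: the textbook-sized curve $y^2=x^3+2x+2$ over $\mathbb F_{17}$ with base point $(5,1)$ of prime order 19, and the shift $n\mapsto(n+1)g$, which I use to avoid having to encode the point at infinity. Here inputs have 4 bits and outputs 6 bits (5 for $x$, 1 for the parity of $y$); with a 64-bit prime the same encoding gives 65 bits.

```python
P_MOD, A, B = 17, 2, 2      # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)
G = (5, 1)                  # base point of prime order 19
ORDER = 19


def ec_add(P, Q):
    """Add two affine points; None stands for the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_mul(n, P):
    """Double-and-add scalar multiplication."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        n >>= 1
    return R


def f(n: int) -> int:
    """Map n in 0..15 to the compressed point (n+1)G: x bits plus parity of y."""
    x, y = ec_mul(n + 1, G)
    return (x << 1) | (y & 1)


codes = [f(n) for n in range(16)]
print(len(set(codes)) == 16)   # injective on all 4-bit inputs
```

The modular inverse via `pow(..., -1, P_MOD)` needs Python 3.8 or later. Compression is unambiguous because $y$ and $p-y$ have different parity for odd $p$, and no point of odd prime order has $y=0$.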

Now if you want to make it pseudorandom, you can use Dmitry Khovratovich's technique. Define

$$F(k,x) = E'(k_1,f(E(k_0,x)))$$

where $k_0,k_1$ are derived from $k$, where $E(k_0,\cdot)$ is a block cipher with a 64-bit block width, and $E'(k_1,\cdot)$ is a block cipher with a 65-bit block width. I don't know how to make this slow for the owner of the key $k$.

The problem with this is that an attacker who knows the key $k$ can still invert with $2^{32}$ operations, which is much faster than you were hoping for. Even worse, if the attacker wants to invert many such values, he can probably amortize his effort (the time per value inverted probably decreases as a function of the number of values to be inverted), thanks to the properties of the discrete log problem. Therefore, I suspect this likely won't be sufficient in practice.
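To illustrate where the square-root cost comes from, here is a small baby-step giant-step sketch in Python. As a stand-in for the elliptic curve group it uses the multiplicative group modulo the toy prime 1019 with generator 2 (my choice of numbers, purely for illustration); the same generic method applies to any cyclic group, which is why a group of size about $2^{64}$ only offers about $2^{32}$ work for inversion.

```python
from math import isqrt


def bsgs(g: int, h: int, p: int, order: int):
    """Return x in [0, order) with g^x = h (mod p), or None if there is none.

    Uses about sqrt(order) multiplications and sqrt(order) table entries.
    """
    m = isqrt(order) + 1
    table = {}
    e = 1
    for j in range(m):                 # baby steps: store g^j -> j
        table.setdefault(e, j)
        e = e * g % p
    factor = pow(g, -m, p)             # g^(-m) mod p (Python 3.8+)
    gamma = h
    for i in range(m):                 # giant steps: compare h * g^(-i*m)
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * factor % p
    return None


# Toy group (assumption): 1019 = 2*509 + 1 is a safe prime and 2 has order 1018.
p, g, order = 1019, 2, 1018
x = 777
print(bsgs(g, pow(g, x, p), p, order) == x)
```

The table built in the baby-step phase depends only on the group and generator, so it can be reused across many targets, which is one simple way the per-value cost drops when inverting many values.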

In practice I suspect your best bet is going to be to use a trusted server or HSM at the back-end containing a database holding the desired mapping, and let the server enforce all of the restrictions you have in mind (e.g., about only being able to compute the mapping in one direction).

edited Dec 07 '13 at 20:42

answered Dec 06 '13 at 23:35

D.W.


One could, of course, alternate encryptions and applications of $f$. What other techniques are there to make $F$ slow? – Dec 07 '13 at 02:07
@RickyDemer, yup, either do that (i.e., iterate: a standard way to make something slow), or make the key expansion of $k \mapsto (k_0,k_1)$ slow (if each person will only use the key $k$ once -- which might not apply here, but I mention it just for completeness). – D.W. Dec 07 '13 at 03:41
If we alternate encryption and application of the same $f$, we no longer have an injection. To keep an injection, we would need 1 extra bit for each layer, thus I can have only few ones. Thus the one-wayness will plummet when I slow the forward direction. That would not occur if we had a one-way (aka trapdoor) true permutation at hand. – fgrieu Dec 07 '13 at 18:26
@fgrieu, excellent point. OK, I guess I don't know how to make this slow after all (unless each person will only use the key once -- but that's probably a relatively rare situation). Thank you for catching that. – D.W. Dec 07 '13 at 20:43
Maybe you could use supersingular curves to get something very close to a bijection? And perhaps you could use many such curves to make inversion more expensive? – K.G. Dec 09 '13 at 13:07
Let $p$ be a prime such that $p+1$ is twice a prime, $p-1$ is not divisible by $3$ and $-1$ is not a square. Consider the supersingular elliptic curve $Y^2 = X^3+1$ that has $p+1$ points. Fix some sign function $sg$ on the finite field. We can map any non-zero point $(x,y)$ to $sg(y)(x^3+1)$. Compose this map with the map $a \mapsto (a+1)P$ for some generator $P$, and you are done. – K.G. Dec 10 '13 at 07:44
@K.G., cool! Would you care to add that as a separate answer, so we can upvote it? Also, do you know anything about the security of the discrete log on such curves? Is it also the case that the best currently-known algorithm is a square-root algorithm (i.e., we don't know how to do better than the generic algorithms for square roots in a black-box group)? – D.W. Dec 10 '13 at 07:47
Supersingular curves map into a finite field of twice the size, so in principle you could do index calculus in that field. But I wouldn't know how fast index calculus in that field would be compared to baby-step giant-step or Pollard $\rho$ in the subgroup. – K.G. Dec 10 '13 at 07:58
@K.G.: The curve you suggest seems to be as in that answer to a question asking to map points on a curve to (a range of consecutive) integers. That answer says that the discrete log on this curve reduces to the discrete log on $GF(p^2)$. Is that correct? – fgrieu Dec 10 '13 at 09:22
The d.log. problem on the curve reduces to the d.log. problem in $GF(p^2)$. Inverting the map is probably more expensive than computing it in the first place, but I don't know by how much. And doing many inversions is much cheaper per inversion than doing one inversion. It may not work better than just using a finite field with appropriate maps. – K.G. Dec 10 '13 at 10:05
@K.G.: there must be some typo in "Let $p$ be a prime such that $p+1$ is twice a prime, $p-1$ is not divisible by $3$", for only $p=3$ and $p=5$ are solutions. – fgrieu Dec 10 '13 at 14:35
Not a typo, just a mistake. Maybe six times a prime works... – K.G. Dec 10 '13 at 23:53
Years later, I realize that we can use Burt Kaliski's One-way permutations on elliptic curves to build a true OWP of a bitstring, then easily turned into a keyed PRP using symmetric crypto. We can use the curve $y^2\equiv x^3+3x+2\pmod p$ with prime $p$ just below $2^b$, prime order $n$, prime twist order $2p+2-n$. For $b=64$, $p=2^{64}-\mathtt{0x2349}$, $n=2^{64}+\mathtt{0x258cfeb9}$. For $b=128$, $p=2^{128}-\mathtt{0x1C8E63}$, $n=2^{128}+\mathtt{0xd5f77b50974af7bf}$. – fgrieu Feb 06 '24 at 15:03

score -1 · Answer 3 · answered Dec 08 '13 at 10:58

I have a 128-bit block cipher that can be modified to fit your requirements (possibly).

It is an AES variant with a 16x16 matrix multiplication that operates on the entire state instead of the standard 4x4. ShiftRows is maintained but not necessary for diffusion. The matrix is its own inverse. The s-box has been replaced with one that has more complicated algebraic properties. The key schedule uses Keccak with c=1024, r=576, and generates round keys with a squeeze operation with 448 bits of output ignored. The key size and round counts are specified up to 512-bit keys.

In the standard form it is slow because of the large matrix multiplication, but in 32-bit and larger systems table lookups can be used (with about 64KB of memory for the tables) that combine the matrix with the sbox. The decryption process is simple, using the same "equivalent inverse" method that AES uses. The design can be shrunk to 64-bits using an 8x8 matrix, or extended to 256-bits using a 32x32 matrix.

In order to modify it to meet your requirements, additional rounds could be a parameter to slow it down. The ease of decryption part can be solved with using a non invertible MDS matrix. Additional slowdowns can be made by changing the round count of Keccak when generating round keys. Brute forcing a key with a 96-round Keccak based key schedule will be hundreds, if not thousands of times slower than AES.

Since this is a permutation, all unique inputs are guaranteed to produce unique outputs with a given key. With a non invertible matrix there should be no efficient decryption. With parametrized round counts and flexible key sizes, in addition to its design, it should be as slow as you want it to be on all platforms. The design should make its pseudo-randomness equal to or better than AES at a given round count (for a 128-bit block anyway). All of your requirements except for configurable memory requirements should be met.

Round function pseudocode modified since decryption not required

AddRoundKey 0
for i = 1 to Rounds
  Sbox(state)
  MatrixMultiply(state)
  AddRoundKey i
next i

16x16 MDS matrix modulo 0x11B (not tested for noninvertibility but sure looks it)

5C E8 9C 6C 77 0C 0E F7 47 79 5C CE A4 FA 80 B5
E8 2C 23 A9 08 70 5F 43 6B 15 84 29 32 5E DA 4E
9C 23 5F 41 87 FA 37 AF CA E4 79 24 85 42 2D DC
6C A9 41 EB C7 E0 BD E2 71 4E 27 93 9F 62 33 53
77 08 87 C7 48 15 1C 5B B5 6B B7 99 D3 52 D3 86
0C 70 FA E0 15 CE 24 72 B1 D4 DB 98 EE DA A2 5A
0E 5F 37 BD 1C 24 91 D5 75 35 A1 86 5A AF B3 87
F7 43 AF E2 5B 72 D5 6A 0E 54 AA 3D 6D CA F3 F7
47 6B CA 71 B5 B1 75 0E 86 8A 64 5E 44 EA AC DD
79 15 E4 4E 6B D4 35 54 8A 83 6E B3 CC F5 BD 49
5C 84 79 27 B7 DB A1 AA 64 6E 1D 2F 04 10 B8 74
CE 29 24 93 99 98 86 3D 5E B3 2F DA F1 DF B4 69
A4 32 85 9F D3 EE 5A 6D 44 CC 04 F1 3C 2D 87 6C
FA 5E 42 62 52 DA AF CA EA F5 10 DF 2D B6 41 62
80 DA 2D 33 D3 A2 B3 F3 AC BD B8 B4 87 41 7A D5
0E 1F EE BC C4 F7 48 81 F7 48 DE 4D 57 C4 A3 A4
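Since the matrix above is stated as untested, a check along these lines could settle whether it is invertible. This is a minimal Python sketch, assuming the entries are bytes in GF($2^8$) reduced modulo the polynomial 0x11B as the heading says; the function names are mine. It computes the rank by Gaussian elimination, and full rank (16) means the matrix is invertible. I have not run it on this matrix, so no result is claimed here.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8+x^4+x^3+x+1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def gf_rank(rows) -> int:
    """Rank of a matrix over GF(2^8) by Gauss-Jordan elimination."""
    mat = [list(r) for r in rows]
    rank = 0
    for col in range(len(mat[0])):
        piv = next((i for i in range(rank, len(mat)) if mat[i][col]), None)
        if piv is None:
            continue
        mat[rank], mat[piv] = mat[piv], mat[rank]
        inv = gf_inv(mat[rank][col])
        mat[rank] = [gf_mul(inv, v) for v in mat[rank]]   # scale pivot to 1
        for i in range(len(mat)):
            if i != rank and mat[i][col]:
                fac = mat[i][col]
                mat[i] = [v ^ gf_mul(fac, w) for v, w in zip(mat[i], mat[rank])]
        rank += 1
    return rank


MATRIX_HEX = """
5C E8 9C 6C 77 0C 0E F7 47 79 5C CE A4 FA 80 B5
E8 2C 23 A9 08 70 5F 43 6B 15 84 29 32 5E DA 4E
9C 23 5F 41 87 FA 37 AF CA E4 79 24 85 42 2D DC
6C A9 41 EB C7 E0 BD E2 71 4E 27 93 9F 62 33 53
77 08 87 C7 48 15 1C 5B B5 6B B7 99 D3 52 D3 86
0C 70 FA E0 15 CE 24 72 B1 D4 DB 98 EE DA A2 5A
0E 5F 37 BD 1C 24 91 D5 75 35 A1 86 5A AF B3 87
F7 43 AF E2 5B 72 D5 6A 0E 54 AA 3D 6D CA F3 F7
47 6B CA 71 B5 B1 75 0E 86 8A 64 5E 44 EA AC DD
79 15 E4 4E 6B D4 35 54 8A 83 6E B3 CC F5 BD 49
5C 84 79 27 B7 DB A1 AA 64 6E 1D 2F 04 10 B8 74
CE 29 24 93 99 98 86 3D 5E B3 2F DA F1 DF B4 69
A4 32 85 9F D3 EE 5A 6D 44 CC 04 F1 3C 2D 87 6C
FA 5E 42 62 52 DA AF CA EA F5 10 DF 2D B6 41 62
80 DA 2D 33 D3 A2 B3 F3 AC BD B8 B4 87 41 7A D5
0E 1F EE BC C4 F7 48 81 F7 48 DE 4D 57 C4 A3 A4
"""
rows = [[int(t, 16) for t in line.split()]
        for line in MATRIX_HEX.strip().splitlines()]
rank = gf_rank(rows)
print("rank:", rank, "-> invertible" if rank == 16 else "-> singular")
```

Note that a singular matrix would make the round function lose information, so the overall map would no longer be a permutation.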

I do not see how "The ease of decryption part can be solved with using a non invertible MDS matrix", while keeping MatrixMultiply(state) or/and the whole transformation a permutation. — fgrieu, Dec 08 '13 at 16:39
Right. This answer doesn't work. If the matrix is non-invertible, then this won't be a permutation. The original question asks for a one-way permutation. (If it didn't need to be a permutation, this question would be easy to solve: you could just use SHA256 truncated appropriately.) — D.W., Dec 08 '13 at 21:32
I was worried "non invertible MDS" would be something that doesn't exist, as if it is MDS it must be a permutation but also has an inverse (right?). How difficult is Keccak-f[200] to invert? A few of those can take the place of the matrix multiplication if the block size is extended to 25 bytes. — Richie Frame, Dec 09 '13 at 02:13
@RichieFrame Keccak's permutation is almost as easy to compute in the reverse direction as in the forward direction. — Paŭlo Ebermann, Dec 10 '13 at 09:31
