3

The code

if(str1 == "abc") {}

can be converted to

if(hash(str1) == 0x8732e) {} // assume hash("abc") == 0x8732e

to obfuscate the code.

But the two are not equivalent when a hash collision occurs; e.g., when another string, xyz, has the same hash value as abc.
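To make the collision concern concrete, here is a minimal sketch in Python. It assumes the obfuscator keeps only a 20-bit value (the same width as 0x8732e above) taken from SHA-256; the names short_hash and find_collision are hypothetical and not part of any real obfuscation tool.

```python
import hashlib
from itertools import count, product
from string import ascii_lowercase


def short_hash(text: str, bits: int = 20) -> int:
    """Return the top `bits` bits of SHA-256(text) as an integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(target: str, bits: int = 20) -> str:
    """Search lowercase strings of increasing length for one whose
    truncated hash equals that of `target`."""
    wanted = short_hash(target, bits)
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if candidate != target and short_hash(candidate, bits) == wanted:
                return candidate


if __name__ == "__main__":
    other = find_collision("abc")
    # Both strings would pass a check of the form hash(str1) == constant.
    print(other, hex(short_hash(other)), hex(short_hash("abc")))
```

With a 20-bit value, a colliding string is expected after roughly a million candidates, so collisions are easy to find on purpose. With a full-length cryptographic digest, no such search is known to be feasible.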

That is true in theory. But is it an issue in practice when the method above is used to obfuscate code? Is it a well-accepted obfuscation method?

Edit: after a little research, I guess what I need is a one-way, collision-free string transformation function, as I don't care about the length of the output. Any suggestions? Any mature implementations?

e-sushi
Infinite

2 Answers

6

If the inputs you're hashing are short or low in entropy, the hashes can be inverted by an adversary who is willing to brute-force the entire range of possible inputs.

For example, hashing abc provides little barrier to anyone dedicated to deobfuscating your program, as it takes almost no time to calculate hashes for all possible 3-character strings.

The adversary can build a table of all inputs up to a certain size and proceed to crack your obfuscated data in negligible time. If your adversary has financial motivation, this probably will not protect your code.
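The table attack can be sketched in a few lines of Python. This is an illustration under stated assumptions: the program is assumed to use unsalted SHA-256, and the secret strings are assumed to be lowercase ASCII of at most three characters. The helper names sha256_hex and build_table are hypothetical.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def sha256_hex(text: str) -> str:
    """Hex digest of SHA-256 over the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_table(max_len: int = 3) -> dict:
    """Map hash -> input for every lowercase string of 1..max_len characters."""
    table = {}
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            table[sha256_hex(candidate)] = candidate
    return table


if __name__ == "__main__":
    table = build_table()
    # A constant lifted from the obfuscated program is inverted by one lookup.
    constant_from_binary = sha256_hex("abc")
    print(table[constant_from_binary])
```

The table holds 26 + 26^2 + 26^3 = 18,278 entries. Once it is built, every constant found in the program is recovered with a single dictionary lookup.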

You could opt for additional protection by using slow, salted hashes, which are usually employed to protect passwords. Using a salt can prevent your adversary from precomputing the table before he has access to your source code. The downside is that using a slow hash in an effective manner will cause a non-negligible performance hit to your application.
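A minimal sketch of the slow, salted variant, using PBKDF2 from Python's standard library. The iteration count, the salt length, and the names slow_hash, make_check and matches are assumptions chosen for illustration, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher is slower for everyone


def slow_hash(text: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte value from `text` with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", text.encode("utf-8"), salt, iterations)


def make_check(secret: str) -> tuple:
    """Run once at build time: pick a random salt and compute the constant
    that is shipped in place of the plaintext string."""
    salt = os.urandom(16)
    return salt, slow_hash(secret, salt)


def matches(candidate: str, salt: bytes, expected: bytes) -> bool:
    """Runtime replacement for `candidate == secret`."""
    return hmac.compare_digest(slow_hash(candidate, salt), expected)


if __name__ == "__main__":
    salt, expected = make_check("abc")
    print(matches("abc", salt, expected), matches("abd", salt, expected))
```

The salt ships with the program, so it only rules out tables computed in advance. A short string such as abc can still be brute-forced once the salt is known, just more slowly, and every legitimate comparison pays the same cost.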

Ella Rose
0

To avoid the table attack described in Ella's answer above, you would need a hash function which is also a CSPRNG, in order to 'pad' the string. In this manner, an invertible bijection may be constructed where no two hashings are associated with the same string, i.e., a one-way trapdoor function.

  • It'd be nice if you listed/linked which functions qualify as CSPRNGs. I read that one of the older SHA algorithms qualifies as a CSPRNG. I tentatively assume that newer members of the SHA family will have the same properties, even though the underlying algorithms themselves have little in common. Specifically, does SHA3 qualify as a CSPRNG? Generally, how safe is it to assume that a new-and-improved family member (SHA2, SHA3) can be dropped in place of an older one (SHA1, SHA2)? – Mark Jun 07 '17 at 14:48