10

History contains many examples of old steganography. We also have digital steganography.

Is there any kind of “Perfect Steganography”, in a sense that only the designer can extract the concealed message?

Update: Is there a (formal?) definition of 'perfect steganographic hiding' where only the key-holder / designer can provably recover the data, and if so, are there any known constructions satisfying it?

e-sushi
kelalaka
  • 2
    In steganography it's not about only the key holder extracting the message. Encryption is sufficient for that. We want only the key holder to be able to detect there is a hidden message at all. – Meir Maor Sep 17 '18 at 18:51
  • Yes. But the designer can share the method, right? – kelalaka Sep 17 '18 at 18:54
  • 3
    With modern steganography, Kerckhoffs's principle applies. The method can be known, and it should still be hard to determine whether or not there is an embedded message without the key. – Meir Maor Sep 17 '18 at 19:12
  • 7
    Perfect Steganography would imply that the probability distribution of the object with the hidden message would be precisely the same as the probability distribution of the object without any steganography. That would be difficult to prove, as the probability distribution of most interesting objects (JPEGs, music files, etc) is not precisely defined – poncho Sep 17 '18 at 20:30
  • 2
    @poncho It really depends on the medium. For example, you could hide information in a recording of gamma ray bursts or in anything else that has white noise (or a known probability distribution) without making it possible to discover its presence short of knowing a secret key. Of course, that does mean that for the most common media, like image files, you often have to suffer with less-than-ideal steganography. – forest Sep 18 '18 at 03:50
  • 3
    @forest: I did say most interesting objects. If you have an object with a well defined distribution, then, yes, it becomes easy. However, I rather suspect that even (say) a recording of gamma ray bursts wouldn't follow precisely the naively expected distribution (for example, because the detector might become less sensitive immediately after recording an event) – poncho Sep 18 '18 at 13:23

2 Answers

12

"Perfect Steganography" is not well defined. Whatever that is, it must obey Kerckhoffs's principle: adversaries know all about the system, except non-public keys. That's enough to exclude many steganography systems in actual use.

I'll use the following largely standard terminology:

  • Payload is digital data one wants to transmit covertly; that's the question's "concealed message". Only its size is constrained.
  • Carrier is the data in which a payload might be concealed.
  • Channel is the vehicle by which said carrier is transmitted, and imposes a set of constraints on the carrier (like: being a bytestring of at most such size conforming to JPEG syntax, though perhaps with stricter constraints on padding and comment fields)
  • Original (if any) is data from which the carrier is prepared, with a set of constraints of its own (like, being the output of a consumer digital camera) and prescribed relation with the carrier (typically, a rendition of the carrier for human perception must be perceived as one for the original; or at least as a reduced-quality version of the original).

A steganography system must allow a sender holding a key (possibly public) and possibly a payload to construct a carrier; and a receiver with a key (the same symmetric key, or the private key matching the public key) to determine if there was a payload in that carrier, and if so, recover the payload.

An adversary's goal is to distinguish if a carrier carries a payload or not (better than random). The receiver must be extra careful not to leak that info, and that's hard, especially against active adversaries. We can assume that the adversary has the choice of payload within size constraints, much like modern cryptography assumes chosen plaintext.
Note: the question's goal "extract the concealed message" could mean that an adversary tries to reduce the size of the carrier while keeping the possibility to recover the payload when later given the key, but that is not a usual goal, and I won't consider it further.
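
The distinguishing goal above can be written as an advantage, in the style of standard indistinguishability definitions. This is only a sketch in notation introduced here for illustration: $k$ is the key, $m$ the payload chosen by the adversary $\mathcal{A}$, $\mathsf{Embed}_k(m)$ a carrier built with that payload, and $\mathsf{Embed}_k(\bot)$ a carrier built with no payload.

$$\mathrm{Adv}(\mathcal{A}) = \Bigl|\Pr\bigl[\mathcal{A}(\mathsf{Embed}_k(m)) = 1\bigr] - \Pr\bigl[\mathcal{A}(\mathsf{Embed}_k(\bot)) = 1\bigr]\Bigr|$$

Under this reading, "better than random" means a non-negligible advantage. A computational definition would ask for negligible advantage against every efficient adversary, and an information-theoretic one for advantage 0 (or negligible) against every adversary.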

Payload confidentiality follows from security under chosen payload (argument: if the payload was intelligible, comparison with chosen payload would yield a distinguisher). It would anyway be trivial to add payload encryption on top of steganography if chosen payload was not assumed.

Often, extra properties are required:

  1. The original must be a valid carrier per the channel constraints.
  2. The security goal is met even if the adversary gets hold of the original from which the carrier was prepared.
  3. When there's no payload in a carrier, the carrier must exactly match the original.
  4. The process by which an original is transformed into a carrier when no payload is embedded is constrained (e.g. is a certain pre-existing program).

Property 3 implies 1, and is incompatible with 2. There is overlap between channel constraints and constraints in 4.

Note: I've left aside watermarking, even though there is overlap with steganography.


The possibility (or not) of making a demonstrably secure steganography system depends heavily on the constraints set: on channel, original, payload size, and extra properties. There are many combinations of constraints depending on use case. Channel and original constraints vary immensely. Payload size can range from the text GO to terabytes.

The simplest demonstrably secure steganography systems are those where the constraints allow embedding slightly more uniformly random bits in a carrier than the payload length (after compression). We can set these bits of the carrier either to true randomness (for no payload), or to the ciphertext of some authenticated encryption of the payload having the property that ciphertext is indistinguishable from random (which is easy and common). The receiver extracts the bits, performs the decryption/integrity check, concludes there's no payload if the integrity check fails, and otherwise has recovered the payload.
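
The fill-or-randomize idea just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a vetted implementation: the names `fill_slot` and `read_slot`, the 8-byte nonce and tag sizes, and the 2-byte length prefix are all hypothetical, and HMAC-SHA-256 in counter mode stands in for a real cipher (such as AES-CTR) only so that the example needs nothing beyond the Python standard library.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 8  # 64-bit random nonce (illustrative size)
TAG_LEN = 8    # HMAC-SHA-256 tag truncated to 64 bits (illustrative size)


def _keystream(enc_key: bytes, nonce: bytes, n: int) -> bytes:
    """HMAC-SHA-256 in counter mode; a stdlib stand-in for a cipher such as AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        block = hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:n])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def fill_slot(enc_key, mac_key, slot_len, payload=None):
    """Return slot_len bytes: true randomness, or nonce || ciphertext || tag."""
    if payload is None:
        return secrets.token_bytes(slot_len)
    body_len = slot_len - NONCE_LEN - TAG_LEN
    if len(payload) > min(body_len - 2, 0xFFFF):
        raise ValueError("payload too large for slot")
    # Length prefix plus zero padding, so every slot is completely filled.
    plain = len(payload).to_bytes(2, "big") + payload
    plain += bytes(body_len - len(plain))
    nonce = secrets.token_bytes(NONCE_LEN)
    cipher = _xor(plain, _keystream(enc_key, nonce, body_len))
    tag = hmac.new(mac_key, nonce + cipher, hashlib.sha256).digest()[:TAG_LEN]
    return nonce + cipher + tag


def read_slot(enc_key, mac_key, slot):
    """Return the payload, or None when the integrity check fails (no payload)."""
    nonce, cipher, tag = slot[:NONCE_LEN], slot[NONCE_LEN:-TAG_LEN], slot[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + cipher, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None
    plain = _xor(cipher, _keystream(enc_key, nonce, len(cipher)))
    size = int.from_bytes(plain[:2], "big")
    return plain[2:2 + size]
```

With two independent 32-byte keys, `read_slot(ek, mk, fill_slot(ek, mk, 440, b"GO"))` gives back `b"GO"`, while a slot filled without a payload is rejected except with probability about 2^-64. The random nonce must not repeat under the same key, which with 64 bits limits how many carriers one key should be used for.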

Such a system should satisfy whatever reasonable definition of perfect steganography is chosen (if a reusable key is not required, we can even use an information-theoretic MAC and replace encryption by a One-Time-Pad, to become information-theoretically secure regardless of the adversary's computing power).
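
A sketch of the one-time variant mentioned in the parenthesis, under assumptions of my choosing: the MAC is a polynomial one-time MAC over GF(2^128) (picked because its tag is exactly uniform over 128 bits), the payload is assumed to be already padded to a fixed length, and the names `new_one_time_key`, `fill_slot_otp` and `read_slot_otp` are hypothetical. Every key must be used for a single carrier only.

```python
import secrets

BLOCK = 16                     # bytes per GF(2^128) element
REDUCTION = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf_mul(x: int, y: int) -> int:
    """Multiply two elements of GF(2^128) by shift-and-add with reduction."""
    z = 0
    while y:
        if y & 1:
            z ^= x
        y >>= 1
        x <<= 1
        if x >> 128:
            x ^= REDUCTION
    return z


def one_time_tag(a: int, b: int, data: bytes) -> bytes:
    """Polynomial MAC for fixed-length data: evaluate at a, then mask with b."""
    acc = 0
    for i in range(0, len(data), BLOCK):
        acc = gf_mul(acc ^ int.from_bytes(data[i:i + BLOCK], "big"), a)
    return (acc ^ b).to_bytes(BLOCK, "big")


def new_one_time_key(payload_len: int) -> dict:
    """Fresh single-use key: a pad as long as the payload, plus MAC keys a and b."""
    return {"pad": secrets.token_bytes(payload_len),
            "a": secrets.randbits(128), "b": secrets.randbits(128)}


def fill_slot_otp(key, payload=None):
    """Return random bytes (no payload) or OTP ciphertext followed by its tag."""
    n = len(key["pad"])
    if payload is None:
        return secrets.token_bytes(n + BLOCK)
    if len(payload) != n:
        raise ValueError("payload must be padded to the pad length")
    cipher = bytes(p ^ k for p, k in zip(payload, key["pad"]))
    return cipher + one_time_tag(key["a"], key["b"], cipher)


def read_slot_otp(key, slot):
    """Return the padded payload, or None when the tag does not verify."""
    cipher, tag = slot[:-BLOCK], slot[-BLOCK:]
    if not secrets.compare_digest(tag, one_time_tag(key["a"], key["b"], cipher)):
        return None
    return bytes(c ^ k for c, k in zip(cipher, key["pad"]))
```

Since the pad and `b` are uniform and independent, a filled slot is uniformly distributed, exactly like an empty one. A random slot passes the check with probability 2^-128.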

Here is an example with properties 2 and 4, where the original is any JPEG file, and the channel is a PDF file with a PAdES digital signature made using 4096-bit RSA per RSASSA-PSS with SHA-512. I posit the existence of a program that transforms JPEG files into signed PDFs with said signature, and that uses a CSPRNG at step 4 of EMSA-PSS-ENCODE, reading:

Generate a random octet string salt of length sLen

In this context, sLen is 4096-64-512 = 3520 bits, which can legitimately be random per the channel's constraints and property 4. That is enough for a 64-bit MAC (truncated HMAC-SHA-256), a 64-bit IV forming the most significant bits of the counter for AES-256-CTR, and 424 bytes of payload (over 1 kilobyte of text before compression), used instead of the random octet string when there is a payload.
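
The bit budget above can be checked with a few lines of arithmetic. The constant names are mine, and the salt length is taken as given in the text rather than re-derived from the PSS encoding rules.

```python
S_LEN_BITS = 4096 - 64 - 512  # salt length as computed in the text
MAC_BITS = 64                 # truncated HMAC-SHA-256 tag
IV_BITS = 64                  # most significant bits of the AES-256-CTR counter

payload_bits = S_LEN_BITS - MAC_BITS - IV_BITS
payload_bytes = payload_bits // 8

print(S_LEN_BITS, payload_bytes)  # prints "3520 424"
```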


When the channel allows no bits that can be arbitrarily random, it becomes much more difficult to make a demonstrably secure steganography system. In particular, with property 3, it becomes critical that the adversary can't have a better model of the original than the system assumes.

fgrieu
1

Is there a (formal?) definition of 'perfect steganographic hiding' where only the key-holder / designer can provably recover the data and if so are there any known constructions satisfying it?

Is it acceptable to post a link here to my own work? If not, then please feel free to delete this answer.

I believe that perfect steganography may be approachable, and I have an example implementation (macOS only at present, I am afraid). 'Perfect Steganography' post on my blog.

I think that the way the payload is woven into the jpeg makes it virtually impossible to detect the presence of the payload, but I am not sure how this could be made 'provable'.

In response to calls for proof, I have provided three images, one with no payload, one with a short message ("hello World"), and one with the first sentence of 'Lorem Ipsum'

http://vanleersum.uk/downloads/images/Drawing1.jpg
http://vanleersum.uk/downloads/images/Drawing2.jpg
http://vanleersum.uk/downloads/images/Drawing3.jpg

You will need my app to read the payload, but maybe you can tell me which is which. I did need to run the 'untouched' image through my app, as the optimisation in the tool is more aggressive. The original image was 240kB, but after running through my tool it was 188kB for all three. If this is unfair, then I will post the original too.

In answer to the 'lack of detail' comment: what I am doing is subtly modifying the DCT data in a manner consistent with JPG standards and expectations, but doing so in a way that provides a signal to my tool. This signal is mapped to a byte array that carries the encrypted payload. Nothing in the way the DCT data is presented is unexpected: any compliant jpeg codec will read the image just fine. The image will have small deviations from the original, but these deviations will be equivalent to those produced by using different implementations of the Discrete Cosine Transform, different quantisation tables, or different quality settings, all of which could occur from platform to platform or implementation to implementation. For reference, I am using jpeg-9c as my base code, and tinyAES for my crypto.

DrPhill
  • A short description, results will do the work. – kelalaka Dec 08 '18 at 20:59
  • Some before and after photos would be good to see if it's any good. The sender and receiver would probably share the same private library of blurry photos, then the sender would run this custom jpeg compression algorithm and send the result in the open. – daniel Dec 08 '18 at 23:29
  • @kelalaka: There is a short description on my blog. I am not sure that this question on this forum is the right place to go into detail - but I will if there is sufficient support for this. – DrPhill Dec 09 '18 at 08:53
  • @daniel: For sure, claims need proof. And I will produce some images as examples later today. But how would you know that the images were before and after and not just images compressed with different settings? You would need to use the app to confirm the presence of the payload. There is also the problem that any recoding of the jpg WILL lose the payload. So in a way, if you view the images in your browser, the payload may well have been erased (though image distortion will remain, I guess). – DrPhill Dec 09 '18 at 09:00
  • @DrPhill I'd look for pixels where they shouldn't be, I'd look for file sizes larger than they needed to be, I'd try to look for when the jpeg didn't fit what it usually does with... math... https://www.youtube.com/watch?v=QEzhxP-pdos – daniel Dec 09 '18 at 10:07
  • 1
    I've read the blog post, but I think there is quite a large gap between "perfect steganography" and an implementation that tries to achieve that. First of all, perfect is a theoretical term, which you need to prove using math. Just pointing to an implementation of steganography, even if well defined, doesn't cut it. Besides that, to count as an answer here, the methods used should at least be briefly described in the answer. – Maarten Bodewes Dec 09 '18 at 10:49
  • @daniel: I have added reference to images to the body of my answer, I hope this satisfies...... – DrPhill Dec 09 '18 at 11:24
  • @Maarten Bodewes: I have added detail to the body of my answer, I hope this satisfies...... – DrPhill Dec 09 '18 at 11:25
  • 1. 187,903 bytes 2. 187,880 bytes 3. 187,885 bytes. So I'd guess: paragraph, no payload, "hello World" – daniel Dec 09 '18 at 11:29
  • It definitely satisfies the requirements for being an answer here on crypto, thanks for the edit. I doubt that "but these deviations will be equivalent to those experienced by using different implementations of DiscreteCosineTransform or different quantisation tables or different quality setting" is enough to satisfy the high requirements for it to be perfect - making it a good answer, but I'll leave that to the steganography experts on this site. – Maarten Bodewes Dec 09 '18 at 11:29
  • @daniel: No... and this is part of the beauty of this method. The adjustment to the quality is as likely to make the byte count go down as it is to go up. – DrPhill Dec 09 '18 at 11:43
  • @MaartenBodewes: thanks - being new here I have no feel for what constitutes adequate rigour. – DrPhill Dec 09 '18 at 11:44
  • Currently, the method is not described in full, and we do not have the source code (only executable). We need enough description to support reasoning, backed up by source code (without keys) for the pesky details. Independently: the challenge now is to detect if files that have gone thru the program embed an empty, small or large payload. Making such detection impossible is easy. But many actual adversaries are content to detect that the image has gone thru a program designed for stego. Making such detection impossible is harder, and a challenge testing that property requires different rules. – fgrieu Dec 09 '18 at 17:15
  • @fgrieu: you are correct. And I would be happy to discuss this in detail; but is this the correct place to do so? The discussion requires familiarity with the details of jpg compression - so I would find setting the technical level of the discussion difficult. I posted here to point out a possible approach to 'perfect' steganography, and provided my current prototype as a form of supporting evidence. I did not intend to claim that I have achieved it! I cannot see how the technique that I use is amenable to rigorous mathematical proof (but that may be my ignorance of mathematical provability). – DrPhill Dec 09 '18 at 17:43
  • I do not think that an answer to the present question is the right place to advance on the subject. A new carefully stated question could be better. It should describe the steganographic goals, preferably with description of an experiment to test if a program meets these goals. Despite the complexity of the format, explicitly restricting to jpg files is OK in my opinion (the community will decide). You could state your current approach to these goals, as an illustration of what you attempted. You could ask if the goals are currently met, and how to reach them better. – fgrieu Dec 09 '18 at 19:34