Distinguishing Encrypted Data From Random Data?

Distinguishing Encrypted Data From Random Data? 467

Posted by timothy on Sunday September 19, 2010 @03:50PM from the is-this-a-solved-problem? dept.

gust5av writes "I'm working on a little script to provide very simple and easy to use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is implausible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there's not? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"

Distinguishing Encrypted Data From Random Data?

This discussion has been archived. No new comments can be posted.

Search 467 Comments Log In/Create an Account

Comments Filter:

Re:It's all about entropy (Score:5, Informative)

by Omnifarious ( 11933 ) * writes: <eric-slash@omnif ... g minus language> on Sunday September 19, 2010 @04:05PM (#33629606) Homepage Journal

Doesn't compressed data look random?
As an ideal, yes. But compressed data is still pretty distinguishable from random data. In particular, many compression formats have small markers in various places so that the decompressor can attempt to recover a corrupted file. Also, no compression technique is perfect, so even without these the data is still distinguishable.

Shouldn't (Score:5, Informative)

by dachshund ( 300733 ) writes: on Sunday September 19, 2010 @04:13PM (#33629658)

AES is designed to be a pseudo-random function (meaning it's evaluated against that criteria). What this means is that /when used properly/ AES encrypted data should be indistinguishable from random data, at least for a distinguisher running in bounded time. If anyone discovers an efficient algorithm that can distinguish this, it'll be a big nail in AES's coffin (and yes, at the very theoretical level I realize that there already are some known weaknesses in AES, but for the moment you're in good shape).

Re:At the risk of doing someone's homework in a fo (Score:2, Informative)

by mysidia ( 191772 ) writes: on Sunday September 19, 2010 @04:17PM (#33629686)

Perhaps. But if you use cryptsetup with LUKS, there is a readable header for the encrypted file, you don't need the key to determine encryption has been used. In fact, you can set multiple passphrases that have the authority to decrypt the partition.
GPG Encrypted data is also distinguishable, regardless of whether you use ASCII armoring or binary .GPG files. There are headers in the encrypted output that can be recognized without having the key to decrypt anything.
Now if you run 'openssl' from the command line, and choose 'aes-256-cbc', supply a true random key, and enter data bits interspersed with random 'padding bits'. It will be probably impossible for anyone to determine from the output whether there are any data bits or not, without knowing the key.

crypto is hard (Score:4, Informative)

by Tom ( 822 ) writes: on Sunday September 19, 2010 @04:21PM (#33629710) Homepage Journal

Hard to say from your question, but if you haven't done already, get yourself some crypto knowledge. Crypto is hard, there is a reason that you are laughed out of the room if you say you've invented a new crypto algorithm and you don't already have strong credentials.
Randomness is one of the harder computer problems. Especially in steganography, many implementations have been defeated by creating not enough or too much randomness. If you want to hide your message in something, it doesn't matter if your output is distinguishable from randomness, it matters if it is distinguishable from what should be there. Simple approaches like LSB tricks have often fallen because those happen to be not random in many input data.

chaining is essential (Score:4, Informative)

by pedantic bore ( 740196 ) writes: on Sunday September 19, 2010 @04:32PM (#33629768)

If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.
If you do use chaining (CBC, or something similar), then it will look quite random.
Excellent example here: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29 [wikipedia.org]

Re:Shouldn't (Score:3, Informative)

by dachshund ( 300733 ) writes: on Sunday September 19, 2010 @04:36PM (#33629796)

Erp, I meant Pseudo-Random Permutation, which is indistinguishable from a PRF if the amount of data is realistic.

Re:At the risk of doing someone's homework in a fo (Score:3, Informative)

by butlerm ( 3112 ) writes: on Sunday September 19, 2010 @04:47PM (#33629858)

You cannot distinguish between the two.
This is categorically not true, unless the key is as long or longer than the data file (and never used again). There is indeed an attack vector against any encrypted data file if the key length is small by comparison. Statistical analysis plus the slightest idea of what type of data is being encrypted is more than adequate to mount a successful attack (given sufficient computational resources) unless the key is _much_ longer than what is typical today. The lack of computational resources is the only thing that keeps typical encrypted data secure.

Re:perhaps... (Score:3, Informative)

by Goaway ( 82658 ) writes: on Sunday September 19, 2010 @05:23PM (#33630038) Homepage

Exactly. If you do not already know the answer to this question, there is no way on earth you will write a program that is at all secure.
Back to the books and study.

Have a look at Rubberhose (Score:4, Informative)

by PhunkySchtuff ( 208108 ) writes: <kai&automatica,com,au> on Sunday September 19, 2010 @05:44PM (#33630158) Homepage

Rubberhose [iq.org] (Pronounced Marutukku) is transparently deniable encryption, developed by (among others) Julian Assange.
This seems to do exactly what you're trying to do, so even if you want to go ahead and implement it yourself from scratch, it's worth reading up on what they've done to get some ideas and avoid some potential pitfalls.
Rubberhose is a computer program which both transparently encrypts data on a storage device, such as a hard drive, and allows you to hide that encrypted data. Unlike conventional disk encryption systems, Rubberhose is the first successful, freely available, practical program of deniable cryptography in the world. It was released in an earlier form in 1997, but has undergone significant changes since that time. The design goal has been to make Rubberhose the most efficient conventional disk encryption system, while also offering the new feature of information hiding.
Rubberhose is a type of deniable cryptography package. Deniable cryptography gives a person not wanting to disclose the plaintext data corresponding to their encrypted material the ability to show that there is more than one interpretation of the encrypted data. What deniable crypto means in the Rubberhose context is this: if someone grabs your Rubberhose-encrypted hard drive, he or she will know there is encrypted material on it, but not how much -- thus allowing you to hide the existence of some of your data.

No, you ALL miss the point. (Score:3, Informative)

by Doctor O ( 549663 ) writes: on Sunday September 19, 2010 @05:51PM (#33630222) Homepage Journal

No, you ALL miss the point. How are you going to explain having a HDD or partition full of "garbage"? Nobody with half a brain will believe you there's nothing encrypted in the noise.
(Yeah, an entropy file would be easy to explain, but entropy files usually don't come in sizes big enough to hide data in, PLUS, who apart from us here understands what an entropy file is? A judge sure doesn't.)
Steganography, OTOH, would be very useful. I have around 50 GB of family photos on my machine, that would make for a nice data storage.

Re:Ignore the person holding the phone book. (Score:4, Informative)

by John Hasler ( 414242 ) writes: on Sunday September 19, 2010 @06:06PM (#33630310) Homepage

Hell, when your traveling between nations your in legal terms outside of all law...
Not true. You are subject to the jurisdiction of the nation of registry of your craft.

Re:Well (Score:4, Informative)

by swillden ( 191260 ) writes: <shawn-ds@willden.org> on Sunday September 19, 2010 @06:08PM (#33630324) Journal

DES is nowadays considered a weaker algorithm
DES is considered too weak for many uses due to its small key size.
Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.

Re:It's all about entropy (Score:4, Informative)

by linuxrocks123 ( 905424 ) writes: on Sunday September 19, 2010 @06:17PM (#33630392) Homepage Journal

You (and the submitter) might want to have a look at http://eprint.iacr.org/2009/531 [iacr.org] which talks about "known-key attacks" on "AES-like permutations". The goal of these types of attacks is to distinguish AES-encrypted data from random noise. Right now, all they can do is break 8 rounds of AES-128, so the answer to OP's question is "right now, AES-encrypted data is indistinguishable from noise".
---linuxrocks123

Re:No (Score:4, Informative)

by butlerm ( 3112 ) writes: on Sunday September 19, 2010 @06:22PM (#33630438)

There is no question is is computationally difficult, just not computationally impossible. The reason for that lies in computational complexity theory. You can get a basic summary of the theory here [wikipedia.org].
In summary, if the data being encrypted is compressible by practically any algorithm whatsoever, it has computational complexity less than its bit length, i.e. a smaller bit string can be used to recover the larger one. Likewise, the computational complexity of any encryption key is at most the length of the key in bits.
Suppose you are encrypting a 256K bit string that can be compressed by a factor of two by an _ideal_ compressor. And then you have a maximally random 1K bit key. The maximum Kolmogorov complexity of any finite deterministic function of those two inputs that is known to the attacker is 129 Kbits. Where the maximum computational complexity of a truly random string of the same length as the input is 256 Kbits.
The difference between those figures opens the door to a statistical attack, because the data is not _really_ random. It just looks that way, sort of like the output of a pseudo random number generator, which isn't really random at all. If you encrypt a string of zeroes with a significantly shorter key, the output of a pseudo random number generator is exactly what you will get, a pattern that is maximally vulnerable to attack.
The lesson to be learned here is if you want to minimize the risk of attacks from folks with far more computer power than you have available, compress it using the best available compression algorithm first. Then the computational complexity of the input string will approach the theoretical maximum for a string of that length, depending on how good the compressor is. Throughout I mean complexity relative to what the attacker knows (such as the encryption algorithm) of course.

Re:It's all about entropy (Score:2, Informative)

by Anonymous Coward writes: on Sunday September 19, 2010 @06:31PM (#33630508)

And on that note, I'm a little surprised now that I think about it, that I can't come up with a single example anywhere of a native or add-on OS feature for any OS, that does random-wipe-on-delete. OS X has "erase free space" built into disk utility
Open the finder then go to the finder preferences; click on the advanced button and check 'empty trash securely'.

Re:It's all about entropy (Score:4, Informative)

by xombo ( 628858 ) writes: on Sunday September 19, 2010 @06:38PM (#33630560)

"Secure Empty Trash overwrites your data with digital gibberish"
From apple's site about Secure Empty Trash feature @ http://www.apple.com/pro/tips/empty_trash.html [apple.com]

Use a random number testing suite (Score:1, Informative)

by Anonymous Coward writes: on Sunday September 19, 2010 @07:57PM (#33631026)

We've discussed this on MetaOptimize [metaoptimize.com].
Short answer: Download an empirical testing script like dieharder [duke.edu], and see if the encrypted output looks "random" under this battery of tests.

Don't reinvent the wheel (it won't work if you do) (Score:1, Informative)

by Anonymous Coward writes: on Sunday September 19, 2010 @08:55PM (#33631344)

In cryptography, reinventing the wheel is very bad - you usually end up with something that's broken in a way that isn't at all obvious. It's better to go with something that's been peer reviewed. In this case, the product you're looking for is TrueCrypt. TrueCrypt volumes look like random data, and you can have multiple encrypted volumes inside the same container. Each volume has either random data in its free space, or another volume that looks like random data in there. So you make one or more decoy volumes with your tax/banking information, non-incriminating diary, etc, and then put the thing you're really trying to hide in another volume. Since resizing TrueCrypt volumes is very inconvenient, you have a plausible reason for making it too big.
Also note that regardless of how well hidden your steganographic data is, the fact that you have steganography software installed (which you can't effectively hide if you want to use it) is enough to damn you if you're trying to make it look like you don't have any encrypted data.

Re:It's all about entropy (Score:4, Informative)

by chgros ( 690878 ) writes: <charles-henri@gros+slashdot.m4x@org> on Sunday September 19, 2010 @10:00PM (#33631710) Homepage

I suggest you read up on cryptography.
Encryption, in general, is attempting exactly what you're attempting: make plaintext look random.
What you're trying to defend against is known as a "known-plaintext attack".
You can use any standard cryptographic approach such as AES-CBC as suggested above.
For a password-based approach, there are also standard key generation algorithms such as PKCS #5.
Note that your claim that your approach gives "as random as it gets" data is not true; once you've fixed for all time a set of random numbers, they're no longer "random".
As for generating random-like numbers deterministically, that's what stream ciphers (e.g. RC4) do.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Distinguishing Encrypted Data From Random Data? 467

Distinguishing Encrypted Data From Random Data? More Login

Distinguishing Encrypted Data From Random Data?

Re:It's all about entropy (Score:5, Informative)

Shouldn't (Score:5, Informative)

Re:At the risk of doing someone's homework in a fo (Score:2, Informative)

crypto is hard (Score:4, Informative)

chaining is essential (Score:4, Informative)

Re:Shouldn't (Score:3, Informative)

Re:At the risk of doing someone's homework in a fo (Score:3, Informative)

Re:perhaps... (Score:3, Informative)

Have a look at Rubberhose (Score:4, Informative)

No, you ALL miss the point. (Score:3, Informative)

Re:Ignore the person holding the phone book. (Score:4, Informative)

Re:Well (Score:4, Informative)

Re:It's all about entropy (Score:4, Informative)

Re:No (Score:4, Informative)

Re:It's all about entropy (Score:2, Informative)

Re:It's all about entropy (Score:4, Informative)

Use a random number testing suite (Score:1, Informative)

Don't reinvent the wheel (it won't work if you do) (Score:1, Informative)

Re:It's all about entropy (Score:4, Informative)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot