Distinguishing Encrypted Data From Random Data? 467
gust5av writes "I'm working on a little script to provide very simple and easy to use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is implausible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there's not? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
No (Score:3, Interesting)
Wrong question to ask? (Score:3, Interesting)
Re:It's all about entropy (Score:5, Interesting)
make it compressed header-less audio. Give 'em a decoder (which will produce noise), and claim you're a scientist and this is you recording Jupiter.
Re:It's all about entropy (Score:3, Interesting)
Isn't the point of steganography that you add the encrypted data on top of some other data, like a photograph or video, so that it looks like normal noise? I agree that carrying a thumb drive around, filled with random 0s and 1s, would be rather suspicious....
Re:It's all about entropy (Score:3, Interesting)
Each implementation of the compression algorithm has its nuances. If the majority of an MP3 looks like it was compressed by the iTunes implementation, but then there's a range of output iTunes would not generate (particularly if the input file is known), that's very suspect.
Record same LPs to uncompressed audio files. That recording will be pretty unique. Encrypt your data any way you like, then store the encrypted data in the lowest bit of the 16 bit samples. Compress with Apple Lossless or FLAC or whatever you have.
Re:Ignore the person holding the phone book. (Score:3, Interesting)
That's why the deniability matters. They only have so many people available to whack people with the NYC Yellow Pages. You want them to believe there is a low probability that you have any secret to give up under "questioning".
Unfoilable Steganography in LSB Plane of Imagery (Score:5, Interesting)
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
Comment removed (Score:2, Interesting)
Actually, you've spotted the problem (Score:4, Interesting)
All the investigators need to do is run some fake but seemingly complex program that looks at the file under inspection and says "yes, stenography in use". Then the full weight of the law comes down, because now the suspect has to prove the negative - impossible of course.
So actually what is needed is a suspect's right that investigators prove any assertion that files have been hidden if that assertion/analysis is used as evidence in court.
Statistics (Score:4, Interesting)
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.
If you want to hide your data, the file must ostensibly have some other purpose... something that isn't obviously a lie. That's what steganography is about. For example, you might download as much of the 1 meter-resolution Google Maps satellite image as fits on your hard disk, save it uncompressed and then store encrypted data in the low-order bit of each byte (3 bytes to a pixel). Coupled with a map application that can display the imagery, it would appear to be one thing (a map) while really being another (a container for encrypted information).
At that point, unless you capture the encryption software it becomes hard to suspect that there is encrypted data, let alone prove it.
Re:One more level... (Score:3, Interesting)
Presumably a simple XOR would make them be able to come up with that sentence... hell, any sentence thinkable in the world! "Look, if we apply these bytes, the secret message says [...]!"
Re:iieorjoeghoiuhtr (Score:2, Interesting)
Re:It's all about entropy (Score:5, Interesting)
I've been working on this very problem for a while now. An easier version, even: how to encrypt a single file in a way that makes it indistinguishable from random data? The algorithm must allow for a short password (dozens of bytes), and should be able to encrypt very large files. Optimally, an attacker may see the algorithm and may suspect correctly what the plaintext is, but should still be unable to prove that the given cyphertext is the output of the algorithm. That is, the only way to "prove" that should be by a brute-force password search, whereas finding a working password of a few dozen bytes is proof enough. This is good enough because a brute-force search over 60^30 passwords is kind of slow.
I further simplified the problem by saying that the size of a file needs not to be hidden: it's a separate task, and a much easier one.
I have a reason to approach the problem this way. If I have on my computer a file named "one-time-pad.bin", and it looks like a one time pad, then it must be a one time pad. The very existence of an encrypted partition should be enough to convince anyone that there is encrypted data. If a multi-sheaf algorithm is used, then there is a reasonable suspicion that there are multiple sheafs. Either way, the owner seems to be hiding something. Burying data in JPG and similar tricks are also sketchy, as it is almost certainly possible to distinguish (statistically) a benign JPG from the one steganographically altered, although this can be avoided by hiding very little data in very large files. Here, at least, there is an expensive solution.
I can think of at least one other way to do it, here goes my original description on the internet. Say, we want to use passwords with length up to B bits and encrypt files with length up to M bits. Fix forever B random binary strings of length M each, call them N = {n_1, n_2, ... , n_M}. The set of 2^B passwords is in a bijective correspondence with the set of subsets of N, for example a password like 110101... will select the subset {n_1, n_2, n_4, n_6, ...}. Treat n_i in that subset as integers and add them. Threat the plaintext as an integer and add it to (or XOR with) the result. One can think of it as of constructing a one time pad (one of 2^B) and XORing with it. Even if the attacker knows n_i for each i, and the plaintext (without loss of generality, all zero), and the cyphertext, she still has to decompose the cyphertext as a sum of a subset of N, and even deciding whether or not it can be done is np-hard [wikipedia.org]. The complexity will be exponential as long as both M and B are large, which they are in expected applications.
The nicest feature here is that with a non-trivial password, the cyphertext will look as random as they get! It will be a sum of carefully pre-selected random numbers, padded with the plaintext.
One obvious limitation is that each password can only be used once, since similar plaintexts will produce similar hypertexts, but that could be remedied. A bigger problem, IMHO, is that this algorithm requires B random binary strings of length M each to be built-in. Just to give you an idea, if you want to encrypt files of size up to 1 GiB with passwords of size up to 512 bits, then you need to keep around 512 GiB of pad. Either that, or be able to generate really really fast 512 random reals (random here meaning, the same every time, but completely unrelated), which is very sketchy: the reals could easily be so related that the subset sum will allow for a sub-exponential solution.
I would be very interested to hear from anyone about this idea.
I may have another way of solving the same hiding problem, and it has to do with a completely different, yet, IMHO, also very fascinating way of turning a short binary string into a very long and random-looking binary string in a one-way [wikipedia.org] fashion. I decided that I won't implement the subset sum solution unless I am totally sure that I cannot find something more elegant, so feel free to steal my idea above and code it in.
Re:It depends.... (Score:1, Interesting)
Does the person to whom you give these two files have a rubber hose? Is he a member of the “extraordinary rendition” team?
The point of steganography is to not get caught in the first place. If you need plausible deniability, you’ve already lost.
Cheers,
b&
Speaking of rubber hoses, the rubber hose file system [wikipedia.org], aka Maru Tukku [iq.org] hides data in the (potentially discontiguous) free space, and was done by Julian Assange (of wiki leaks fame I believe). The intent of this was to use plausible deniability, since there could be multiple hidden aspects (sort of like partitions), each with an encryption key. I think they were set up hierarchically, much like in onion routing, to increase resistance to losing a single key.
Re:It's all about entropy (Score:3, Interesting)
Re:Well (Score:3, Interesting)
There is no file with random data.
Why would one want to create a file with random data?
The point of creating a file is that it ain't random data what you are saving, otherwise you wouldn't save it.
group effort can provide plausible deniability (Score:4, Interesting)
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.....
OK, let's, as a community, add an (E). Everyone create a file on your laptop, in your home directory, named random.bin, as follows:
dd if=/dev/urandom of=random.bin bs=4096 count=10000
The actual value of the count isn't important, as long as it is large enough to create lots of random bits. If lots of people do this, we have “(E) Random bytes because Slashdot told me to”, providing plausible deniability for anyone who needs to use that file to encrypt something important.
Re:It's all about entropy (Score:4, Interesting)
When they discover you aren't a scientist
The GP should record some LP, at 24 bits per sample, at 96 kSa/s, in stereo. It wouldn't be too unusual, especially if he picks a well known music. Classical music will be particularly good here. A typical opera, in .WAV, will be about 4 GB, and there will be at least 8 lower bits that are yours to play with (they are noise from the turntable.)
For the laptop at the border scenario (Score:1, Interesting)
wouldn't it be more useful to mod your laptop so it's got two drives, the goodies being on the drive that only becomes accessible when you put the little magnet in the right place to energize the reed relay that powers up the dark drive? No seeum drive in directory, no drive there, Kemo Sabe.
Re:One more level... (Score:4, Interesting)
Re:No, you ALL miss the point. (Score:3, Interesting)
Well of course you would want to *have* a cousin Jim who is willing to say he had given you a spare hard drive. Or replace "cousin Jim" with "friend Steve" or whoever. It's not *that* hard.
And you're going to rehearse the stories together? Now you're conspiring, so you've added another charge. And there are more points of failure, as all your stories have to match, which they won't.
Re:No, you ALL miss the point. (Score:1, Interesting)
Agree
If you have something suspicious, it will not be easy to hide.
If you have a HD with 80Gs of random data, that would obviously raise concerns.
If you want to have an encrypted volume it should be a sparse one, that takes the unclaimed space from other files. I.e. the unused bytes from innocent looking files that do not match the HD format cluster size.
I dare you to write a hidden FS, thus plausible deniability, from the waste space of host FS, and keep it protected.
Re:Shouldn't (Score:3, Interesting)
He dismisses LUKS and TrueCrypt because they don't offer plausible deniability - because of the headers, as you say. He then moves onto Linux's loopback device in crypto mode. It doesn't write headers. He then comes up with a technique of comparing the raw encrypted data with random text. Turns out using his techniques it is easy to spot the difference. And that is the point of the paper: even without headers or any other tell tale signs, there is no way to hide the fact that you have encrypted data on your disk.
From what I can tell, the paper makes several points:
1. Implementations that write headers leak info on what's encrypted
2. Poor crypto implementations (i.e., not using modes of operation correctly) leak info on what's encrypted
3. Encrypted data stands out when compared with non-random background
4. If you can look at the way a filesystem changes over time, you'll spot people writing encrypted data
I think (1) and (2) are fairly obvious and irrelevant --- if you do things wrong, then of course you'll get caught. I think we've addressed (3) already in this thread. So it remains for us both to agree that you can learn things by observing the way a filesystem changes over time. That doesn't exactly correspond to "using his techniques it is easy to spot the difference [between encrypted data and random text]". In fact, it sounds to me like a whole different flavor of attack, though certainly a powerful one that people should be careful of.
Here you are playing with words. "Encrypted data looks like random data" in this case means in this case "looks identical to the novice, but an expert will find it easy to distinguish the two". But no one would take that meaning from your post. It was poor communication at best.
I would argue that the attacker's expertise is irrelevant to this discussion. If an attacker can obtain periodic snapshots of a user's system, they don't have to be an expert --- they're just blessed with an unusually rich amount of data. No cryptographic technique (short of completely re-randomizing the noisy portions of the disk) is going to hide the fact that you're making suspicious changes to the data. I would further claim that, absent such a history, encrypted data does look like random data. But you are welcome to disagree.
It's been a good long time since I took part in a nice, angry Slashdot flame war, particularly in my PhD subject area. It's been fun.