Encrypted But Searchable Online Storage?
An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?
You want to... (Score:4, Insightful)
Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.
If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.
Re:Am I missing something? (Score:5, Insightful)
You're missing something. SSL is for data that is in transit. The poster wants the data to be encrypted on the server. That's easy - any encryption program can do it. But then s/he also wants to search it. That is harder.
Re:Am I missing something? (Score:5, Insightful)
No, this is not what SSL is for at all. With SSL, you have a party you wish to communicate with, but an insecure channel.
Here, you don't want to communicate anything useful to anyone. This is more of a privacy-preserving data mining problem. It goes something like this:
I have a long list of secret numbers 1...n. I do something to these numbers, so that Google doesn't know what they are, and then I send them to Google. Next, I want to know how many numbers are larger than, say, k. So, I ask Google, but in a clever way, so that Google doesn't know what I'm asking.
Google then tells me how many of my original numbers were larger than k. However, Google doesn't know my original numbers, and they don't know what question I asked. There needs to be some theoretical mapping that preserves this privacy, but still allows the data mining to occur.
It depends on the encryption (Score:3, Insightful)
If the data is encrypted in independent "chunks" from which search terms can be built, then this is trivial: you pre-encrypt your search terms and search for them. Searching a ROT13 [wikipedia.org]-encoded document for a word works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.
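A minimal Python sketch of the ROT13 case. The sample text and the helper name `rot13` are invented for illustration, and ROT13 is of course not real encryption; it only shows why per-character substitution keeps substring search working on the encoded side.

```python
import codecs


def rot13(text: str) -> str:
    """Encode (or decode) text with ROT13; the cipher is its own inverse."""
    return codecs.encode(text, "rot13")


plaintext = "meet me at the old bridge at noon"

# "Server side": only the encoded document is stored.
stored = rot13(plaintext)

# "Client side": encode the search term the same way, then search.
query = rot13("bridge")

position = stored.find(query)   # plain substring search on encoded text
found = position != -1

# Each character maps independently, so the hit decodes back to the term.
match = rot13(stored[position:position + len(query)])
```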
Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.
If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.
The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.
Re:Am I missing something? (Score:3, Insightful)
Yes you are =)
SSL only encrypts the transport.
It seems that the poster wants to have his data _stored_ in an encrypted way that is only decipherable by him, not by any of the machines/users at the storage facility. Yet, when he wants to do some search, he somehow expects the server to be able to do so... AFAIK that's not feasible.
(you could store whatever encrypted stuff remotely, but querying will require fetching, reading and decrypting the (relevant portions of) data locally...)
A guy walks into a bar... (Score:5, Insightful)
Re:huh? (Score:5, Insightful)
Well that depends whether the OP wants to perform something like a fulltext search (i.e. the ability to look for keywords within the content of each document) or a metadata search.
There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable. Security in such a scenario would be less than optimal (e.g. people could guess certain things about your content based on the statistical pattern of length for each of the millions of encrypted content items), and of course you'd have to be very careful about the metadata fields and how you are populating them.
Re:Easy (Score:1, Insightful)
RTFQ: read the question again, please. With disk encryption the data would still be unencrypted in the server's RAM. The OP wants something much more sophisticated... data always encrypted on the server (HDD, RAM, CPU) but with the ability to search it. That doesn't look easy to me.
Re:It's not possible even in theory (Score:4, Insightful)
I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?
This seems obviously impossible but it isn't (Score:1, Insightful)
This seems obviously impossible, but it isn't. The problem, of course, is in how the server can perform a search when it isn't even able to decrypt the message telling it to do a search.
However, there is nothing inherently impossible in defining an encrypted datastructure and an algorithm where you can perform computations on the *encrypted* data, without having any idea about what it is you are computing. There is no reason that you need to decrypt data before you can do computations with it. It just needs to be the case that when you perform an operation on the encrypted data, some predictable other operation happens on the data inside the encryption. The result of this encrypted computation will then be something still encrypted, which can be sent to the client who can then decrypt it and find inside the result of his query.
So it isn't obviously impossible. In fact the theory of multiparty computation makes it clearly possible, though the overhead of doing it that way would probably be too high.
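As a small illustration of "an operation on the encrypted data causes a predictable operation on the data inside", here is a Python sketch using textbook RSA, where multiplying two ciphertexts yields a ciphertext of the product. This is a swapped-in example, not the multiparty computation mentioned above, and the tiny primes and unpadded RSA are assumptions chosen for readability; they are completely insecure.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


def server_multiply(c1: int, c2: int) -> int:
    """Server side: multiplies ciphertexts without the private key."""
    return (c1 * c2) % n


a, b = 7, 3
c = server_multiply(encrypt(a), encrypt(b))
result = decrypt(c)                  # equals (a * b) % n
```

The server never sees 7, 3, or 21, yet the client decrypts the product. Search needs a richer set of operations than one multiplication, which is where the overhead comes from.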
Re:It's not possible even in theory (Score:3, Insightful)
That's not how I read it. But the approach still sounds useless:
"If Alice wants to search for the word W, she can tell Bob (the server) the word W and the k_i corresponding to each location i in which W may occur"
What's the use of encrypting the data if you're going to send keywords in cleartext to a party you're trying to hide the data from?
That's because they don't encrypt the filenames. (Score:2, Insightful)
Re:It's not possible even in theory (Score:4, Insightful)
Can I have an anti-theft system for my car, so that nobody can steal it but anybody who wants to can take it for an anonymous test-drive?
CONFIRMED: You are missing something. (Score:2, Insightful)
You say 'SSL only encrypts the transport' as if that means something. What is a file if it's not a way to transport information from the file writer to the file reader?
I use SSL daily to encrypt files with keys to be stored for later retrieval by the intended recipient. I think you are confusing SSL (the ability to asymmetrically encrypt data) with HTTPS (a use of SSL to encrypt HTTP data transfers)?
Do this maybe? (Score:2, Insightful)
1 Encrypt the file (or record for databases)
1.5 (for a database) Encrypt the key fields each separately
2 Encrypt the file name separately
3 Store on server
To search for a file:
1 Encrypt the search criteria (file name or key value)
2 Search for the encrypted value on the server
3 Retrieve matches.
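The steps above could be sketched roughly as follows in Python. One assumption to flag: instead of literally encrypting the file name, this uses a keyed HMAC-SHA256 as the deterministic lookup token, since exact matching only needs the same input to give the same output. The `Server` class, `SEARCH_KEY`, and the file names are hypothetical, and the file contents are assumed to have been encrypted already by whatever cipher the client prefers.

```python
import hashlib
import hmac

SEARCH_KEY = b"client-side secret, never sent to the server"  # hypothetical


def token(value: str) -> str:
    """Deterministic keyed token standing in for 'encrypt the search criteria'."""
    return hmac.new(SEARCH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


class Server:
    """Stores opaque blobs under opaque tokens; sees neither names nor content."""

    def __init__(self):
        self.blobs = {}

    def store(self, name_token: str, blob: bytes) -> None:
        self.blobs[name_token] = blob

    def search(self, name_token: str):
        return self.blobs.get(name_token)


server = Server()
# Placeholder bytes stand in for files encrypted on the client beforehand.
server.store(token("taxes-2008.pdf"), b"<ciphertext 1>")
server.store(token("diary.txt"), b"<ciphertext 2>")

hit = server.search(token("diary.txt"))
miss = server.search(token("missing.txt"))
```

This only supports exact matches, and because equal inputs always give equal tokens, the server can tell when the same name or key value is stored or queried twice.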
Re:It's not possible even in theory (Score:2, Insightful)
Re:Am I missing something? (Score:3, Insightful)
Do you know which word "2" represents, or what is in documents 103 and 178?
That's how you do it. You need to ensure there's no way of doing statistical analysis on the token list to recover plaintext info, and you need to not give them the dictionary mapping from plaintext to tokens.
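A rough Python sketch of such a token list, assuming the client keeps a secret dictionary from words to random tokens and the server only holds a map from token to document IDs. All names are hypothetical, the document bodies would be stored encrypted separately, and nothing here defends against the statistical analysis mentioned above: the server can still see how often each token occurs and which tokens share documents.

```python
import secrets

dictionary = {}      # secret: plaintext word -> random token (client only)


def token_for(word: str) -> int:
    """Client side: look up, or create, the random token for a word."""
    word = word.lower()
    if word not in dictionary:
        t = secrets.randbits(64)
        while t in dictionary.values():   # avoid the unlikely collision
            t = secrets.randbits(64)
        dictionary[word] = t
    return dictionary[word]


server_index = {}    # what the server holds: token -> set of document IDs


def upload(doc_id: int, text: str) -> None:
    """Client tokenizes the text; only tokens and IDs reach the server."""
    for word in text.split():
        server_index.setdefault(token_for(word), set()).add(doc_id)


def server_lookup(tok: int) -> set:
    """Server side: returns matching document IDs for an opaque token."""
    return server_index.get(tok, set())


upload(103, "quarterly budget draft")
upload(178, "budget review notes")
upload(201, "holiday photos")

matches = server_lookup(token_for("budget"))
```

The server can answer "which documents contain this token" without the dictionary, which is exactly the situation described: it sees a token and some document numbers, but not the word.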
Re:Am I missing something? (Score:3, Insightful)