Data Storage Encryption Privacy Security

Encrypted But Searchable Online Storage? 266

An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?
This discussion has been archived. No new comments can be posted.

  • You want to... (Score:4, Insightful)

    by mhkohne ( 3854 ) on Thursday April 16, 2009 @04:25PM (#27602725) Homepage

    Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.

    If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.

  • by qbzzt ( 11136 ) on Thursday April 16, 2009 @04:26PM (#27602729)

    You're missing something. SSL is for data that is in transit. The poster wants the data to be encrypted on the server. That's easy - any encryption program can do it. But then s/he also wants to search it. That is harder.

  • by 3p1ph4ny ( 835701 ) on Thursday April 16, 2009 @04:30PM (#27602793) Homepage

    No, this is not what SSL is for at all. With SSL you have a party you wish to communicate with, but an insecure channel.

    Here, you don't want to communicate anything useful to anyone. This is more a privacy preserving data mining problem. It goes something like this:

    I have a long list of secret numbers 1...n. I do something to these numbers, so that Google doesn't know what they are, and then I send them to Google. Next, I want to know how many numbers are larger than, say k. So, I ask Google, but in a clever way, so that Google doesn't know what I'm asking.

    Google then tells me how many of my original numbers were larger than k. However, Google doesn't know my original numbers, and they don't know what question I asked. There needs to be some theoretical mapping that preserves this privacy, but still allows the data mining to occur.
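
One way to picture the "numbers larger than k" example is a toy order-preserving encoding, sketched here in Python. This is an illustration under stated assumptions, not something proposed in the thread: the class `OrderPreservingToy`, its secret `scale` and `offset`, and the function `server_count_greater` are all hypothetical names. It also falls short of what the comment asks for, since the server still learns the relative order of the stored values and roughly where the threshold sits among them.

```python
import secrets


class OrderPreservingToy:
    """Toy client-side encoder: hides exact values, keeps their order.

    Hypothetical illustration only. Order is leaked by design, so this is
    NOT a secure scheme.
    """

    def __init__(self, scale=1000, offset=123457):
        self.scale = scale    # secret multiplier, must be >= 1
        self.offset = offset  # secret shift

    def encode(self, x):
        # Random noise in [0, scale) never crosses into the next integer's
        # band, so x < y always implies encode(x) < encode(y).
        return self.scale * x + self.offset + secrets.randbelow(self.scale)

    def threshold(self, k):
        # Top of k's band: anything strictly above it encodes a value > k.
        return self.scale * k + self.offset + self.scale - 1


def server_count_greater(encoded_values, encoded_threshold):
    """Runs on the server, which sees only encoded integers."""
    return sum(1 for v in encoded_values if v > encoded_threshold)


client = OrderPreservingToy()
uploaded = [client.encode(x) for x in [3, 17, 42, 8, 99, 17]]
count = server_count_greater(uploaded, client.threshold(17))
```

The server does the counting without seeing 17 or any of the original numbers, but an observer who knows a few plaintext values could start narrowing down the rest from the ordering alone.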

  • by davidwr ( 791652 ) on Thursday April 16, 2009 @04:32PM (#27602827) Homepage Journal

    If the data is encrypted in independent "chunks" from which search terms can be built then this is trivial: You pre-encrypt your search terms and search for them. Searching a ROT13 [wikipedia.org]-encoded document for a word works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.

    Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.

    If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.

    The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.
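
The ROT13 observation in the comment above can be tried directly. The sketch below is only an illustration: the sample sentence and the function names `rot13` and `server_find` are made up, and ROT13 is of course not real encryption. It shows why chunk-by-chunk encoding makes search trivial, which is also exactly why it is weak.

```python
import codecs


def rot13(text):
    """Encode or decode with ROT13 (the operation is its own inverse)."""
    return codecs.encode(text, "rot_13")


def server_find(stored_ciphertext, encoded_term):
    """Runs on the server: plain substring search over encoded text."""
    return stored_ciphertext.find(encoded_term)


# Client side: encode the document before upload, and encode the query.
stored = rot13("meet me at the old bridge at noon")
position = server_find(stored, rot13("bridge"))

# Client side again: decode only the matching slice.
match = rot13(stored[position:position + len("bridge")])
```

The server never sees the word "bridge", yet it finds the match, because every character is transformed independently of its neighbours.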

  • by deroby ( 568773 ) <deroby@yucom.be> on Thursday April 16, 2009 @04:32PM (#27602841)

    Yes you are =)

    SSL only encrypts the transport.

    It seems that the poster wants to have his data _stored_ in an encrypted way that is only decipherable by him, not by any of the machines/users at the storage facility. Yet, when he wants to do some search, he somehow expects the server to be able to do so... AFAIK that's not feasible.

    (you could store whatever encrypted stuff remotely, but querying will require fetching, reading and decrypting the (relevant portions of) data locally...)

  • by skathe ( 1504519 ) on Thursday April 16, 2009 @04:37PM (#27602915)
    ...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."
  • Re:huh? (Score:5, Insightful)

    by oldspewey ( 1303305 ) on Thursday April 16, 2009 @04:38PM (#27602955)

    Well that depends whether the OP wants to perform something like a fulltext search (i.e. the ability to look for keywords within the content of each document) or a metadata search.

    There's nothing to prevent you setting up a CMS where each piece of content is encrypted, but the metadata describing that content is out in the clear and searchable. Security in such a scenario would be less than optimal (e.g. people could guess certain things about your content based on the statistical pattern of length for each of the millions of encrypted content items), and of course you'd have to be very careful about the metadata fields and how you are populating them.
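
A rough sketch of the metadata-in-the-clear arrangement described above, in Python. Everything here is hypothetical: the `StoredItem` record, the field names, and the placeholder byte strings standing in for ciphertext produced by whatever cipher the client uses. It is meant to show what the server can and cannot see, not how a real CMS would be built.

```python
from dataclasses import dataclass


@dataclass
class StoredItem:
    metadata: dict     # cleartext: visible to and searchable by the server
    ciphertext: bytes  # opaque blob: only the client holds the key


def search_metadata(items, field, value):
    """Runs on the server: match on cleartext metadata only."""
    return [item for item in items if item.metadata.get(field) == value]


def visible_lengths(items):
    """What the server still learns about the content: its size."""
    return [len(item.ciphertext) for item in items]


items = [
    StoredItem({"type": "invoice", "year": 2008}, b"\x00" * 512),
    StoredItem({"type": "photo", "year": 2008}, b"\x00" * 4096),
    StoredItem({"type": "invoice", "year": 2009}, b"\x00" * 640),
]
invoices = search_metadata(items, "type", "invoice")
sizes = visible_lengths(items)
```

The `visible_lengths` helper is there to make the comment's caveat concrete: even with the content encrypted, size patterns and the metadata values themselves remain exposed.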

  • Re:Easy (Score:1, Insightful)

    by Anonymous Coward on Thursday April 16, 2009 @04:39PM (#27602965)

    RTFQ read the question again, please. With disk encryption the data would still be unencrypted in the server's RAM. The OP wants something much more sophisticated... data always encrypted in the server (HDD, RAM, CPU) but with the ability to search it. Not that easy to me.

  • by smallfries ( 601545 ) on Thursday April 16, 2009 @04:40PM (#27602973) Homepage

    I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?

  • by Anonymous Coward on Thursday April 16, 2009 @04:53PM (#27603255)

    This seems obviously impossible, but it isn't. The problem, of course, is in how the server can perform a search when it isn't even able to decrypt the message telling it to do a search.

    However, there is nothing inherently impossible in defining an encrypted datastructure and an algorithm where you can perform computations on the *encrypted* data, without having any idea about what it is you are computing. There is no reason that you need to decrypt data before you can do computations with it. It just needs to be the case that when you perform an operation on the encrypted data, some predictable other operation happens on the data inside the encryption. The result of this encrypted computation will then be something still encrypted, which can be sent to the client who can then decrypt it and find inside the result of his query.

    So it isn't obviously impossible. In fact the theory of multiparty computation makes it clearly possible, though the overhead of doing it that way would probably be too high.
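
To make "an operation on the encrypted data causes a predictable operation on the data inside" concrete, here is a toy version of the Paillier cryptosystem in Python, where multiplying two ciphertexts adds the plaintexts. Note the swap: this is additively homomorphic encryption, not the multiparty computation the comment mentions, and it only adds numbers rather than searching text. The function names and the tiny hard-coded primes are assumptions chosen for readability and provide no security at all.

```python
import math
import secrets


def keygen(p=1789, q=1931):
    """Toy key generation; p and q are small illustrative primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Randomized encryption of an integer 0 <= m < n, with g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Runs on the server: needs only the public key, never sees plaintext."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, private_key, c_sum)
```

The server computes `c_sum` from two values it cannot read, and only the key holder can open the result. Going from "add two numbers" to "search a document" is where the overhead the comment warns about comes in.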

  • by flaming error ( 1041742 ) on Thursday April 16, 2009 @05:00PM (#27603369) Journal
    > the amount of data you need to upload for the query is equal to the total amount stored
    That's not how I read it. But the approach still sounds useless:

    If Alice wants to search for the word W, she can tell Bob (the server) the word W and the k_i corresponding to each location i in which W may occur

    What's the use of encrypting the data if you're going to send keywords in cleartext to a party you're trying to hide the data from?

  • by alanfairless ( 1420869 ) on Thursday April 16, 2009 @05:00PM (#27603371)
    And they can't search inside your documents.
  • by goodmanj ( 234846 ) on Thursday April 16, 2009 @05:34PM (#27603877)

    Can I have an anti-theft system for my car, so that nobody can steal it but anybody who wants to can take it for an anonymous test-drive?

  • by mcrbids ( 148650 ) on Thursday April 16, 2009 @05:35PM (#27603897) Journal

    You say 'SSL only encrypts the transport' as if that means something. What is a file if it's not a way to transport information from the file writer to the file reader?

    I use SSL daily to encrypt files with keys to be stored for later retrieval by the intended recipient. I think you are confusing SSL (the ability to asymmetrically encrypt data) with HTTPS (a use of SSL to encrypt HTTP data transfers)?

  • Do this maybe? (Score:2, Insightful)

    by pentalive ( 449155 ) on Thursday April 16, 2009 @05:52PM (#27604101) Journal

    1 Encrypt the file (or record for databases)
    1.5 (for a database) Encrypt the key fields each separately
    2 Encrypt the file name separately
    3 store on server

    To search for a file:
    1 Encrypt the search criteria (file name or key value)
    2 search for encrypted thing on server
    3 Retrieve matches.
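
The steps above can be sketched as follows, with one substitution that should be stated plainly: instead of encrypting the file name or key field, this uses a keyed hash (HMAC-SHA256) as a deterministic search token, since the server only needs equality matching. The names `SECRET_KEY`, `blind_token` and `Server` are hypothetical, and the byte strings are placeholders for ciphertext produced by a real cipher on the client.

```python
import hashlib
import hmac

# Hypothetical client-side secret; it is never sent to the server.
SECRET_KEY = b"client-side secret, never uploaded"


def blind_token(value):
    """Deterministic token: the same input always gives the same token."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


class Server:
    """Stores opaque blobs under opaque tokens; it can match, not read."""

    def __init__(self):
        self.rows = {}

    def store(self, token, blob):
        self.rows.setdefault(token, []).append(blob)

    def search(self, token):
        return list(self.rows.get(token, []))


server = Server()
# Steps 1-3: file encrypted elsewhere, name tokenized separately, both stored.
server.store(blind_token("taxes-2008.pdf"), b"<ciphertext A>")
server.store(blind_token("holiday.jpg"), b"<ciphertext B>")

# Search: tokenize the criteria the same way and ask for an exact match.
hits = server.search(blind_token("taxes-2008.pdf"))
```

Because the token is deterministic, the server can tell when two stored items or two queries refer to the same name, and only exact matches work.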

  • by ecevans ( 980167 ) on Thursday April 16, 2009 @06:23PM (#27604493)
    It is true that what you're describing would work, but you're talking about a translation, not an encryption. Using a good encryption scheme, all encrypted instances of a given string would not be the same.
  • by vidarh ( 309115 ) <vidar@hokstad.com> on Friday April 17, 2009 @02:01AM (#27608095) Homepage Journal
    Here's a search index: [1,55] [2,103] [2,178] [3,1] [3,2]. Give me all documents with a document id matching the second entry in each pair where the first entry is 2.

    Do you know which word "2" represents, or what is in documents 103 and 178?

    That's how you do it. You need to ensure there's no way of doing statistical analysis on the token list to recover plaintext info, and you need to not give them the dictionary mapping from plaintext to tokens.
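
A small Python sketch of the token index described above. It assumes the client builds the index locally and uploads only the (token, document id) pairs; `build_index` and `server_lookup` are hypothetical names, the sample documents are made up, and randomly shuffled integers stand in for whatever token scheme a real system would use.

```python
import random


def build_index(docs):
    """Client side: returns (secret dictionary, pairs to upload).

    docs maps a document id to its plaintext.
    """
    words = sorted({w for text in docs.values() for w in text.lower().split()})
    token_ids = list(range(1, len(words) + 1))
    random.SystemRandom().shuffle(token_ids)  # hide alphabetical order
    dictionary = dict(zip(words, token_ids))  # stays on the client
    pairs = sorted(
        {(dictionary[w], doc_id)
         for doc_id, text in docs.items()
         for w in text.lower().split()}
    )
    return dictionary, pairs


def server_lookup(pairs, token):
    """Server side: return document ids whose pair starts with the token."""
    return [doc_id for tok, doc_id in pairs if tok == token]


docs = {
    55: "holiday photos",
    103: "budget report",
    178: "budget forecast",
}
dictionary, pairs = build_index(docs)
matches = server_lookup(pairs, dictionary["budget"])
```

The server sees only integers in both the index and the query. What it can still observe is how often each token occurs and which tokens are queried together.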

  • by seifried ( 12921 ) on Friday April 17, 2009 @03:19AM (#27608427) Homepage
    And these tokens are generated how? Oh yeah, by Google's search engine. Whoops. If you want someone to extract information from data, they will by definition be able to extract some amount of information from the data, even if you have everything encrypted, etc. They could do frequency counts of the tokens and convert them to words, traffic analysis (can't encrypt the from/to), etc.
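
The frequency-count attack mentioned above is easy to sketch in Python. The token stream and the word list below are invented for illustration, and `guess_tokens` is a hypothetical helper. The idea is simply to rank observed tokens by how often they appear and line them up against a list of words ranked by how common they are in the language.

```python
from collections import Counter


def guess_tokens(observed_tokens, words_by_frequency):
    """Pair the most frequent token with the most frequent word, and so on.

    Returns a best-guess mapping from token to word. Real attacks refine
    this with context, but rank matching is the starting point.
    """
    ranked = [tok for tok, _ in Counter(observed_tokens).most_common()]
    return dict(zip(ranked, words_by_frequency))


# Tokens as seen by the server, in document order (made-up data).
observed = [7, 7, 7, 7, 3, 3, 9, 7, 3]
# Attacker's prior: words ordered from most to least common (made-up list).
common_words = ["the", "of", "encryption"]

guesses = guess_tokens(observed, common_words)
```

With enough text the guesses for common words become reliable, which is why a token list needs some defence against statistical analysis before it is handed to the server.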
