Data Storage Encryption Privacy Security

Encrypted But Searchable Online Storage? 266

Posted by timothy
from the give-some-to-that-lawyer dept.
An anonymous reader asks "Is there a solution for online storage of encrypted data providing encrypted search and similar functions over the encrypted data? Is there an API/software/solution or even some online storage company providing this? I don't like Google understanding all my unencrypted data, but I like that Google can search them when they are unencrypted. So I would like to have both: the online storage provider does not understand my data, but he can still help me with searching in them, and doing other useful stuff. I mean: I send to the remote server encrypted data and later an encrypted query (the server cannot decipher them), and the server sends me back a chunk of my encrypted data stored there — the result of my encrypted query. Or I ask for the directory structure of my encrypted data (somehow stored in my data too — like in a tar archive), and the server sends it back, without knowing that this encrypted chunk is the directory structure. I googled for this and found some papers, however no software and no online service providing this yet." Can anyone point to an available implementation?
This discussion has been archived. No new comments can be posted.

  • by nahdude812 (88157) * on Thursday April 16, 2009 @04:25PM (#27602719) Homepage

    It's not possible to do this even in theory, unless you're relying on very weak encryption. The point of encryption is that you can't infer anything about the contents. If Google was able to infer enough to give you meaningful search results (if for example each word was encrypted by itself, and you searched for the encrypted version of the word), they would therefore necessarily be able to know enough to perform a frequency analysis attack on your data and compromise it in no time flat unless it was a very small amount of data (thus meaning search isn't really of value anyway).

    You'll find a similar problem plagues any attempt at searching. Searching requires a certain knowledge or meta knowledge of the material being searched; and that knowledge necessarily dramatically weakens your encryption.
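The frequency-analysis concern can be made concrete. The following is a minimal sketch, assuming each word is replaced by a deterministic keyed token (HMAC-SHA256 is used here as a stand-in for any deterministic per-word encryption; the key, sample text, and function names are illustrative only). The server cannot read the tokens, yet their counts mirror the word counts exactly.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo key, not a real secret"  # hypothetical key, for illustration only

def token(word: str) -> str:
    """Deterministic per-word token: the same word always maps to the same token."""
    return hmac.new(KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

text = "the cat sat on the mat and the dog sat on the log"
words = text.split()
tokens = [token(w) for w in words]   # this is all the server would ever see

word_counts = Counter(words)
token_counts = Counter(tokens)

# The frequency profile survives the transformation intact.
plain_profile = sorted(word_counts.values(), reverse=True)
token_profile = sorted(token_counts.values(), reverse=True)

# An attacker who knows "the" is the most common English word can guess
# which token stands for it without ever learning the key.
most_common_token = token_counts.most_common(1)[0][0]
```

With a large corpus the same reasoning extends to many more words, which is why deterministic per-word schemes leak more than their opaque-looking tokens suggest.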

    • by TheRaven64 (641858) on Thursday April 16, 2009 @04:28PM (#27602773) Journal

      It is possible. When you upload the data, you also upload an index. When you connect again, you download the index (which is much smaller than the data) and search that on the local machine. Neither the index, nor the data, is ever unencrypted on the server.
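A rough sketch of the encrypted-index idea, under loud assumptions: `toy_cipher` is a hash-based XOR keystream used only so the example runs with the standard library (it is not authenticated and is not a recommendation), the "server" is a plain dictionary, and all names and sample documents are hypothetical. The point is only that the index is built, encrypted, decrypted, and searched on the client.

```python
import hashlib
import json
import secrets

def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy stand-in for a real
    cipher; applying it again with the same key and nonce decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)

def build_index(docs: dict) -> dict:
    """Map each lowercase word to the sorted list of documents containing it."""
    index = {}
    for name, body in docs.items():
        for word in set(body.lower().split()):
            index.setdefault(word, []).append(name)
    return {word: sorted(names) for word, names in index.items()}

# Client side, before upload. The key never leaves the client.
key = secrets.token_bytes(32)
docs = {"a.txt": "invoice for april", "b.txt": "holiday photos april"}
nonce = secrets.token_bytes(16)
blob = nonce + toy_cipher(key, nonce, json.dumps(build_index(docs)).encode())
server_storage = {"index": blob}   # the server holds only this opaque blob

# Client side, later: download the index, decrypt it, and search locally.
downloaded = server_storage["index"]
index = json.loads(toy_cipher(key, downloaded[:16], downloaded[16:]))
hits = index.get("april", [])
```

The trade-off is the one raised in the replies: the server does no searching at all, so every new client has to fetch the whole index first.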

      As for frequency analysis, I don't think any encryption algorithms published in the last 40 years have been vulnerable to this sort of attack...

      • by jgtg32a (1173373)
        That's because all encryption produced in the last 40 years has been based off of Division not Addition
      • by TheRaven64 (641858) on Thursday April 16, 2009 @04:34PM (#27602861) Journal
        Replying to myself: the scheme in the linked paper is not feasible. It performs O(n) searches, but this means that the amount of data you need to upload for the query is equal to the total amount stored. Since most consumer Internet links are asymmetric, it would be cheaper and easier to simply download the entire data set and search locally. The paper proposes having a server-side cache. This means that, for a typical block cypher, you would have a cache of every search term encrypted for each block. The server could then compare this to each block, but would not know what the plaintext is. This is not useful in any real-world scenario. The cache would be orders of magnitude bigger than the stored data and the search would still be O(n), which is painfully slow. As I suggested above, uploading an encrypted index with the data makes more sense. Look at Apache Lucene or Apple's SearchKit for how to do this.
        • Re: (Score:3, Insightful)

          > the amount of data you need to upload for the query is equal to the total amount stored
          That's not how I read it. But the approach still sounds useless:

          If Alice wants to search for the word W, she can tell Bob (the server) the word W and the ki corresponding to each location I in which W may occur

          What's the use of encrypting the data if you're going to send keywords in cleartext to a party you're trying to hide the data from?

      • by FredFredrickson (1177871) * on Thursday April 16, 2009 @04:34PM (#27602865) Homepage Journal
        Mozy does this for personal/business backups. You can use a completely private key, but search your own data.
      • Re: (Score:3, Informative)

        by BitZtream (692029)

        And that would practically defeat the purpose of the encryption.

        For the index to be useful it has to provide too much information about the encrypted data. The point of encryption is to ensure that nothing can be inferred about the contents of the encrypted data. If you give them a nice big bunch of information about what's encrypted, why bother encrypting it in the first place?

        Given enough information in the index they could actually derive your encryption key as well with some simple brute forcing.

        • not hard (Score:2, Interesting)

          by zogger (617870)

          Just use a book (or multiple books) code cipher for your index. You don't need to remember a thing beyond which books and what your key starting number is, the pattern. And if someone is in your house throwing all your books at cracking the remote server, you are already screwed and have much bigger problems, such as they probably already installed a keylogger on you. If you are that much of a target for someone to take that much interest....time for plan B or C then, involving plastic surgery, new ID and s

        • I assume FredFredrickson meant that the index would be encrypted.
        • by aztektum (170569)

          An encrypted file filled with financial information, locked up and merely tagged "financial information" would only tell you what sort of data is in there. If you can't view the data, no harm no foul?

      • The only thing that I know is that it's gonna somehow involve Locate32 http://www.locate32.net/ [locate32.net]

      • by fwr (69372)
        That is not what was originally requested. What was originally requested was to have the provider do the search. Uploading keywords that the provider can search, or any other sort of index, is also not what was requested. What was requested was to have the provider search the actual data, not some cleartext index or keywords attached to the encrypted data. Your solution also makes little to no sense to upload the encrypted index if you are just going to download it again to search it locally. You may a
      • What you describe sounds like a viable alternative. However, the OP question involved the service doing the search, which should send off all kinds of warning bells - that would require unencrypted access for the server to at least part of the data.
    • Unless you do the indexing client-side, and upload an index that's somehow encrypted...

      I'm not saying I know how to do this, but it seems possible.

      • Re:Maybe, maybe not (Score:4, Interesting)

        by The Moof (859402) on Thursday April 16, 2009 @04:42PM (#27603035)

        Maybe something like this -

        Create an index of hashes using the unencrypted data on the client.
        Encrypt the data on the client so we now have an index of hashes that apply to an encrypted file.
        Upload the hash index and the encrypted data file to the server.
        To search, hash the search criteria on the client.
        The server searches the indexes for the hash value, returning a list of encrypted files with an index matching the criteria hash.
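The steps above can be sketched roughly as follows. One deliberate change, labeled as an assumption: the comment says "hashes", but this sketch uses a keyed HMAC, since with a plain unkeyed hash the server could hash dictionary words itself and recover the index. File ids, key, and function names are hypothetical, and the encryption of the files themselves is left out.

```python
import hashlib
import hmac

def word_token(key: bytes, word: str) -> str:
    """Client side: keyed hash of a search word (key never leaves the client)."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_token_index(key: bytes, docs: dict) -> dict:
    """Client side: map each word token to the set of opaque file ids containing it."""
    index = {}
    for file_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word_token(key, word), set()).add(file_id)
    return index

def server_search(token_index: dict, query_token: str) -> list:
    """Server side: a plain lookup on opaque tokens, no key required."""
    return sorted(token_index.get(query_token, set()))

key = b"client-only secret"   # illustrative value
docs = {"f1": "Quarterly tax report", "f2": "Tax receipts 2008", "f3": "Holiday plans"}
server_index = build_token_index(key, docs)   # uploaded next to the encrypted files
result = server_search(server_index, word_token(key, "tax"))
```

Even with a keyed hash the server still learns which files share a token and when a query repeats, so the frequency-analysis objection raised upthread applies to this design as well.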

        • by MrEricSir (398214)

          Sounds good to me. You wouldn't be able to get a "ransom note" but I guess that's an acceptable limitation.

    • by blueg3 (192743)

      Not possible in theory? You should tell the authors of the linked paper that describe how to do it in theory.

      • The algorithm in the linked paper requires you to upload at least as much data as is stored remotely for every search query. This is technically possible, but it would be cheaper and easier to download and decrypt all of the data locally then run all of your searches, which seems to defeat the point. The only occasion when their algorithm makes sense is when you are repeatedly searching for the same terms, but if you're doing that then you should just save your search results.
        • by Homburg (213427)

          That's true for their Scheme I, but I don't think it's true for Scheme II, or any of the subsequent schemes, is it? Scheme II and all subsequent schemes make the key for any word a function of that word, so, to search for a word, you just need to upload the word and its related key. I don't see why that would be anything like as much data as is stored remotely.

          Now, the idea of making the key used to encrypt a given word a function of that word kind of sounds insecure to me, but I don't have the cryptography

          • Their subsequent schemes appear to rely on using asymmetric encryption (e.g. RSA) where you can provide the server with the public key and have it be able to encrypt, but not decrypt, data. Asymmetric encryption is massively more computationally expensive than symmetric, which is why it is never used for this kind of application.
    • Yeah, I'm not sure I understand how meaningful searches can be done without decryption-- but then I don't pretend to be any kind of a genius about these things. It seems much more likely to me that there could be some kind of a system where unencrypted search indexes are kept locally while the files are encrypted and sent to an online storage service. Then you could search locally for the file you're looking for, fetch the encrypted information from the online storage, and then decrypt it locally.

      That so

    • by smallfries (601545) on Thursday April 16, 2009 @04:40PM (#27602973) Homepage

      I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?

      • Re: (Score:2, Informative)

        by Anonymous Coward

        "why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?"

        Because it can't. The one paper proposes (unless I'm missing something!) giving the server the word to search for AND the keys! The security is by frequently rotating the key, and if you KNOW you only wanted to search, say, chapter 1 of a longer document, only give the key for chapter 1. Not very secure!

      • Re: (Score:3, Interesting)

        by raju1kabir (251972)

        I'm curious - why would you post a comment claiming that this can't even be done in theory, when the submitter included links in the summary to a paper that shows that it can?

        Because the paper doesn't propose any solution that is practical, or which even leads to a practical solution.

        In theory I can cure all forms of cancer - all I have to do is go through each cell in the victim's body and pluck out the cancerous ones.

    • You could assign tags to the meta headers of the encrypted file, that can be grouped into sub categories, hence some file that says I am encrypted but I can vouch that I am an image, could prove useful

    • There are encryption algorithms that allow addition. That is, the sum of two encrypted messages is an encryption of the sum. I've forgotten how these work exactly; I think they are some many-to-one mapping, and the addition operation is not simply adding the encrypted numerical representations.

      I came across these when looking at voting systems that allow N distributed people to vote in a way that sums the result before it is decrypted rather than decrypting to do the sum.

      Anyhow what this means is that is
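One well-known scheme with this additive property is Paillier; whether it is the one the commenter had in mind is not stated, so treat the choice as an assumption. The sketch below uses deliberately tiny primes and illustrative names to show a vote tally being summed while still encrypted. As the comment recalls, the "addition" is not a plain sum of ciphertexts: it is a multiplication modulo n squared.

```python
import math
import secrets

# Toy parameters only: real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                 # modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1; the random r makes it many-to-one."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Recover m using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % N2

votes = [1, 0, 1, 1, 0]
tally_ct = encrypt(0)
for v in votes:
    tally_ct = add_encrypted(tally_ct, encrypt(v))   # no decryption needed here
tally = decrypt(tally_ct)                            # only the key holder does this
```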

    • by goodmanj (234846) on Thursday April 16, 2009 @05:34PM (#27603877)

      Can I have an anti-theft system for my car, so that nobody can steal it but anybody who wants to can take it for an anonymous test-drive?

    • In theory, I can think of one way where this would actually work. I could be wrong though.

      I don't speak, read, or understand Russian, at all. But if you gave me a sheet that had Cyrillic text, and gave me a request to retrieve a phrase or a portion of the text, I could probably do it given enough time by matching the characters exactly, but won't have any idea what I'm reading. I'm not translating anything, just retrieving a portion of the text based on a character map.

      Of course, I'm not sure if this analog

      • Re: (Score:2, Insightful)

        by ecevans (980167)
        This is true that what you're describing would work, but you're talking about a translation, not an encryption. Using a good encryption scheme, all encrypted instances of a given string would not be the same.
        • I figured what I described was too good to be true. :) In that context, I can see how this would differ with encryption. Thanks for the clarification.

  • You want to... (Score:4, Insightful)

    by mhkohne (3854) on Thursday April 16, 2009 @04:25PM (#27602725) Homepage

    Use an encrypted query to match against the encrypted text. The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.

    If it didn't, then an attacker who got hold of the encrypted text and some of your encrypted queries might well be able to mount an attack based on commonalities between the two.

    • by noidentity (188756) on Thursday April 16, 2009 @05:27PM (#27603777)

      The problem is, if the text is REALLY encrypted, then there shouldn't be enough information to do this - the encrypting of the original text should make it impossible to even match against it.

      NOT TRUE! I use a combination of XOR and rot-13 encryption and I'm able to do text searches just fine. The trick is to encrypt the search string, then it'll work perfectly. This is because the encryption doesn't depend on the position within the text, but that shouldn't hurt security too much.

      • Re: (Score:2, Informative)

        by Thad Zurich (1376269)
        ROT13 is encoding, not encryption. You transform the information, but you don't conceal any of it. Ziad El Bizri (OP cit.) apparently observes that if you encrypt the keywords individually, then you can submit encrypted keyword queries, and the server can search for them for you. This is great, but why would you want to? The object of a search server is for other people to be able to search the data (otherwise why index it on the server?) With the suggested scheme, only the data owner (or shared key holder
  • by AchiIIe (974900)

    This sounds pretty easy,
    a) obtain database, indexing tools, search tool
    b) install on the machine and encrypt the entire hard drive with any of the many available whole-disk encryption tools
    c) ssh in and run queries.

  • Just to clarify the OP's idea. They want to store only encrypted data on the server, send only encrypted queries to the server (that the server can't even decrypt), yet they expect that the server will be able to send them back results. I don't think it can happen but surprise me.

    The best I think you can do is store and transfer the data in encrypted form and put the indexes and any search logic on the client. Maybe the index could be stored on the server as well and synced to the client, but creating the
  • by davidwr (791652) on Thursday April 16, 2009 @04:32PM (#27602827) Homepage Journal

    If the data is encrypted in independent "chunks" from which search terms can be built then this is trivial: You pre-encrypt your search terms and search for them. Searching a ROT13 [wikipedia.org]-encoded document works this way, as each character is encrypted individually and an encrypted search term is made up of encrypted characters.
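A tiny illustration of the ROT13 case, using Python's built-in `rot_13` codec (the sample sentence and variable names are made up). As noted elsewhere in the thread, ROT13 is an encoding rather than real encryption; the sketch only shows why a position-independent, character-by-character transform stays searchable.

```python
import codecs

def rot13(s: str) -> str:
    """Apply ROT13; applying it twice returns the original text."""
    return codecs.encode(s, "rot_13")

document = "Meet me at the old bridge at dawn"
stored = rot13(document)        # what the server holds
query = rot13("bridge")         # what the client sends

# The server matches the transformed query against the transformed text
# without ever decoding either of them.
position = stored.find(query)
match = rot13(stored[position:position + len(query)])
```

The same property that makes the search work (identical input always yields identical output, regardless of position) is what makes the scheme trivially breakable.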

    Once you get past this, it's no longer easy. You basically have to either make the term you are searching for look like all possible values of the encrypted text and return all matches, or decrypt the document somewhere.

    If the encryption is good and any particular chunk, extract, or other slicing-and-dicing of the encrypted data without the key looks random, you are pretty much stuck with decrypting it somewhere.

    The alternative is to store an index, or at least a list of keywords, in clear text. For example, a document describing how to build a nuclear bomb could have a list of 10 or 20 non-classified keywords attached to it to aid searching. But that's not what you are asking for.

  • There are techniques to do this but none have made it out of academia. Most are quite inefficient and support very restricted querying models. Here's one paper that claims their methods are "practical" (but always keep in mind that academic claims of practicality should always be taken with a grain of salt):

    http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf [berkeley.edu]

  • It's been done. GNUnet [gnunet.org].

  • by skathe (1504519) on Thursday April 16, 2009 @04:37PM (#27602915)
    ...and when the bartender asks him what he would like to drink, the guy says "I want what I always get, but I don't want you to actually pour the drink, just help me search behind the bar for the liquor I want, and then hand it to me without seeing what it actually is, and charge me correctly without any knowledge of what it is you just helped me find."
    • Re: (Score:3, Interesting)

      by richie2000 (159732)

      But... That's not a valid car analogy since you're not allowed to drink and drive.

    • by HTH NE1 (675604) on Thursday April 16, 2009 @04:59PM (#27603363)

      Not good enough. The bartender could audit his liquor to see how much of each bottle was dispensed.

      This is why when they do this sort of thing, the gentleman just serves the bartender a National Security Letter and takes more than what he wants without paying a dime.

    • Re: (Score:3, Interesting)

      by dimeglio (456244)

      ...if I tell you a story in French and you don't understand it, you will have no idea what I told you and will not be able to answer questions about my story. However, if you are able to memorize all I told you phonetically I can ask if I said a word or not just by the sound. Yet you don't know exactly what I asked for, nor the meaning of the answer but you are able to answer that question since it doesn't imply meaning.

      So a possibility for the OP would be to store the information in a language unknown to a

  • by Lord Ender (156273) on Thursday April 16, 2009 @04:37PM (#27602919) Homepage

    Keep the files on the remote server, encrypted. Keep the search index in a database, encrypted in chunks. Rsync your search database between your local machine and the server. Actual searches of the databases would be done locally.

    Result: terrible performance whenever you access your data from a new machine (must sync entire search database). Good performance the rest of the time. Remote server never sees anything but cyphertext.

  • There's plenty meaning that can be derived from just filenames.

    Does it really matter that Google or whoever can't see the exact text or images, but has enough information from filenames, tags and descriptions to accurately find out what kind of furry porn you like?

    People who encrypt their data often don't want to disclose even what kind of content they have. Knowledge of what sort of porn is there, or that you're having an affair, or private internal company data are things that can be disclosed from just k

  • So you either want to:
    - Decrypt
    - Search

    If so, then just mount an encrypted drive and put the Search Index on the drive itself... Basically any encryption filter driver will do the mounting for you (Windows and Linux ship with these) and any old Search Software will work for the searching, just move the index.

    Or you want to:
    - Search Encrypted Content
    - For other encrypted content (or decrypted content)

    In either case this isn't possible. At least assuming you're using a Crypto algorithm written in the last

  • by dschuetz (10924) <slash@david.dasne[ ]rg ['t.o' in gap]> on Thursday April 16, 2009 @04:41PM (#27603011) Homepage

    ...isn't this easy?

    Plaintext: "Attack at dawn"
    Ciphertext: "lkaoiuast98u;aw"
    Search query: "oiua"
    Result: "lkaoiuast98u;aw"

    What could be simpler?

    (no, I'm not an idiot, this is a joke.)

  • by burnin1965 (535071) on Thursday April 16, 2009 @04:45PM (#27603085) Homepage

    As long as your query looks something like this...

    SELECT * FROM mydata WHERE stuff LIKE '%YToyOntzOjc6InBhY2thZ2UiO3M6MjM5OiKyKHPh9ZawDX6KyA62cMd6p+mjBybGwJyCaNfFb7S.........

    Seriously though, if I understand your objective I think it would be feasible to develop something like that, but I don't think it's something you could integrate into Google's search services unless they added something on their end.

    You could pass a decryption key along with your query and the server would then decrypt records as it performed the search. It would be very resource intensive.

    As a close example, I have a web based password storage application in which I did not want to keep the encryption keys on the same server as the password database. So I generate a key with which to encrypt the records and the user keeps their key and must supply it every time they want to decrypt a record. I don't go so far as to enable searching of the encrypted data, I have a description field specifically for that purpose. The web application is called Passbox [sourceforge.net] and is written in PHP.

  • What an oxymoron! (Score:3, Interesting)

    by hesaigo999ca (786966) on Thursday April 16, 2009 @04:47PM (#27603113) Homepage Journal

    Yeah, I'd like to have my cake and eat it too!

    The only way this could work is if you had tags in the meta header of the encrypted file
    telling you that yes I am encrypted, but I have an image in me or my encrypted data is of the type accounting.

    This might work for indexing searches where you want to be able to return all the files on the pc (encrypted or not) that are images or etc...

  • Randomly say that you found or did not find the search pattern. Since you're not decrypting it, nobody can tell if you're lying.

  • I'm sure they copied and decrypted the data when you uploaded it.
    (This is why I wrap all my data in tin foil.)
  • by Naerbnic (123002) on Thursday April 16, 2009 @05:06PM (#27603475)

    There is a cryptography technique called Private Information Retrieval which allows you to do just that: Send an encrypted query to a server, let it perform some operations on your behalf, and send you an encrypted query result. The server neither knows the contents of the encrypted data, nor the content of the query, but you have your result nonetheless.

    The intuition is that there exists a sort of "black-box" operation which some cryptographic techniques can use. For example, if I have two encrypted bits a and b (where I can't tell what a and b actually are), I can still perform the operation a xor b. The result is encrypted, and I don't know the actual operands or the result, but I know that what came out is indeed the encryption of the xor of the encrypted bits. Such cryptosystems are forms of "Homomorphic Encryption".

    Using this, we can then give the server a search term thus encrypted and, using the black-box operation, have it do some set of operations which will reveal the result. The server will execute the exact same set of operations independent of the search term, so it knows nothing (and needs to know nothing) of the search term contents. Of course, this implies that the server has to operate on every element of the encrypted data to do its job, but that's the fundamental tradeoff. If you're willing to accept that, and the additional computational overhead, you can design such a system.
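A toy sketch of that tradeoff, with assumptions stated up front: it uses the Paillier cryptosystem (additively homomorphic, not named in the comment), tiny primes, small integers as "records", and hypothetical function names. The client sends an encrypted selection vector, the server does the same arithmetic on every record whatever was asked for, and only the client can decrypt the reply. In a real system the records could themselves be ciphertext chunks under a separate key.

```python
import math
import secrets

P, Q = 1009, 1013                     # toy primes, far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                 # modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1 and fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def client_query(index: int, size: int) -> list:
    """Encrypt a selection vector: 1 at the wanted position, 0 elsewhere."""
    return [encrypt(1 if j == index else 0) for j in range(size)]

def server_answer(records: list, query: list) -> int:
    """Touch every record: the product of c_j ** d_j encrypts sum(s_j * d_j)."""
    answer = 1
    for c, d in zip(query, records):
        answer = (answer * pow(c, d, N2)) % N2
    return answer

records = [1234, 42, 98765, 7]        # each value must be smaller than N
wanted = 2
reply = server_answer(records, client_query(wanted, len(records)))
value = decrypt(reply)                # the server never learns `wanted`
```

The cost is visible in `server_answer`: one modular exponentiation per stored record for every query, which is the computational overhead the comment warns about.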

    • by Qubit (100461)

      Of course, this implies that the server has to operate on every element of the encrypted data to do its job...If you're willing to accept...the additional computational overhead, you can design such a system.

      Where's Bruce? He has the right combination of math and cs theory to spout off some usefulness on this thread :-)

      Anyhow, let's think about this plan:

      Let's say we store n chunks of encrypted data on the server d_1, d_2, ..., d_n. If they were PDFs, we could just store a corresponding text file for full text search on our local machine t_1, ..., t_n, which could be much smaller. If space were at a premium, we could even store each t_i (encrypted) on the server as well, but in order to get to the text, we'd ha

  • This is a great challenge and an active area of research for some time. Many researchers would like to build databases that protect the users without creating some huge pile of aggregated personal information.

    Encrypting the data at the client is a good solution. I've posted several good case studies from my book, Translucent Databases.

    Here's what I wrote for a library [wayner.org] and here's a case study of helping an online store. [wayner.org]

    Let me know if you have questions or suggestions.

  • As pointed out above, if the data is encrypted, the service can't search on it.

    So:
    - you get a VM or a hosted machine that you have complete control over.
    - You set up all your encryption as necessary, eg encrypting the file system. SSL to the machine, etc
    - You set up a search system, eg lucene, or maybe database as SQL queries are needed or whatever.
    - Profit(?)

    Of course, you could do all the same in-house as well, without the need for encryption etc.

    ws

  • by airjrdn (681898) on Thursday April 16, 2009 @05:30PM (#27603821) Homepage
    But it may not be everything you're looking for. My requirements were:
    1 - Mask the filename
    2 - Encrypt the contents
    3 - Add recovery data in case the file got damaged
    4 - Ability to view unmasked filename from web

    I put together a batch file I could drag/drop multiple files onto that used WinRAR to compress the files (individually), with encrypted filenames, a password (of course), and included archive recovery data. It then used ReNamer to encrypt the .rar filenames. After that, I simply FTP'd the files to the server.

    I had a webpage that would accept a password, and unencrypt the filenames so they were viewable in readable form on the page. Each one was a hyperlink. There was an extra step required if you wanted the downloaded filename to be unencrypted as well.

    After uploading 115G or so, my host alerted me to the fact that they didn't allow me to keep offsite backups there. :) So in the end, I'm not even using it at the moment.

    My solution didn't allow me to search within the files, but it did allow me to store files on the server that they had no way of viewing the contents of, or guessing the contents of based on filename.
  • There are some solutions for this. I think the first approaches were called "Iraiksan". However there is a massive performance penalty so you are unlikely to find this offered anywhere. Better keep metadata on your local machine and search that.

  • Either you send your storage provider clear data, in which case he can understand and work with it (including search through it), or you can send him (and ask him to store) encrypted data.

    One of the principal characteristics of (well-)encrypted data is that it is essentially random gibberish. Encrypting your search query won't somehow help him understand your encrypted data. The purpose of encrypting it is to keep (all) others out of it.

    Sorry.

  • The server just stores a bunch of indexes into your data and searches them when you supply the keywords. It sounds like what you really need is an efficient index (it requires few reads to determine whether what you are searching for is there, or that it isn't anywhere). Then you can build and encrypt the index and store it online in chunks, and download the pieces of it that you need to search for your keywords, and then retrieve the encrypted data that the index entries points to.

    For instance, if you wa

  • I use JLAN for this. I have a virtual private online server that i don't have root access to. So i can't install FUSE.
    Instead i installed JLAN which is a user mode java application that stores your data either in a file, a set of files or in a database. I store the data in a database (my provider gives unlimited database access with the virtual private server subscription).
    JLAN outputs the data as either an FTP, NFS or SMB share/filesystem. So it doesn't create a filesystem like FUSE does but it is still t
  • Encrypt all of your documents, word by word, with your private key. Each word would have to be in a separate file, named sequentially. If you need to search for a word, you sign your query, then search for that query. In short, this is a retarded idea, and even the best-case is garbage. And why wasn't this posted to Idle?
