
Best Way To Archive Emails For Later Searching? (385 comments)

An anonymous reader writes "I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive, just because I can. But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundreds of thousands of emails and all of the email systems I have tried take forever to process the data. b) Some email systems (e.g. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?"
This discussion has been archived. No new comments can be posted.

  • one word (Score:1, Interesting)

    by Anonymous Coward on Monday September 06, 2010 @11:51AM (#33488944)

    gmail

  • OK, My Favorite (Score:3, Interesting)

    by BoRegardless ( 721219 ) on Monday September 06, 2010 @11:53AM (#33488962)

    MailSteward on the Mac.

    SQL database. Good, Inexpensive, works w/many tens of thousands of emails & more.

    http://mailsteward.com/ [mailsteward.com]

  • Maildir (Score:5, Interesting)

    by roderickm ( 6912 ) on Monday September 06, 2010 @11:59AM (#33489028)

    Maildir storage format is resistant to bit-rot because it stores each message in a separate file, and uses filesystem directories for mail folders. It's widely supported by user agents (mail readers) and IMAP/POP3/SMTP servers, so you'll never be stranded by the actions of a single software vendor. Finally, it's easily searched using everyday unix tools - find, grep, sed, awk, etc., and you can use the full-text search engine of your choice for speedy searches.
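A minimal sketch of the Maildir approach described above, assuming Python and its standard-library `mailbox` module. The function names `mbox_to_maildir` and `grep_maildir` are made up for illustration, and the search is a plain linear scan over text/plain parts rather than a real full-text index:

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    for msg in src:
        dst.add(msg)  # each message becomes its own file under new/
        count += 1
    dst.flush()
    src.close()
    return count


def _decode(part):
    """Best-effort decoding of one text part to str."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name Python does not know
        return payload.decode("utf-8", errors="replace")


def grep_maildir(maildir_path, needle):
    """Return subjects of messages whose text/plain body contains needle."""
    box = mailbox.Maildir(maildir_path, create=False)
    needle = needle.lower()
    hits = []
    for msg in box:
        for part in msg.walk():
            if part.get_content_type() != "text/plain":
                continue
            if needle in _decode(part).lower():
                hits.append(msg["Subject"])
                break  # one hit per message is enough
    return hits
```

Something like `mbox_to_maildir("old.mbox", "Maildir")` followed by `grep_maildir("Maildir", "paprika")` would be the intended use. For hundreds of thousands of messages a linear scan like this will be slow, which is where the full-text search engine mentioned above comes in.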

  • by Cylix ( 55374 ) * on Monday September 06, 2010 @11:59AM (#33489032) Homepage Journal

    I never thought of turning an ancient host into an alarm clock.

    Once however, I did hollow out an SGI case and turn it into a refrigerator.

    The case was just too damned pretty to throw away.

  • by perpenso ( 1613749 ) on Monday September 06, 2010 @12:03PM (#33489056)

    And now the poster becomes an advertiser's dream come true in addition to being a hostile lawyer's dream come true. ;-)

    Remember that from Google's perspective gmail is a tool to better profile you for targeted advertising. Make sure you are OK with that before giving them access to all your emails.

  • by Nemilar ( 173603 ) on Monday September 06, 2010 @12:09PM (#33489116) Homepage

    OK, so I hear this a lot and I never really understand the problem.

    The "unwritten gmail contract" (and it actually applies to most Google products) is this: We will give you a service for free (in this case Gmail), and in return we are going to profile your use of that service to select ads for you. In the case of gmail, they give you however many GB of storage, always-on cloud email, and the best searchable email system I've ever seen. There are other Google examples, from gtalk to Google Docs. The basic principle behind it is the same, most people understand the deal, and I don't see anything wrong with it. There's no such thing as a free lunch, but this is pretty close.

  • by garcia ( 6573 ) on Monday September 06, 2010 @12:11PM (#33489142)

    Starting with GMail I have kept every e-mail since 6/22/2004. I also brought over many e-mails I had in my saved folders from long before that. Am I insane? No. I have found this archive incredibly useful for any variety of uses even 6 years later.

    Nothing like having your wife ask, "man, I wish we still had the recipe for deviled eggs we made in college. Too bad it was back in 2001." "No problem honey, hold."

    Date: Fri, 26 Jan 2001 13:40:46 -0500
    From: yoyoskippy
    To: garcia@tigerose.com (now dead, have at it spammers)
    Subject: Deviled eggs

    Deviled Eggs

    6 hard cooked eggs
        (throw two more eggs in, so you can check how they are doing)

    pinch of salt (thats a pinch boy, wayyyyy less than 1/4 tsp.)

    1/4 tsp. pepper
    1/2 tsp. dry mustard
    2 Tbsp. Hellmans
    1 Tbsp. Miracle Whip
    Paprika (sprinkles)

    Boil the eggs, use the extra two eggs to check the eggs process. when boiled crack the shell a bit with a spoon. then put the eggs in cold water w/ice cubes. this makes it easier to peel the shell off the egg. Next take the yolks out of the eggs and smash up very finely with fork. next add all of the ingredients together to make the topping. mix well. spoon the mixture onto the egg and then sprinkle on paprika. enjoy. yum yum!!

    Pulled that out a couple weeks ago for a picnic. Yum yum!! was right.

  • Just because I can? (Score:1, Interesting)

    by mrv00t ( 858087 ) on Monday September 06, 2010 @12:18PM (#33489196)

    would now like to merge all the emails back into a single searchable archive — just because I can. But there are a few problems:

    ...so you can't?

  • Re:IMAP (Score:1, Interesting)

    by Anonymous Coward on Monday September 06, 2010 @01:05PM (#33489558)

    I have 3GB of email and I use Eudora; searching for emails isn't that slow if you can organize stuff into smaller folders.

    So I'm sure a more geeky solution can be much faster: e.g. a postgresql database with full text search and metadata search.

    The stuff you'd want to search is mainly in text so it shouldn't be too difficult.

    If you wanted an equivalent search for videos, sound or pictures that'll be harder - e.g. given a picture of this object, please find videos containing this object.
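A rough sketch of the "database with full text search and metadata search" idea, with loud caveats: it uses Python's built-in `sqlite3` and a hand-rolled inverted-index table instead of PostgreSQL, purely so that it runs without a server. The schema and the names `open_archive`, `add_message` and `search` are assumptions for illustration. In PostgreSQL the `terms` table would presumably be replaced by its built-in text search (`to_tsvector`, `to_tsquery` and the `@@` operator).

```python
import re
import sqlite3

# Hypothetical schema: one row per message plus a (term, message) index.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY, sender TEXT, sent TEXT, subject TEXT, body TEXT);
CREATE TABLE IF NOT EXISTS terms (
    term TEXT, message_id INTEGER, PRIMARY KEY (term, message_id));
"""


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def open_archive(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_message(db, sender, sent, subject, body):
    """Store one message and index every word of its subject and body."""
    cur = db.execute(
        "INSERT INTO messages (sender, sent, subject, body) VALUES (?, ?, ?, ?)",
        (sender, sent, subject, body),
    )
    msg_id = cur.lastrowid
    db.executemany(
        "INSERT OR IGNORE INTO terms VALUES (?, ?)",
        [(term, msg_id) for term in tokenize(subject + " " + body)],
    )
    return msg_id


def search(db, query, sender=None):
    """Subjects of messages containing ALL query words, oldest first.

    The optional sender argument is the metadata filter.
    """
    words = sorted(tokenize(query))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = (
        "SELECT m.subject FROM messages m "
        "JOIN terms t ON t.message_id = m.id "
        f"WHERE t.term IN ({placeholders})"
    )
    params = list(words)
    if sender is not None:
        sql += " AND m.sender = ?"
        params.append(sender)
    # A message matches only if every distinct query word was found in it.
    sql += " GROUP BY m.id HAVING COUNT(DISTINCT t.term) = ? ORDER BY m.sent"
    params.append(len(words))
    return [row[0] for row in db.execute(sql, params)]
```

Dates are stored as ISO strings (for example `"2001-01-26"`) so that sorting by `sent` is chronological. Attachments, HTML bodies and threading are left out entirely.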
  • by pz ( 113803 ) on Monday September 06, 2010 @02:31PM (#33490374) Journal

    When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

    This sort of behavior is odd and not normal. If you want to keep your email, then that's fine, but thinking that it's "vitally important" is odd and I think without question points to some "OCD with some component of Aspberger". If you don't then maybe you need to re-evaluate.

    I am however interested in how you pull demographic analysis out of emails? I mean, hopefully you're not suggesting that you go and chomp on the text to pull out fields of data?

    So on the one hand, you think my saving email for later access and analysis is not useful, but then, you want to know why it is useful?

    I run a research laboratory where we do two things, one is work on restoring sight to the blind, the other is to organize a conference every two years. The primary demographic analysis I need to do is to analyze the country-of-origin for email traffic pertinent to the conference. This has helped to raise many tens of thousands of dollars of support for the conference by demonstrating various aspects of the global attendance to funding agencies.

    Being able to access my email and locate attachments, review discussions, find references, remember addresses, etc., in other words, to recall what someone once wrote to me, has resulted in millions of dollars of grant money to fund my research. Without the ability to review email that is, at times, years old, that would not be possible. Having rich access to my email stream has allowed me to fund my lab, and therefore feed and house my family and the people who work for me, publish high-impact papers, receive numerous awards, get coverage in the international press, etc., or, put better, to run the daily business of a research lab at a high-profile university. While the tools I use are good, they leave a lot to be desired, and having a better system would make me more productive.

    IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea,

    I think that GMail could be the panacea here. I mean, if you're just trying to make sure it lasts and you can search it with ease, then GMail can do it better than you can.

    I dislike GMail for my professional correspondence for a number of reasons: (1) it does not allow me to readily use my university affiliation address (and since that's a top university, that makes a difference whether people like it or not), (2) I do not have ownership of my email, (3) the lack of a good filing / archiving interface makes it hard to associate different threads together, or to limit searches (I intensely dislike the tagging feature), (4) GMail has only a rudimentary ability to edit text since it's browser-based.

    I do use GMail for my personal correspondence, but that's mostly because it's the best of a bunch of poor, but free, services. It does have the best searching features, but falls down in a lot of other ways. It also would be against my employer's policies to store HIPAA-regulated email offsite. So GMail is not a panacea. Thanks for the suggestion, though.
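For the country-of-origin analysis mentioned earlier in this comment, one very rough way to do it might look like the sketch below. This is an assumption, not the method actually used: it treats a two-letter top-level domain in the From address as a country code, which misses everyone on .com, .org, .edu and the like. The function name `country_codes` is invented for the example.

```python
from collections import Counter
from email.utils import parseaddr


def country_codes(from_headers):
    """Tally two-letter TLDs of sender addresses as a crude country proxy."""
    counts = Counter()
    for header in from_headers:
        _, addr = parseaddr(header)  # drops the display name, keeps the address
        domain = addr.rpartition("@")[2].lower()
        tld = domain.rpartition(".")[2]
        if len(tld) == 2 and tld.isalpha():
            counts[tld] += 1
        else:
            counts["unknown"] += 1  # generic TLDs, malformed or empty headers
    return counts
```

Feeding it the From headers of the conference-related mail would give a per-country tally. Received headers or IP geolocation would be a better signal, but they need data this sketch does not assume.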
