Communications Data Storage The Internet

How Do You Store and Reconcile Email Archives? 380

heyitsjustme wants to know how you deal with old email. "I delete most of what I get but keep the stuff from friends and relations as an archive. Unfortunately I have these email archives from the late 80's through today in the form of Macintosh, Linux and Windows mailboxes, including AOL 1.0 mailboxes. What does everyone use to archive email across multiple platforms and non-standard mailbox formats? Is there an easy solution out there? Does anyone archive IM?"
This discussion has been archived. No new comments can be posted.

  • Cyrus Imap... (Score:2, Interesting)

    by DaGoodBoy ( 8080 ) on Saturday March 12, 2005 @07:05PM (#11922545) Homepage
    ...with fetchmail / procmail / cyrdeliver for sorting and storing from other sources. How can 5GB of mail be wrong?! I can slice and dice all my email (including about a gig of spam... [awtrey.com]) for choice bits of information.
  • Unix mail format (Score:3, Interesting)

    by saccade.com ( 771661 ) on Saturday March 12, 2005 @07:05PM (#11922547) Homepage Journal
    I use the basic Unix mail format, essentially plain text series of messages. Eudora does fine with it; and most anything else can read/import it. I have email going back to the 80's in this format. The one time I had to convert was when I was working for a company that used "Quickmail" on the Mac. I wound up reverse engineering their format and hacking up a program to convert it to plain text.
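For anyone who has not looked inside one of these files: the plain-text format described above is what is usually called mbox, where each message begins with a line starting with "From ". A minimal sketch of reading such a file with Python's standard `mailbox` module follows. The sample messages, addresses, and the `list_subjects` helper are made up for illustration; they are not anything from the poster's setup.

```python
import mailbox
import os
import tempfile

# Two made-up messages in mbox layout: each one starts with a "From " line.
SAMPLE_MBOX = (
    "From alice@example.com Sat Mar 12 19:05:00 2005\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello from 1989\n"
    "Date: Sat, 12 Mar 2005 19:05:00 +0000\n"
    "\n"
    "First body.\n"
    "\n"
    "From carol@example.com Sun Mar 13 06:48:00 2005\n"
    "From: carol@example.com\n"
    "To: bob@example.com\n"
    "Subject: Re: Hello from 1989\n"
    "Date: Sun, 13 Mar 2005 06:48:00 +0000\n"
    "\n"
    "Second body.\n"
)


def list_subjects(mbox_path):
    """Return (sender, subject) pairs for every message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()


def demo():
    """Write the sample to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.mbox")
        with open(path, "w", newline="\n") as handle:
            handle.write(SAMPLE_MBOX)
        return list_subjects(path)


if __name__ == "__main__":
    for sender, subject in demo():
        print(sender, "|", subject)
```

Because the file is just text, the same archive also stays readable with `grep` and `less` if no mail tool is at hand.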
  • by boybaha ( 584738 ) on Saturday March 12, 2005 @07:05PM (#11922555)
    I also have email archives that stretch back to the early 1990s. I pretty much still have every email I've ever sent or received. When upgrading email clients, I migrate my archives with me, converting them using whatever built-in import and export functions the clients have available. I went from Eudora to Outlook Express to Thunderbird to Mac Mail. I also have programs that "pop" webmail off their sites (gmail, hotmail and yahoo) to consolidate it in whatever current mail client I'm using. I just keep them in neat folders ("Old Eudora Mail," "Old Yahoo Mail").
  • Kinda Sorta OT (Score:5, Interesting)

    by hot_Karls_bad_cavern ( 759797 ) on Saturday March 12, 2005 @07:10PM (#11922592) Journal
    but ...

    Along these lines, is there an OSS package that can read the varied formats the Submitter is referring to, tag and drop them in a DB with a nice, friendly, web-enabled (secure) front-end for searching?

    My former employer kept *all* of his email from the last 20 years in tar.gz files. Let's just say it wasn't easy to find an email from, er, 15 years ago.

    Is there a package that can read the mbox, the other box-formats, plain text, pull from pop, old tar.gz bundles, categorize (sorta), tag and make such things searchable?

    Totally a shot in the dark here, i'm not a mail guy at all ... just wondering as the Submitter did what i like /. Submitters to do: make me think and look for new, better stuff ... or better ways to do old-stuff.

    It is the "drink" that makes me wonder, sorry :)
  • Re:One Word (Score:2, Interesting)

    by Murphy Murph ( 833008 ) <sealab.murphy@gmail.com> on Saturday March 12, 2005 @07:13PM (#11922619) Journal
    I second this.
    I started running my own IMAP server on an old machine a year or so ago - and synced all my old mail archives to various folders.

    My mailserver also solves another problem - multiple POP accounts. I have my IMAP server set up so that each one of my POP accounts gets automatically tagged and sent to its own folder.

    A third common problem this solves is having multiple machines. Now my desktop's email client is always synced with my laptop's email client. Before, I had run into problems whenever I traveled and fetched my email from the road.
  • Archive what? (Score:3, Interesting)

    by Mishura ( 744815 ) on Saturday March 12, 2005 @07:15PM (#11922631)
    I never keep emails, or archive IMs or any other form of communication. Once an email is read, it is deleted. Same goes for normal old-skool mail: I read it and then trash it. The only exceptions are letters/emails of some importance, such as information I need to keep handy, or anything with some kind of sentimental value (letters from deceased relatives, for example).

    Sure, HDD space is cheap; but I tend to equate people who archive every single form of written communication with those who have an Obsessive Compulsive Disorder, in that they hoard everything in sight: newspapers, snail mail, magazines, boxes, etc.

    Commit to memory and destroy the evidence. That's my way of handling archives.
  • Re:One Word (Score:5, Interesting)

    by pHDNgell ( 410691 ) on Saturday March 12, 2005 @07:24PM (#11922678)
    One word: IMAP

    Absolutely. I use no fewer than two mail clients on two different machines on any given business day. Every email I've sent since 1995 or something like that, and received since 1998 is available and searchable. Over this time, I've accessed this archive with the following clients:

    * pine (lots of pine)
    * mac mail
    * thunderbird
    * various netscapes/mozillas
    * ML (some random IMAP reader)
    * My phone (my old Sony Ericsson speaks IMAP)
    * My palm (two different apps)
    * python
    * a java webmail system I wrote
    * three or four other webmail systems
    * mutt ...who knows what else. I've got freedom to try whatever I want at any given moment without losing my current or past mail.
  • by sstern ( 56589 ) on Saturday March 12, 2005 @07:25PM (#11922683) Homepage Journal
    I have several CDs worth of stuff archived with ForKeeps:

    http://www.fkeeps.com/whofor.htm

    It's a bit of an old program and the interface is clunky, but it works reasonably well once you work through it.
  • Re:Unix mail format (Score:3, Interesting)

    by Zocalo ( 252965 ) on Saturday March 12, 2005 @07:26PM (#11922690) Homepage
    Ditto, in my case the "mbox" format to be precise. I currently use Procmail to automatically CC all incoming messages to a dedicated archive file, one per month, each year in a separate folder. Outgoing mail is also sent to the same file, although I could easily have an "infile" and an "outfile", break mail apart by topic, or whatever. For more robust long-term backup purposes I simply tarball the dozen files within each directory into a file called "mail-yyyy.tar.gz" and back up as normal.
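The one-file-per-month, one-folder-per-year layout can also be produced after the fact from an existing mailbox. The following is a rough sketch in Python using the standard `mailbox` module, not the Procmail recipe itself. The file naming (`YYYY/YYYY-MM.mbox`), the `undated.mbox` fallback, and the helper names are assumptions made for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage
from email.utils import parsedate_to_datetime


def month_path(msg):
    """Relative archive path for a message, based on its Date header."""
    try:
        when = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        # Missing or unparseable Date header: keep the message anyway.
        return "undated.mbox"
    return os.path.join(f"{when.year:04d}", f"{when.year:04d}-{when.month:02d}.mbox")


def archive_by_month(source_path, archive_root):
    """Copy every message into archive_root/YYYY/YYYY-MM.mbox; return counts."""
    counts = {}
    targets = {}  # relative path -> open mbox, so each file is opened once
    source = mailbox.mbox(source_path, create=False)
    try:
        for msg in source:
            rel = month_path(msg)
            if rel not in targets:
                full = os.path.join(archive_root, rel)
                os.makedirs(os.path.dirname(full), exist_ok=True)
                targets[rel] = mailbox.mbox(full)
            targets[rel].add(msg)
            counts[rel] = counts.get(rel, 0) + 1
    finally:
        for box in targets.values():
            box.close()
        source.close()
    return counts


def demo():
    """Build a small made-up inbox and split it by month."""
    with tempfile.TemporaryDirectory() as tmp:
        source_path = os.path.join(tmp, "inbox.mbox")
        source = mailbox.mbox(source_path)
        for subject, date in [
            ("one", "Sat, 12 Mar 2005 19:26:00 +0000"),
            ("two", "Sun, 13 Mar 2005 11:29:00 +0000"),
            ("three", "Mon, 04 Apr 2005 08:00:00 +0000"),
        ]:
            msg = EmailMessage()
            msg["From"] = "someone@example.com"
            msg["Subject"] = subject
            msg["Date"] = date
            msg.set_content("body of " + subject)
            source.add(msg)
        source.close()
        return archive_by_month(source_path, os.path.join(tmp, "archive"))


if __name__ == "__main__":
    print(demo())
```

The source mailbox is only read, never modified, so the split can be rerun into a fresh directory if the layout needs to change.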

    Since mbox is a pretty standard format, many tools have a built-in import routine, or at least there will be an existing third-party tool to handle any conversions. Failing that, it's fairly trivial to cobble together a one-off conversion tool using a scripting language, or even to batch remail each message one at a time if your new email client uses some undocumented storage format, or is an online service like GMail.
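As an illustration of how small such a one-off conversion tool can be, here is a sketch that copies an mbox file into a Maildir using Python's standard `mailbox` module. Maildir is picked only as an example target; the function name and the sample data are hypothetical.

```python
import mailbox
import os
import tempfile

# Made-up sample data in mbox layout.
SAMPLE_MBOX = (
    "From alice@example.com Sat Mar 12 19:26:00 2005\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "Body one.\n"
    "\n"
    "From bob@example.com Sat Mar 12 19:30:00 2005\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "\n"
    "Body two.\n"
)


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in source:
            # MaildirMessage carries over the flags it can map from mbox.
            dest.add(mailbox.MaildirMessage(msg))
            copied += 1
    finally:
        dest.close()
        source.close()
    return copied


def demo():
    """Convert the sample and read the subjects back from the Maildir."""
    with tempfile.TemporaryDirectory() as tmp:
        mbox_path = os.path.join(tmp, "old.mbox")
        with open(mbox_path, "w", newline="\n") as handle:
            handle.write(SAMPLE_MBOX)
        maildir_path = os.path.join(tmp, "Maildir")
        copied = mbox_to_maildir(mbox_path, maildir_path)
        converted = mailbox.Maildir(maildir_path, create=False)
        subjects = sorted(msg["Subject"] for msg in converted)
        converted.close()
        return copied, subjects


if __name__ == "__main__":
    print(demo())
```

The same loop works in the other direction by swapping the two mailbox classes, which is part of why mbox makes a reasonable interchange format.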

  • Re:email archive (Score:1, Interesting)

    by Anonymous Coward on Saturday March 12, 2005 @07:33PM (#11922731)
    Of course it's a fair comment. As you state, many Fortune 500 companies engage in record manipulation and Microsoft WAS caught, therefore it is a perfectly legitimate comment.

    Maybe other companies do it but until there is proof then you can't slander them but Microsoft do it, so they're fair game.

  • Re:One word (Score:0, Interesting)

    by Anonymous Coward on Saturday March 12, 2005 @07:36PM (#11922748)
    And have all my mail arrive in the NSA's inbox?

    Thanks, but I'll pass.

  • by Martian_Bob ( 695825 ) on Saturday March 12, 2005 @07:48PM (#11922801)
    I do data mining research, most recently on the Enron email dataset [cmu.edu], and I've actually been having to roll my own multi-mailbox storage, access, and retrieval systems. It's taking way more time than I'd like; at this point I've gotten a database and web-based viewers [uiowa.edu] made up (beware, they're quite slow).

    If anyone knows of an open-source application similar to what the submitter is looking for, it would help my research quite a bit. There are practical research applications in this stuff, if someone's interested in making it.
  • CSV (Score:2, Interesting)

    by vnangia ( 730425 ) on Saturday March 12, 2005 @07:49PM (#11922802)
    Just about every email program that I've used has managed to export to CSV. A few web-based email systems didn't allow such imports and some hunting on the web found some sort of convertor (like YahooPOPS!, etc.) that converted to POP and then I exported them to CSV using Eudora or Outlook, or whatever program I was particularly enamored with.
    Admittedly, sometimes the column names didn't match up ("Sender" v "From", etc.), but for the most part that's how I did it. I also made an effort to keep the number of email accounts that I had to a minimum. At this point in time, most everything is stored in the form of .PST files that are archived on CDs and on an external hard drive.
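The mismatched column names are the main nuisance when merging CSV exports from different programs. A small sketch of one way to reconcile them with Python's `csv` module follows. The alias table and the canonical column list are assumptions for the example; real exports from Eudora, Outlook, and others may use different headers and would need their own entries.

```python
import csv
import io

# Hypothetical header aliases; extend this for whatever each client exports.
COLUMN_ALIASES = {
    "sender": "From",
    "from": "From",
    "recipient": "To",
    "to": "To",
    "subject": "Subject",
    "date": "Date",
    "sent": "Date",
    "body": "Body",
    "message": "Body",
}
CANONICAL = ["Date", "From", "To", "Subject", "Body"]


def normalize_rows(csv_text):
    """Yield rows from one export, renamed to the canonical column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        clean = {name: "" for name in CANONICAL}
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header row
            target = COLUMN_ALIASES.get(column.strip().lower())
            if target:
                clean[target] = value or ""
        yield clean


def merge_exports(*csv_texts):
    """Merge several exports into one CSV string with a single header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CANONICAL, lineterminator="\n")
    writer.writeheader()
    for text in csv_texts:
        writer.writerows(normalize_rows(text))
    return out.getvalue()


if __name__ == "__main__":
    export_a = "Sender,Subject,Sent\nalice@example.com,Hi,1997-01-05\n"
    export_b = "From,To,Subject,Date\nbob@example.com,me@example.com,Re: Hi,1997-01-06\n"
    print(merge_exports(export_a, export_b))
```

Columns that have no alias are dropped silently here; a more careful version might collect them and report which headers were not recognized.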
    I also made an effort to keep my email accounts to a minimum, which probably made this entire process significantly easier and when I did close an account (like when I finished work at a company), I exported the emails from there and kept them in .PST in case I needed them for anything later on.
    As far as indexing works - I have them stored in 6 month segments (Jan97-Jun97, Jul97-Dec97, ...), since I can usually remember roughly when I got an email that I was looking for - alternatives include perhaps by name of sender or company.
    I do archive IMs - Trillian [trillian.cc] worries about it for me. :)
    Hope this helps.
  • by rahard ( 624274 ) on Saturday March 12, 2005 @07:59PM (#11922854) Homepage Journal
    I archive most of my emails. Up to this point, my email archive is close to 2 GBytes.

    I keep the emails in mailbox format (that is, in plain text as it is stored in most UNIX systems), in several files. The reason I do that is that most email readers (MUA) can read mailbox format. I keep them in several files to make it more manageable.

    The tools that I use to manipulate emails are mostly "from", "procmail", "grep", and "less". There used to be tools from the "elm" era (still remember them?), such as "frm" (which is better than "from"), "reademail" (to read individual email, given the number of email in the archive), "deletemail" (which can delete an individual email in the archive). Too bad, these tools are gone. At one point I slapped a simple Tk interface as a front end to those tools. But it didn't scale well.

    At one point I did experiment with storing emails in individual files, but the tools to manipulate them are limited. I used MH.

    The next experiment I did was to take all those email headers and put them in a database. (I used msql, which was popular at that time.) Then I had a Java applet and a perl script to make queries to the database (and actually did an analysis of my reading habits). The actual emails were stored as plain text files, each email in an individual file. Basically, the original email was untouched. I got bored and never continued the project.
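The headers-in-a-database idea is easy to try today. This is a sketch only, using SQLite and an mbox file in place of the msql database and per-message files described above; the table layout and function names are made up for illustration. The database holds just the headers plus a pointer (file path and message key) back to the untouched original.

```python
import mailbox
import os
import sqlite3
import tempfile
from email.message import EmailMessage
from email.utils import parseaddr


def index_headers(mbox_path, db):
    """Store selected headers of each message; the mbox itself is not modified."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS headers "
        "(mbox TEXT, msg_key INTEGER, sender TEXT, subject TEXT, date TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    try:
        rows = [
            (
                mbox_path,
                key,
                parseaddr(str(msg.get("From", "")))[1].lower(),
                str(msg.get("Subject", "")),
                str(msg.get("Date", "")),
            )
            for key, msg in box.iteritems()
        ]
    finally:
        box.close()
    db.executemany("INSERT INTO headers VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


def top_senders(db, limit=5):
    """A small 'who writes to me most' query over the indexed headers."""
    return db.execute(
        "SELECT sender, COUNT(*) AS n FROM headers "
        "GROUP BY sender ORDER BY n DESC, sender LIMIT ?",
        (limit,),
    ).fetchall()


def demo():
    """Index a small made-up mailbox in an in-memory database."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mail.mbox")
        box = mailbox.mbox(path)
        for sender, subject in [
            ("Alice <alice@example.com>", "lunch"),
            ("Bob <bob@example.com>", "meeting"),
            ("Alice <alice@example.com>", "Re: lunch"),
        ]:
            msg = EmailMessage()
            msg["From"] = sender
            msg["Subject"] = subject
            msg.set_content("hello")
            box.add(msg)
        box.close()
        db = sqlite3.connect(":memory:")
        try:
            return index_headers(path, db), top_senders(db)
        finally:
            db.close()


if __name__ == "__main__":
    print(demo())
```

Any other habit analysis (mail per month, longest threads) is then one more SQL query rather than another pass over the archive.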

    Now ... I am still searching for the perfect email tools.

  • OE to mbox to html (Score:2, Interesting)

    by Gax ( 196168 ) on Saturday March 12, 2005 @08:41PM (#11923044)
    My father was concerned about the longevity of his e-mail a few years ago, so I created a small batch file that converts his Outlook Express mail archive into mbox on a monthly basis. Last month he asked if I could convert them "into a web site" so he can get an idea of a thread history without parsing a huge file. When I get a moment I'm planning to write a script that outputs each message to a new file in html tags and use the message subject and date to create a rudimentary index.html.
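The planned script could look roughly like the sketch below: one HTML file per message plus a rudimentary index.html built from subject and date. This is written in Python with the standard `mailbox` and `html` modules; the file naming (`msg00001.html`), the page template, and the helper names are all assumptions, and attachments and HTML-only mail are ignored.

```python
import html
import mailbox
import os
import tempfile
from email.message import EmailMessage

PAGE = (
    "<html><head><title>{subject}</title></head><body>\n"
    "<h1>{subject}</h1>\n<p>{date}</p>\n<pre>{body}</pre>\n"
    "</body></html>\n"
)


def plain_text_body(msg):
    """Return the first text/plain part of a message as a string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the message
                return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_html(mbox_path, out_dir):
    """Write one HTML page per message and an index.html linking to them."""
    os.makedirs(out_dir, exist_ok=True)
    entries = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for number, msg in enumerate(box, start=1):
            subject = str(msg.get("Subject", "(no subject)"))
            date = str(msg.get("Date", ""))
            filename = f"msg{number:05d}.html"
            page = PAGE.format(
                subject=html.escape(subject),
                date=html.escape(date),
                body=html.escape(plain_text_body(msg)),
            )
            with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as f:
                f.write(page)
            entries.append((filename, subject, date))
    finally:
        box.close()
    lines = ["<html><head><title>Mail index</title></head><body>", "<ul>"]
    for filename, subject, date in entries:
        lines.append(
            '<li><a href="{0}">{1}</a> ({2})</li>'.format(
                filename, html.escape(subject), html.escape(date)
            )
        )
    lines += ["</ul>", "</body></html>"]
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return len(entries)


def demo():
    """Convert a small made-up mailbox; return the file list and the index."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mail.mbox")
        box = mailbox.mbox(path)
        for subject in ["hello", "a < b"]:
            msg = EmailMessage()
            msg["Subject"] = subject
            msg["Date"] = "Sat, 12 Mar 2005 20:41:00 +0000"
            msg.set_content("Lunch & learn <today>")
            box.add(msg)
        box.close()
        out_dir = os.path.join(tmp, "site")
        mbox_to_html(path, out_dir)
        with open(os.path.join(out_dir, "index.html"), encoding="utf-8") as f:
            return sorted(os.listdir(out_dir)), f.read()


if __name__ == "__main__":
    files, index_html = demo()
    print(files)
    print(index_html)
```

Escaping subject, date, and body with `html.escape` matters here, since old mail routinely contains angle brackets and ampersands that would otherwise break the pages.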

    I'm surprised no one has tried this before. It's a good low-tech solution for people who require information in a hurry and is more immediate than a flat file.
  • by glesga_kiss ( 596639 ) on Saturday March 12, 2005 @08:49PM (#11923089)
    This may involve installing Outlook, exporting all of your mail to Outlook, and importing it all from outlook, but it is worth it.

    Outlook + IMAP is the way I do it. You can drag messages between local storage and your mail server.

  • Ink (Score:2, Interesting)

    by PCheese ( 810782 ) on Saturday March 12, 2005 @10:35PM (#11923676) Homepage
    You have to print it with something. Ink: one of the most expensive ways to put stuff on paper. Heck, they say it costs seven times more than champagne [bbc.co.uk] per drop! That, plus the costs of cartridges and printer maintenance and, and... oh the horror! ;)

    Me? I obsessively reinstall my operating system and reimport old mailboxes into my mail client, so I have a dozen copies of 5-year old email, ten copies of 4-year old email, 8 copies of 3-year old email, etc. No need for backups... plus when I search my computer for old email, I get a dozen copies of what I'm looking for!
  • ex post facto Law (Score:3, Interesting)

    by Morosoph ( 693565 ) on Sunday March 13, 2005 @06:48AM (#11925140) Homepage Journal
    This is why the House of Lords was resistant to the prosecution of Nazi war criminals for so long, incidentally.
  • Re:Dave's top ten (Score:1, Interesting)

    by Anonymous Coward on Sunday March 13, 2005 @11:29AM (#11926035)
    I consolidated all my personal e-mail since 1995 into a Maildir (which I access using IMAP). It totalled only 60 MB. I don't think that is so much that I need to worry about disk space, searching, or my IMAP server not being able to handle it. The way I have it organized, my searches don't touch any of the old mail (unless I want them to). The only point I think you were right about is the evidence being used against me (in my case, anyhow). It's kinda entertaining to go back and read some of my old correspondence and see how much of a different person I was back then. It's kinda like looking at old diaries or something.
  • by PCMeister ( 837482 ) on Sunday March 13, 2005 @06:39PM (#11928395)
    With the advent and subsequent improvements of LiveCD distros, it should be relatively painless for the average /.'er to:

    * Create a multi-session CD/DVD with your favorite Linux LiveCD distro
    (or roll your own [linux-live.org] and create an ISO for future use)

    and

    * Back up email files to said CD/DVD
    (I suggest a set of re-writable media of good quality to play it safe.)

    Further suggestions:
    1. It would be advisable to split your archives (e.g. Mail2004, etc.), especially if you plan to retain a sizeable amount of mail.
    2. Convert archives from older mail clients before creating the backup, or use a newer mail client that can read the old files with ease.

    Good luck!
