
Communications Data Storage

Ask Slashdot: Handling and Cleaning Up a Large Personal Email Archive?

Posted by timothy
from the would-settle-for-placeholder-images dept.
First time accepted submitter txoof writes "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size, from a few tens of megs to nearly 500 megs for 2010. The archive is for storage only. It is a mirror of my Gmail account. The archives are both sent and received mail, compressed in a hierarchy of weekly, monthly and yearly mbox files. I've chosen mbox for a variety of reasons, but mostly because it is the simplest to implement with fetchmail. After inspecting some of the archives, I've noticed that the larger files are a result of attachments sent by well-meaning family members. Things like baby pictures, wedding pictures, etc. What I would like to do from this point forward is strip out all of the attachments and only save the texts of the emails. What would be a sane way to do that using simple tools like fetchmail?"
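
One possible approach to the question above, as a minimal sketch rather than a definitive answer: it works on an already-delivered, uncompressed mbox file after the fact, not inside fetchmail, and uses only Python's standard `mailbox` and `email` modules. The function names `strip_attachments` and `strip_mbox` are hypothetical, and the rule "keep text parts without a filename, replace everything else with a one-line placeholder" is an assumption about what counts as an attachment.

```python
import mailbox
from email.message import Message
from email.mime.text import MIMEText


def strip_attachments(msg: Message) -> Message:
    """Replace non-text parts of msg with short text placeholders (in place)."""
    if not msg.is_multipart():
        # Single-part messages are left untouched in this sketch.
        return msg
    kept = []
    for part in msg.get_payload():
        if part.is_multipart():
            # Nested containers (e.g. multipart/alternative) are processed recursively.
            kept.append(strip_attachments(part))
        elif part.get_content_maintype() == "text" and part.get_filename() is None:
            # Assumption: a text part without a filename is message body text.
            kept.append(part)
        else:
            note = "[attachment removed: {} ({})]\n".format(
                part.get_filename() or "unnamed", part.get_content_type()
            )
            kept.append(MIMEText(note))
    msg.set_payload(kept)
    return msg


def strip_mbox(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path with attachments stripped; return message count."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for msg in src:
            dst.add(strip_attachments(msg))
            count += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return count
```

A call such as `strip_mbox("2010.mbox", "2010-text-only.mbox")` (hypothetical file names) writes a new file and leaves the source mbox unmodified, so the original can be kept until the result has been checked.
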
This discussion has been archived. No new comments can be posted.

  • Why bother? (Score:5, Insightful)

    by grumbel (592662) on Sunday December 04, 2011 @03:39PM (#38259058) Homepage

    Storage is cheap and 500MB is hardly worth worrying about. The damage done by reducing that amount will likely be far larger than any temporary benefits you might get. If you want it smaller so that you can search faster, look for a tool that is better at searching and indexing the mails instead of trying to cut the mail into pieces.

  • Re:Why bother? (Score:1, Insightful)

    by InsightIn140Bytes (2522112) on Sunday December 04, 2011 @03:41PM (#38259070)
    Exactly this, even if it's a few GB. It's just too small an amount to bother about. Besides, you never know which ones you may want or need later. Even the ones you snobbishly think of as uninteresting now.
  • Why keep it? (Score:2, Insightful)

    by Anonymous Coward on Sunday December 04, 2011 @03:53PM (#38259182)

    If you're not following Sarbanes-Oxley, just delete it. Fuck the pack-rat mentality.

  • Re:500 Mb only? (Score:5, Insightful)

    by optimism (2183618) on Sunday December 04, 2011 @04:02PM (#38259278)

    Many people have a larger email store than you.

    It is not a sign of status.

    More likely, it is a sign of your incompetence to filter and save relevant data.


    Now back to the OP, who perhaps is smarter than you, since he has just 500MB of email to back up.

  • by MacTO (1161105) on Sunday December 04, 2011 @04:18PM (#38259410)

    There is probably some email that you need to keep, but chances are that you don't need to keep most of your email. So just read, respond, then purge (when appropriate).

    As others have pointed out, disk space isn't really a concern in this day and age. But managing data that you don't need is a concern. A minute spent filing, backing up, etc. unnecessary data is a minute wasted. Add enough of those minutes together, and it may amount to a good chunk of your life that could have been spent doing more interesting/productive things.

    As a side note, I notice that people sometimes get attached to things that don't really matter to them. I've known people who have lost all of their data due to circumstances beyond their control, then they became very distressed about that loss of data. The problem is that only a tiny fraction of that data was actually valuable, but they were worrying about all of the data. In some cases it was so traumatic to them that they spent more time worrying about the irrelevant stuff than the stuff that they would need to continue on in the future. So if you don't keep the irrelevant stuff, you can focus on what is relevant.

  • by Just Brew It! (636086) on Sunday December 04, 2011 @04:26PM (#38259474)
    Even at today's post-Thailand-flood inflated hard drive prices, your entire e-mail history occupies less than a dollar's worth of disk space. I fail to see the issue.
  • Re:Why bother? (Score:5, Insightful)

    by grcumb (781340) on Sunday December 04, 2011 @05:18PM (#38259836) Homepage Journal

    I have often longed for a "generic email database format" which could be a universal format for all email programs out there in some way. Pretty much a dream which is long overdue... about 10 years past due. Perhaps there is already something like that and it has escaped me all these years, but I seriously hate migrating email from one format to another.

    Take a look at Maildir. It's not perfect, but it is generic, simple and easily transferred from one location to another.

    RANT: Over the course of my (far too many) years of working in technology, I've often been amazed just how enamoured everyone is with databases. There are some things that databases do well, granted, but just because something needs an index doesn't mean it needs a relational database. /RANT.
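
For readers weighing the Maildir suggestion above against the submitter's mbox files, here is a minimal sketch of a conversion using Python's standard `mailbox` module. The function name `mbox_to_maildir` and any paths passed to it are hypothetical; the sketch assumes an uncompressed mbox file and does not touch the source.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    # create=True makes the Maildir directory structure if it does not exist yet.
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # MaildirMessage(msg) carries over mbox status flags where it can.
            dst.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

Each message ends up as its own file, which is what makes a Maildir easy to copy or sync with ordinary file tools.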

  • Re:Why bother? (Score:5, Insightful)

    by grumbel (592662) on Sunday December 04, 2011 @05:19PM (#38259844) Homepage

    My advice is to keep your archives, but take the time to filter out the stuff you really don't need or want any more.

    The problem with that is that it's extremely hard to judge what you will find valuable 20 years down the road.

    Simple example: Old TV recordings on VHS. I have all of Star Trek: TNG on VHS, labeled, sorted, with the commercials cut out. All nice and dandy you might think.

    You know which part I would love to rewatch? Now, some 15 years later? The commercials, exactly that part which I deleted. All the episodes I can get easily on DVD or on BluRay without problems, with higher quality and everything, but the stuff between the episodes? Nope, that's not available. Here and there a bit of stuff shows up on Youtube, but raw uncut TV from 15 years ago simply isn't easily available.

    There will also be obsolete software, video and flash attachments that were funny five years ago, and other junk.

    Yeah, and exactly that stuff might turn out to be extremely valuable years down the line, as your copy of it might be the only copy left or at least the only copy accessible to you.

    I have absolutely nothing against sorting, indexing and organizing the data, I quite welcome that, but that should be done as a layer on top of the data, not by hacking and slashing the original data itself.
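
The "layer on top of the data" idea in the comment above can be illustrated with a small sketch: an in-memory index from subject words to message keys, built by reading an mbox file without changing it. The function name `build_subject_index` is hypothetical, only the Subject header is indexed, and encoded (non-ASCII) subjects are not decoded; a real search tool would do far more.

```python
import mailbox
import re
from collections import defaultdict


def build_subject_index(mbox_path: str) -> dict:
    """Map each lowercase word found in a Subject header to a set of mbox keys."""
    index = defaultdict(set)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            subject = str(msg.get("Subject", ""))
            for word in re.findall(r"\w+", subject.lower()):
                index[word].add(key)
    finally:
        # Nothing was added or removed, so closing leaves the file as it was.
        box.close()
    return dict(index)
```

The keys in each set can later be passed to `mailbox.mbox.get_message` to fetch the matching messages from the untouched archive.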

  • by vadim_t (324782) on Sunday December 04, 2011 @05:27PM (#38259900) Homepage

    It creeps me out how young geeks hand out all their personal data to the first free provider they happen to come across.

    Yeah, it's a bit of a pain sometimes, but the benefit of having the data where I want it, dealt with how I want it, outweighs the cost IMO. It also makes for good system administration practice if you have an interest in that kind of thing.

  • Re:500 Mb only? (Score:5, Insightful)

    by icebraining (1313345) on Sunday December 04, 2011 @05:40PM (#38259968) Homepage

    Who still uses e-mail?

    People who get stuff done instead of being interrupted every 5m? And who want to receive messages even while offline? And have decent systems for archiving, tagging and searching them?

  • Meta-Facepalm! (Score:3, Insightful)

    by Anonymous Coward on Sunday December 04, 2011 @07:30PM (#38260972)

    FTS: "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size from a few tens of megs to nearly 500 megs from 2010."

    So, the total space required thus far is definitely less than (8 * 0.5 GB) = 4 GB. A USB flash drive with that small a capacity is practically classified as electronic waste these days.

    Even if his or her annual e-mail archive size doubled every year for the next 10 years, it would only be 1+2+4+8+16+32+64+128+256+512=1023 GB.

    A 3 TB hard drive he buys *today* for $100 would probably solve his "problem" for 10 more years.

    Hopefully, in the year 2021, we will have tiny 3 PB SSD drives for $100... But maybe we will be ruled by an A.I. by that time, if we haven't already destroyed ourselves with viruses, nanomachines, robots, nuclear weapons, etc.
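
The back-of-the-envelope figures in the comment above can be checked with a few lines of Python. The variable names are illustrative, and the doubling series starts at 1 GB as in the comment (a rounded-up version of the roughly 0.5 GB reported for 2010).

```python
# Upper bound so far: 8 yearly archives (2003 through 2010), each under 0.5 GB.
years_so_far = 8
upper_bound_gb = years_so_far * 0.5

# Pessimistic projection: yearly archives of 1, 2, 4, ... GB for 10 years.
yearly_sizes_gb = [2 ** n for n in range(10)]
projected_total_gb = sum(yearly_sizes_gb)

# A 3 TB drive holds roughly 3000 GB (decimal units).
fits_on_3tb_drive = upper_bound_gb + projected_total_gb < 3 * 1000

print(upper_bound_gb, projected_total_gb, fits_on_3tb_drive)
```

The sum of the doubling series is 2^10 - 1 = 1023 GB, which matches the comment's figure and stays well under 3 TB even with the existing archive added.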
