Communications Data Storage

Ask Slashdot: Handling and Cleaning Up a Large Personal Email Archive? 167

Posted by timothy
from the would-settle-for-placeholder-images dept.
First time accepted submitter txoof writes "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size from a few tens of megs to nearly 500 megs from 2010. The archive is for storage only. It is a mirror of my Gmail account. The archives are both sent and received mail compressed in a hierarchy of weekly, monthly and yearly mbox files. I've chosen mbox for a variety of reasons, but mostly because it is the simplest to implement with fetchmail. After inspecting some of the archives, I've noticed that the larger files are a result of attachments sent by well-meaning family members. Things like baby pictures, wedding pictures, etc. What I would like to do from this point forward is strip out all of the attachments and only save the texts of the emails. What would be a sane way to do that using simple tools like fetchmail?"
This discussion has been archived. No new comments can be posted.

  • Why bother? (Score:5, Insightful)

    by grumbel (592662) <grumbel@gmx.de> on Sunday December 04, 2011 @03:39PM (#38259058) Homepage

    Storage is cheap and 500MB are hardly worth worrying about. The damage done by reducing that amount will likely be far larger than any temporal benefits you might get. If you want to have it smaller so that you can have faster search, look for a tool that is better at searching and indexing the mails instead of trying to cut the mail into pieces.

    • Re: (Score:1, Insightful)

      Exactly this, and even if it's a few GB. It's just too small an amount to bother about. Besides, you never know which one you may want or need later. Even the ones you snobbishly think of as uninteresting now.
      • Re:Why bother? (Score:4, Interesting)

        by AliasMarlowe (1042386) on Sunday December 04, 2011 @04:01PM (#38259260) Journal

        Exactly this, and even if it's a few GB. It's just too small an amount to bother about.

        Agreed. 500MB is trivial, especially if it includes a bunch of large attachments. I just checked my email directory at home, and it's 2.7GB in size. It's on a network drive and Thunderbird accesses it more-or-less instantly; there is no discernible lag in showing the content of any mail folder - the hierarchy of folders is complicated, but some folders are large. The network drive is backed-up automatically three times a week, so its risk of loss is tolerably low. With modern email clients, the penalty of huge email directories should be tiny.

        • nearly 500 megs from 2010.

          OP did not specify how much space is being used total, but everyone is taking the 500MB as the main sticking point. *facepalm*

          The point being it will get larger in the future, even if OP never runs the risk of exceeding Gmail's quota.

          Far as I can tell this is a TMI question about fetchmail and attachments. Wish I could help.

          • Meta-Facepalm! (Score:3, Insightful)

            by Anonymous Coward

            FTS: "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size from a few tens of megs to nearly 500 megs from 2010."

            So, the total space required thus far is definitely less than (8 * 0.5 GB) = 4 GB. A USB flash drive with that small a capacity is practically classified as electronic waste these days.

            Even if his or her annual e-mail archive size doubled every year for the next 10 years, it would only be 1+2+4
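The arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch: the comment is cut off, so the doubling series below is one reading of it (each of the next 10 yearly archives twice the size of the previous one, starting from the 0.5 GB figure), and the function names are made up for the illustration.

```python
# Rough upper bounds only; the input figures come from the comment above.
YEARS_SO_FAR = 8          # 2003 through 2010
MAX_PER_YEAR_GB = 0.5     # "nearly 500 megs" in the largest year


def past_upper_bound_gb(years=YEARS_SO_FAR, per_year=MAX_PER_YEAR_GB):
    """Pretend every past year was as large as the largest one."""
    return years * per_year


def doubling_total_gb(start_gb=MAX_PER_YEAR_GB, years=10):
    """Sum of the next `years` archives if each is twice the previous one."""
    return sum(start_gb * 2 ** n for n in range(1, years + 1))


if __name__ == "__main__":
    print(past_upper_bound_gb())   # 4.0
    print(doubling_total_gb())     # 1023.0
```

Under those assumptions the archive stays at roughly a terabyte after a decade of doubling.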

            • And on top of that, I suspect email usage will start decreasing in the not too distant future, or that it already has. Most people I know use social networks, IM, Dropbox, etc, for file transfers these days.
            • By the way, if you can point me at a $100 3TB hard drive, I'm in the market for one.
      • by J4 (449)

        "snobbishly"

        WTF? It's his own personal email.

        • by pla (258480)
          WTF? It's his own personal email.

          Poor choice of words, perhaps, but I completely understand the sentiment. I've had some form of email since around 1991, and despite my OCD-like "completionist" tendencies, I never thought to archive it all until sometime around 2003.

          Now, considering the tiny actual disk space those early emails would have taken, I sorely regret my earlier habit of read-respond-delete.

          These days, I delete spam (and some large attachments), and nothing else. And some day, I'll probab
          • I'm not keen on deleting, but archiving and putting elsewhere is a good idea, otherwise you end up with thousands of hits when you do a search for something. If I know I want something from 5 years ago I can always open the archives.
    • Seconded (Score:2, Redundant)

      by Colin Smith (2679)

      Nothing to see here. Move along.
       

    • I have my email going back to 1996. Several copies (2.4GB) of it in fact, as the email has moved with me through a number of disk and system upgrades. If you really must free up the space, why not just write the mail to a CD/DVD or a USB stick or put it on the SD card of your smart phone?

    • by txoof (553270)

      Storage is cheap, but backing it up to S3 is less cheap. I looked through a bunch of the mail and discovered that what I really wanted to save was the text. The rest is backed up on Google. If I lost it all, it wouldn't be a tragedy, but the mail between my wife and I before we were married and messages between my family are the things I treasure most, not the photos that I can find on facebook/flickr/gmail/picasa/etc.

      Finding a way to save some space and some bucks is worth while for me. After a lot of

      • by AmiMoJo (196126)

        Google will sell you 20GB extra space for $5/year. You can now upload arbitrary file types to Docs too, including encrypted archives. The only cheaper alternative is Skydrive which gives you 25GB free, but bulk uploading does require Silverlight unfortunately.

        Considering the amount of time and effort I'd have to put in to sort through and reduce my archives $5 a year is an absolute bargain. You get Gmail's search and anti-spam systems too which are pretty valuable in themselves.

    • by erroneus (253617)

      True-true.

      But on a related note, I have often longed for a "generic email database format" which could be a universal format for all email programs out there in some way. Pretty much a dream which is long over-due... about 10 years past-due. Perhaps there is already something like that and it has escaped me all these years but I seriously hate migrating email from one format to another. Not long ago, I was helping someone to recover some old email (Outlook Express) and contacts which were in Japanese and no

      • Re:Why bother? (Score:5, Insightful)

        by grcumb (781340) on Sunday December 04, 2011 @05:18PM (#38259836) Homepage Journal

        I have often longed for a "generic email database format" which could be a universal format for all email programs out there in some way. Pretty much a dream which is long over-due... about 10 years past-due. Perhaps there is already something like that and it has escaped me all these years but I seriously hate migrating email from one format to another.

        Take a look at Maildir [wikipedia.org]. It's not perfect, but it is generic, simple and easily transferred from one location to another.

        RANT: Over the course of my (far too many) years of working in technology, I've often been amazed just how enamoured everyone is with databases. There are some things that databases do well, granted, but just because something needs an index doesn't mean it needs a relational database. /RANT.

        • by batkiwi (137781)

          Maildir is exactly what you say, a generic email database.

          It's not a relational database, as email isn't really relational in nature, but solves most/all of the problems you need to solve around storing emails. The only big "miss" in maildir is that attachments are stored inside the main message, making pass-through deduplication difficult/impossible.

          (many storage devices now can auto deduplicate files that are identical, so if you get the same image in 15 different emails due to reply-to-all etc you only

          • by wvmarle (1070040)

            Cyrus imap server has indexing services built in. Works well.

            I can search my complete e-mail archive (something like 12 GB over 8 years, including attachments) in seconds, while I'm sitting at home with barely any mails copied locally (only mails that I actually opened are pulled in, for the rest only the headers are downloaded).

            Mail client is Evolution; server is Cyrus imapd. I do assume other IMAP servers will have similar functionality, and other IMAP mail clients will also handle server-side searches

        • by sootman (158191)

          Funny. Maybe I've been working with databases too long that it's affected my mind (or maybe I got *into* databases because that was *already* how my mind worked) but I've *always* wanted to be able to say things like "show me all messages from my mom, dad, or sister, that arrived in 2005 and had attachments, and sort with the biggest at the top."

          On a related note, the fact that Gmail doesn't let you click column headings to sort absolutely kills me.

          • by grcumb (781340)

            Funny. Maybe I've been working with databases too long that it's affected my mind (or maybe I got *into* databases because that was *already* how my mind worked) but I've *always* wanted to be able to say things like "show me all messages from my mom, dad, or sister, that arrived in 2005 and had attachments, and sort with the biggest at the top."

            Oh, don't get me wrong, I love that kind of stuff too. I once worked on a product that allowed you to construct queries along the lines of 'Show me every speech by every West Coast politician who spoke about the salmon fishery between July and September, 2009, translated into French.' But here's the thing: It didn't use a relational database.

            I love finding clever ways to mangle data. It's my bread and butter. But I do NOT love relational databases enough to use them for everything, all the time.

        • by Malc (1751)

          Thousands of tiny files? That sounds really efficient, especially if you want to copy them, or perform some other global operation. ZZzzzz.

      • by gaspyy (514539)

        Been there, done that.
        My email archive dates back to 1995. Over the years I've been using Pine, Eudora, Outlook Express, Netscape Communicator, Outlook, Thunderbird, Windows Live Mail.

        I converted everything to EML. It's a simple format, easy to read and parse, recognized by the OS. With a simple script I renamed each file to YYYYMMDD-From-Subject.eml, so now it's accessible any way I like, by glancing at the file name, or by searching the contents (Windows 7 indexes EML files).

        Writing a script to strip attachm
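The renaming scheme described above (YYYYMMDD-From-Subject.eml) could be sketched roughly as follows. This is not the poster's script; the helper names, the 40-character limit, and the "00000000" placeholder for a missing date are assumptions made for the example.

```python
import re
from email import message_from_bytes, policy
from email.utils import parseaddr, parsedate_to_datetime


def _slug(text, limit=40):
    """Keep ASCII letters, digits, '.', and '_'; turn everything else into '_'."""
    cleaned = re.sub(r"[^\w.]+", "_", text, flags=re.ASCII).strip("_")
    return cleaned[:limit] or "unknown"


def eml_name(raw_bytes):
    """Build a YYYYMMDD-From-Subject.eml file name from a raw message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    try:
        stamp = parsedate_to_datetime(str(msg.get("Date", ""))).strftime("%Y%m%d")
    except (TypeError, ValueError):
        stamp = "00000000"  # placeholder when Date is missing or malformed
    sender = parseaddr(str(msg.get("From", "")))[1]
    subject = str(msg.get("Subject", ""))
    return f"{stamp}-{_slug(sender)}-{_slug(subject)}.eml"


if __name__ == "__main__":
    raw = (b"From: Alice <alice@example.org>\r\n"
           b"Date: Tue, 01 Mar 2005 10:00:00 +0000\r\n"
           b"Subject: Re: Baby pictures!\r\n\r\nhi\r\n")
    print(eml_name(raw))
```

To rename real files, read each file's bytes, call `eml_name`, and pass the result to `os.rename`; two messages with the same date, sender, and subject would collide, so a real script needs a tie-breaker such as a counter.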

    • by Spudley (171066)

      Storage may be cheap, but that's hardly an excuse for being cluttered.

      Ask yourself: When are you ever going to read all those email again? When is *anybody* ever going to read them again. And the more you have, the less likely it is that they ever will be read, because the more you have, the more time it will take to go through them.

      And don't tell me that doesn't matter because it's easy to run a search -- the same still applies, and you'd only bother running a search if you had something specific you wante

      • Re:Why bother? (Score:5, Insightful)

        by grumbel (592662) <grumbel@gmx.de> on Sunday December 04, 2011 @05:19PM (#38259844) Homepage

        My advice is to keep your archives, but take the time to filter out the stuff you really don't need or want any more.

        The problem with that is that it's extremely hard to judge what you will find valuable 20 years down the road.

        Simple example: Old TV recordings on VHS. I have all of Star Trek: TNG on VHS, labeled, sorted, with the commercials cut out. All nice and dandy you might think.

        You know which part I would love to rewatch? Now, some 15 years later? The commercials, exactly that part which I deleted. All the episodes I can get easily on DVD or on BluRay without problems, with higher quality and everything, but the stuff between the episodes? Nope, that's not available. Here and there a bit of stuff shows up on Youtube, but raw uncut TV from 15 years ago simply isn't easily available.

        There will also be obsolete software, video and flash attachments that were funny five years ago, and other junk.

        Yeah, and exactly that stuff might turn out to be extremely valuable years down the line, as your copy of it might be the only copy left or at least the only copy accessible to you.

        I have absolutely nothing against sorting, indexing and organizing the data, I quite welcome that, but that should be done as a layer on top of the data, not by hacking and slashing the original data itself.

      • It's called "search". It would take tens of hours to manually sift thru all of my email and clean it up. And then I'd still need to use the search function to find stuff quickly. So what would I gain from this hypothetical cleanup scenario? I'd save maybe 2.5 gigs of storage. Be still my heart. It's a very poor economic tradeoff. My time (even unpaid time) is worth a heck of a lot more than that.

        I don't understand this "cluttered" concept that seems to distress you. They're bits on a hard drive, not

      • Re:Why bother? (Score:5, Interesting)

        by icebike (68054) on Sunday December 04, 2011 @06:14PM (#38260258)

        Ask yourself: When are you ever going to read all those email again? When is *anybody* ever going to read them again.

        As soon as:
        1) you divorce
        2) you get arrested for ANYTHING
        3) They arrive with a search warrant for any reason
        4) You sue or are sued
        5) You run for office
        6) You get hacked

        Seriously, I keep VERY little historical Email. Very little.
        I am not so vain that I believe there is any historical significance, and have never needed to go back more than a couple months for anything.

        Just delete it. It's safer that way.

        • Agree. I read email, deal with it, then delete it. 15+ years of using email and I've never found any reason to keep email ever. My current work email inbox is less than 100MB, most of which was generated in the last couple of weeks. My personal email has nothing in it. Why bother?
        • Ask yourself: When are you ever going to read all those email again? When is *anybody* ever going to read them again.

          1) I needed to order ink recently. Now, I don't print much but I vaguely remembered a good supplier that I had used in the past. But what was it called? A few moments of grepping and I found it: in a confirmation email from three years ago.

          2) I met a woman on a Meetup hike recently that I seem to have met before. Was this the blind date from four years ago? The smoking gun was in an email from 2007.

          3) I've had occasional need to look up old acquaintances. While I might have created a contact file at variou

        • by dbIII (701233)
          7) New girlfriend
          "Why did you never send me emails like those ones you sent her?" she asked.
          "You didn't even have an email address before you moved in" just wasn't a good enough excuse.
      • My guess is that if you followed that advice, your email archives are now about a quarter of their original size. And nothing of value was lost.

        Well, except the time that you spent sorting through all your old e-mails. I'm sure that I could erase 99% of the old e-mails in my archive ... but that would require actually going through them so that I could save the ones that I may need in the future. (And yes, every once in a while I have a reason to go find something from ten years ago or more.)

        Remember, "cl

      • by wvmarle (1070040)

        I have some 12 GB of mail, mainly business related, lots of attachments, dating some 8 years back.

        Quite regularly (once a month or so) I am looking for some e-mail that I received a good half year to a year ago, to look up some detail about an old deal or offer.

        And sometimes I have to look up something that's a bit older than that. Two, three times so far I have been searching through e-mails that dated five, six years back, pretty much the beginning of the archive then. And that usually also had to do with

    • Re:Why bother? (Score:4, Interesting)

      by houghi (78078) on Sunday December 04, 2011 @04:54PM (#38259658)

      Why bother indeed. When I look at my mail folders, I try to think when the last time was that I actually searched my personal mail for something older than one year.

      Mails that I keep are orders I placed and passwords that I requested. All the rest I delete after one year.

      I already do a lot of deleting after reading. E.g. most mailing lists will be deleted almost immediately. Things I keep are bug reports I filed, till they are closed.

      This is something I do in real life as well. If I have not used something in a year and there is no emotional value, I will throw it away. Even though it is technically possible to keep everything, I see no reason to do so.

  • by Anonymous Coward

    My email archive dates back to 1999 and is 2GByte in size which isn't much considering the attachments.

    I "handle" it by making a backup of it.

    I do not clean it up. I do clean around it by deleting mail archives that contain mails that have no personal value.

    I do not delete personal mails since they are precious, like photos. In 2011 nobody has to delete his personal mail.

    This news is stupid

  • Surely a tool exists to keep email in a SQL database, so the envelope fields, plain text, and attachments are separately searchable. I have email back to 1996 with the same frustrations.

    One would think that Thunderbird would have done that a decade or more ago, but no. Nor do any of the standard IMAP servers seem to support SQL (MySQL, Postgres) as a backend. This seems like a serious project waiting to happen. Or have I overlooked an obvious solution?

    • by BitHive (578094) on Sunday December 04, 2011 @03:45PM (#38259110) Homepage

      You have. Thunderbird includes archival folders and a Lucene search engine.

      • Sup [rubyforge.org] uses Xapian, it's pretty fast too.

    • by zmughal (1343549) on Sunday December 04, 2011 @03:49PM (#38259150) Homepage
      There is DBMail [dbmail.org].
    • by cras (91254)

      Email isn't stored in SQL, because typically it's rather pointless. Full text search indexing doesn't require SQL, and it's more efficient without SQL anyway. There are some good use cases for storing emails in SQL database, but efficiency isn't one of them.

      • Actually the point of storing email in SQL isn't just for indexing, there's a huge speed advantage. DBMail (which I've administered and installed) is used for high volume mail transactions, on the order of 200,000k per sec. Also, having a DB backend carries with it all the advantages of having a DB, snapshots, mirroring, cross-regional updates, backups, etc. I agree, you can definitely get by without it but having email in a database is nice.

        • by cras (91254)

          Yes, there are some advantages to using SQL database, like I said.. But I highly doubt "huge speed advantage" is one of them, unless you compare to a really badly set up system. I know people have switched from DBMail to Dovecot simply because Dovecot is so much faster..

          • Yes, there are some advantages to using SQL database, like I said.. But I highly doubt "huge speed advantage" is one of them, unless you compare to a really badly set up system.

            Yeah, like Thunderbird still using Mork databases after a decade of sharp-poke-in-the-eye performance.

  • Why keep it? (Score:2, Insightful)

    by Anonymous Coward

    If you're not following Sarbanes-Oxley, just delete it. Fuck the pack-rat mentality.

    • by MrMickS (568778)

      It's personal email. I've reasonably often searched back over a number of years for something I vaguely remembered, and finding the associated emails gave me the information I was looking for. I have personal email going back to 1996 all sitting behind an IMAP server. I did look at clearing it down at one time but, in the end, that was more effort than simply leaving it there.

      I guess I'm not part of the disposal culture that we have these days. I place value on history, even my own.

  • by spinkham (56603) on Sunday December 04, 2011 @04:15PM (#38259378)

    IMAP is another potential answer.

    I run Dovecot locally, and it stores every mail I've ever received, indexed for quick searches.

    This way I can get my mail with all history and a fast search index on all my devices also.

  • by MacTO (1161105) on Sunday December 04, 2011 @04:18PM (#38259410)

    There is probably some email that you need to keep, but chances are that you don't need to keep most of your email. So just read, respond, then purge (when appropriate).

    As others have pointed out, disk space isn't really a concern in this day and age. But managing data that you don't need is a concern. A minute spent filing, backing up, etc. of unnecessary data is a minute wasted. Add enough of those minutes together, and it may amount to a good chunk of your life that could have been spent doing more interesting/productive things.

    As a side note, I notice that people sometimes get attached to things that don't really matter to them. I've known people who have lost all of their data due to circumstances beyond their control, then they became very distressed about that loss of data. The problem is that only a tiny fraction of that data was actually valuable, but they were worrying about all of the data. In some cases it was so traumatic to them that they spent more time worrying about the irrelevant stuff than the stuff that they would need to continue on in the future. So if you don't keep the irrelevant stuff, you can focus on what is relevant.

    • by lucm (889690)

      I used to archive my emails, then one day by mistake they were deleted. For a minute or two I was freaking out, then I felt relieved. I needed to lose them completely to understand that I did not need them. It was like a security blanket (what if I need a cd-key I received by email, or if I want to read again the bad poetry I sent to my ex?), nothing else.

      For the last few years not only did I not archive my emails, I also made sure to change my email address once or twice a year to weed out the crap. And th

      • by jabberw0k (62554)
        Apparently you never do research that you will need to consult later, nor do you correspond with people who might die. Do you only live in the moment?
        • by lucm (889690)

          If you rely on email for research, you have a bigger problem than living in the moment

  • by Just Brew It! (636086) on Sunday December 04, 2011 @04:26PM (#38259474)
    Even at today's post-Thailand-flood inflated hard drive prices, your entire e-mail history occupies less than a dollar's worth of disk space. I fail to see the issue.
  • by subreality (157447) on Sunday December 04, 2011 @04:26PM (#38259476)

    For my own mail archives I just use mutt and weed things a bit by hand. I find that 90% of the mbox size is in fewer than a dozen attachments, so I can hand-filter those out in ten minutes once a year. Beyond that disk is too cheap to care and time is too valuable to make a really comprehensive solution. So what I do:

    'mutt -f archive.mbox'
    ':set pager_index_lines=6' (Lets you see the message index split above the body)
    'o' (Order), 'z' (siZe), End (last entry), Enter (Open).
    while(mbox.size > acceptable_size)
    {
            'v' (View attachments)
            'jjj' (down a few times to the attachment I want to nuke)
            'd' (Delete)
            while(more attachments) { 'd' (Delete more attachments) }
            'q' (Quit back to the message view)
            'k' (previous message)
    }
    'q' (Quit back to index)
    '$' (Sync changes to disk)
    'q' (Quit mutt)

    Note the 'j' and 'k' are vi-style up/down. The arrow keys work too if you're not a home row junkie like me.

    I don't know a good fully automated way to do this that's ready to slice it right out of the box. If you want to roll your own, just pick up a library like RMail or TMail for Ruby, or the equivalent for the language you prefer. That's 80% of the work done, but you'll still probably find a dozen corner cases involving oddly named HTML alternatives that look like binary attachments, or terribly malformed spam.

  • The Eudora Mail User Agent (i.e. email client) stores attachments in a directory as binaries but yet keeps the text of emails intact. Thus you should be able to import the email into Eudora, then when you export it the attachments should be stripped.

    This is also exactly why I don't use Eudora anymore, because attachments get stripped off when exporting the email (or at least that's the way email export or import from/to Eudora worked last I used it).

    Now, although this explains one way attachments can be st

  • by Lazy Jones (8403) on Sunday December 04, 2011 @05:38PM (#38259960) Homepage Journal
    Google keeps a permanent copy anyway...
  • by bmo (77928)

    We're worrying about 500MB?

    Even at today's outrageous price-fixed (you know it's true) hard drive prices, you're talking 14 cents a GB. For your situation, we're talking 7 cents.

    You're complaining about 7 cents worth of storage space. And to cut down on this you want to mangle the archive?

    You're tight on space? Buy another drive, burn to CD/DVD.

    For those of us who grew up with a Corvus shoebox hard disk costing thousands on the Apple ][ network, this is a ridiculous "ask slashdot" question.

    --
    BMO

    • by unitron (5733)

      Possibly the first time in the history of the planet that a flood has fixed something.

  • by neurocutie (677249) on Sunday December 04, 2011 @05:48PM (#38260020)

    back to ARPA mail and UUCP mail days...

    for a while I used Eudora and every month religiously took each piece of email and filed it away in suitable mail folders. After Eudora started declining and I got too busy, I stopped that, but even now, religiously every month I clean out my mailbox of all junk and unwanted attachments (trimming 60-100MB to usually 20-30MB) and then stack that month's email away as a single mbox file, and start fresh with a new Inbox.

    the old mailbox files are on an IMAP server that I can easily read emails from at least 10 years ago -- older with a little more effort. As single mbox files each, I can do greps on them also. Seems to be an okay way to keep the stuff, some of which has proven to be important over the years....

    another big help: all semi-junky and non business emails I let Hotmail do the work (vendor stuff, Amazon orders, etc). Have been using Hotmail since before MS bought it. Works well as a place to direct mostly junky vendor stuff.

    • by markdavis (642305)

      That is exactly what I do. I have been using Hotmail way before it was Microsoft. I use that address for all vendor junk, netflix stuff, registrations, autoreplies, notifications, etc. I save my real Email address for stuff that matters more. Even still, that gets cluttered and huge pretty quickly (and I do spend time maintaining it and don't bother with archives).

      Not even counting stupid 100 Megapixel images people feel compelled to Email without resizing and other attachments, Email sizes are still do

  • Procmail (Score:5, Funny)

    by massysett (910130) on Sunday December 04, 2011 @05:51PM (#38260036) Homepage

    Google for "procmail remove attachments":
    http://osdir.com/ml/mail.procmail/2002-11/msg00091.html [osdir.com]

    That will get you started. You can do most anything with Procmail after you figure out the rather odd configuration file format.
    Make sure you have it backed up first because it's also quite easy to destroy data with Procmail.
    After you spend a lot of time futzing with Procmail scripts and sed and formail and the like, you'll wonder why you didn't go on Amazon or Newegg and buy a $10 flash drive that will hold all your mail several times over.

    • by bmo (77928)

      This is the only way to do it... if your time is entirely worthless.

      If we measure time in minimum wage, the OP spent more time composing this question and submitting it than if the OP had just spent 7 cents worth of disk space and archived it away.

      This is a troll "ask slashdot"

      --
      BMO

      P.S. Where i get my 7 cents from: Go to Newegg. List internal 1TB drives by price. Pick lowest. 140 bux divided by 1000 = 14 cents per Salesman GB. He's using half. 7 cents.

      • by mcmonkey (96054)

        This is the only way to do it... if your time is entirely worthless.

        If we measure time in minimum wage, the OP spent more time composing this question and submitting it than if the OP had just spent 7 cents worth of disk space and archived it away.

        This is a troll "ask slashdot"

        --
        BMO

        P.S. Where i get my 7 cents from: Go to Newegg. List internal 1TB drives by price. Pick lowest. 140 bux divided by 1000 = 14 cents per Salesman GB. He's using half. 7 cents.

        But what if, in addition to storing old email, he ever actually needs to go back and search or read old email?

        You're the one saying his time is worthless by only looking at the cost of hard drive space.

        • by bmo (77928)

          >But what if addition to storing old email, he ever actually needs to go back and search or read old email?

          So? What of it? Show me a modern computer system that cannot handle 500MB of email. Show me a /smartphone/ that cannot handle 500MB of email.

          >You're the one saying his time is worthless by only looking at the cost of hard drive space.

          Am I right in saying that you think he's going to /manually/ go through his email to find stuff? Why? Isn't that why we have computers and search algorithms?

          See

    • by txoof (553270)
      Fantastic. In all my googling I never came across that. I'm going to have to give that a try. It's orders of magnitude more elegant than the disaster I've been kludging together. Thanks!
    • Honestly, I kind of wish I had an email client that did this for me. Or maybe more to the point, I wish I had an email server that did this for me. What I have in mind is, instead of the normal attachment system, have the server automatically strip out attachments and store them where they can be accessed by webdav/http. Where the attachment was, substitute in a link to the attachment instead. That way, I could browse my attachments like a normal file system, delete stuff as I like, but the email messag

  • by nurb432 (527695) on Sunday December 04, 2011 @06:02PM (#38260112) Homepage Journal

    Amateur. When you get to 8+ gb then we can talk about 'large archive'. Until then, just stick it on a CD.. you don't even need a DVD for that.

  • you will be very sorry you deleted those pictures. don't do that. Even right now, you could make many people very happy by giving as gift one of those digital picture frames that display different stored photo every several seconds, with your pictures of those important to recipient.

    • by markdavis (642305)

      Email is not supposed to be a file repository. Although, it seems like every day I find people who treat it just like that. When I get attachments that matter and need to last (such as nice/important pictures), I save them off and put them in an appropriate directory structure. Then I KNOW where all my pictures are located. They can be backed up appropriately. They can be viewed logically.

      • by iggymanz (596061)

        But I back up my emails and the database that indexes them by backing up the whole Thunderbird directory, so I can search by topics or phrases. I think it's better that way. Only takes up 2.5GB on the 10/20GB tapes I use for 12 years' worth. I do regret not having my emails from the mid 80s to 1998, as they were on disparate VAX/VMS and Unix systems, but oh well.....

  • Enables you to save everything off line as a pdf. Personally I don't get the question or see the point. My archive is about 6 Gig, all backed up all searchable. Anyway the company that makes the software is www.spotdocuments.com Just back up
  • Something Like This? (Score:5, Informative)

    by pscottdv (676889) on Sunday December 04, 2011 @06:33PM (#38260428)

    We all think you're crazy, but here it is:

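    #!/usr/bin/env python3
    # Copy an mbox, keeping only the first (usually text) part of each message.
    from mailbox import mbox, mboxMessage

    orig_mb = mbox('path/to/orig/mbox')  # placeholder path
    new_mb = mbox('path/to/new/mbox')    # placeholder path

    MIME_HEADERS = ('content-type', 'content-transfer-encoding')

    for key, msg in orig_mb.iteritems():
        # Multipart mail can nest, so descend until we reach a leaf part.
        part = msg
        while part.is_multipart():
            part = part.get_payload()[0]
        new_msg = mboxMessage()
        new_msg.set_from(msg.get_from())  # keep the original "From " line
        for header, value in msg.items():
            # The old container's MIME headers no longer describe the body.
            if header.lower() not in MIME_HEADERS:
                new_msg[header] = value
        for header in ('Content-Type', 'Content-Transfer-Encoding'):
            if part[header] is not None:
                new_msg[header] = part[header]
        new_msg.set_payload(part.get_payload())
        new_mb.add(new_msg)
    new_mb.flush()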

    • by rgbscan (321794)

      Will this work to save the attachments somewhere? I have a similar question to the OP but in reverse. I'm tired of searching my email for attachments that have been sent to me over the years (going back to '97) and would like to take my mbox, run it thru a program, and have all the attachments end up in a directory of my choosing. I can then delete or file them, having them now in a more sane place than using email as a file system

      • by pscottdv (676889)

        This script throws the attachments away which is what was requested. Saving them is a little more complicated.
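
For what it's worth, here is a rough sketch of the saving variant, using only the standard library. The function name `save_attachments` and the `key_index_filename` naming scheme are my own assumptions, picked so that two attachments with the same name don't overwrite each other. It treats any MIME part that carries a filename as an attachment, which is a simplification, and it has not been run against a real multi-year archive.

```python
import os
from mailbox import mbox


def save_attachments(mbox_path, out_dir):
    """Write every MIME part that has a filename into out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    box = mbox(mbox_path)
    for key, msg in box.iteritems():
        for index, part in enumerate(msg.walk()):
            name = part.get_filename()
            if not name:
                continue  # body text or a multipart container, not an attachment
            data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
            if data is None:
                continue
            # Prefix with message key and part index to avoid name collisions,
            # and drop any directory components the sender put in the filename.
            safe_name = "%s_%s_%s" % (key, index, os.path.basename(name))
            target = os.path.join(out_dir, safe_name)
            with open(target, "wb") as fh:
                fh.write(data)
            saved.append(target)
    box.close()
    return saved
```

Call it as `save_attachments('path/to/orig/mbox', 'path/to/attachments')` (both paths are placeholders) and then file or delete the results by hand.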

  • As others have said, the headache you will have if you do want to come back (potentially years later) to that one email you know you had only to find your attachment-stripping program has foobar'd the whole archive up (or that you need the attachment after all) probably isn't worth the hassle for saving 500MB per year this year (even taking into account reasonable growth rates - I'd note that bandwidth per $, which will be the factor limiting your email size, has been growing rather more slowly than storage

  • These are a few of the tools that I use (Unix/Linux, of course):

    formail (part of the procmail distribution) is very useful for rewriting mailboxes.
    uuexplode is useful for discovering and yanking out attachments.
    grepmail is REALLY useful for discovering messages which match certain criteria.
    csplit is useful for more than mail, but it also has applications with mailboxes.
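
If you don't have grepmail installed, a very rough stand-in can be put together with Python's standard `mailbox` module. This is only a sketch: the function name `find_messages` is made up for illustration, and unlike grepmail it looks only at the Subject and From headers, not at message bodies.

```python
import re
from mailbox import mbox


def find_messages(mbox_path, pattern):
    """Return (key, subject) for messages whose Subject or From matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    box = mbox(mbox_path)
    for key, msg in box.iteritems():
        # str() copes with headers that come back as Header objects.
        subject = str(msg.get("Subject", ""))
        sender = str(msg.get("From", ""))
        if regex.search(subject) or regex.search(sender):
            hits.append((key, subject))
    box.close()
    return hits
```

The returned keys can be handed back to the same `mbox` object (for example `box[key]`) to read or copy the matching messages.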
  • My personal email archive goes back to 1996, and is still only 262MB.

    My Google archive uses 164MB.

    I've no idea what my Yahoo account uses.

    But 500GB of email?!?!?!?! Are your relatives sending you entire videos as attachments or accidentally copying their entire music archives?

    • by msobkow (48369)

      Ah, I get it. 500MB of 2010 emails. I'd misread that as gigabytes.

      Still, you've got as much email for one year as I've saved my entire life. Something doesn't add up.

  • Deal with the superfluous attachments first and then see how you feel. Attachments are often unnecessary baggage.

  • You are talking about $0.035 in storage costs

    Literally. I bought 3TB for $200.00 at Fry's yesterday. You probably have more than 500M of "Angry Birds" on your cell phone.
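
To spell the arithmetic out, here is a quick sketch assuming decimal units (1 TB = 1,000,000 MB) and ignoring everything except the purchase price; the function name `storage_cost` is just for illustration:

```python
def storage_cost(size_mb, drive_price_usd, drive_size_tb):
    """Dollar cost of storing size_mb on a drive bought at the given price."""
    mb_per_tb = 1000 * 1000  # decimal units, as drive vendors count them
    return drive_price_usd * size_mb / (drive_size_tb * mb_per_tb)


# 500 MB of mail on a 3 TB drive that cost $200.00
cost = storage_cost(500, 200.00, 3)
print("$%.3f" % cost)  # prints $0.033
```

So a year of this mail costs roughly three cents to keep, in line with the figure above.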

    If the point is to pull the data off your gmail account and not have it stored there (maybe you want to migrate away, maybe you are trying to get us to design a product for your file hosting service adjunct to gmail, whatever), fetchmail is a terrible tool, particularly since gmail permits IMAP4 access, and you don't have to worry about

  • by matty619 (630957) on Sunday December 04, 2011 @11:46PM (#38262480)

    Is a project a friend of mine started. Interfaces w/ gmail's API, quite easy to use.

    www.findbigmail.com

  • I keep my email in maildir format (the default for Claws-Mail), and rotate every six months. The whole process is entirely manual, but since any given step only takes a few seconds, it works fine for me.

    Emails are sorted on receipt according to source or content via ordinary filters. Every email I receive that's worth keeping goes into a catch-all folder after reading. I probably should be preserving the sorting when I move something into that catch-all, but I don't receive enough email to bother with

  • If you use, or have access to a Mac, the Apple Mail client has for some years had a Remove Attachments option in the Message menu. Simply select all your mail in a folder with Cmd-A and select that menu option and it'll do exactly what you want. I use it regularly to prune my database.

  • I have all my emails since 1990 or 1991 :-/ and it's true that I use email less and less... I saved them in mbox format and used a script to remove binary attachments (yeah, cat pictures and things like this). It's mainly text and can be highly compressed.
  • I still have the tapes, but not the desire to read them. Since the mid-1990s I've had cloud email (hotmail) and haven't really lost anything.
  • Email is extremely convenient for file transfer, but I prefer not to have my mail store so bloated.

    What happens when my friends want to start emailing me movies?

    I haven't figured out the true nature / fundamentals involved here.
