Communications Data Storage

Ask Slashdot: Handling and Cleaning Up a Large Personal Email Archive? 167

First time accepted submitter txoof writes "I have a personal email archive that goes back to 2003. The early archives are around 2 megabytes. Every year the archives have grown significantly in size, from a few tens of megs to nearly 500 megs for 2010. The archive is for storage only. It is a mirror of my Gmail account. The archives contain both sent and received mail, compressed in a hierarchy of weekly, monthly and yearly mbox files. I've chosen mbox for a variety of reasons, but mostly because it is the simplest to implement with fetchmail. After inspecting some of the archives, I've noticed that the larger files are a result of attachments sent by well-meaning family members. Things like baby pictures, wedding pictures, etc. What I would like to do from this point forward is strip out all of the attachments and save only the text of the emails. What would be a sane way to do that using simple tools like fetchmail?"
This discussion has been archived. No new comments can be posted.

  • by zmughal ( 1343549 ) on Sunday December 04, 2011 @03:49PM (#38259150) Homepage
    There is DBMail [dbmail.org].
  • Something Like This? (Score:5, Informative)

    by pscottdv ( 676889 ) on Sunday December 04, 2011 @06:33PM (#38260428)

    We all think you're crazy, but here it is:

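    #!/usr/bin/env python3
    from mailbox import mbox, mboxMessage

    # Placeholder paths: point these at the real source and destination mbox files.
    orig_mb = mbox("path/to/orig/mbox")
    new_mb = mbox("path/to/new/mbox")

    for key, msg in orig_mb.iteritems():
        new_msg = mboxMessage()
        new_msg.set_from(msg.get_from())  # keep the original "From " envelope line
        # For multipart mail, descend to the first leaf part (usually the text body).
        part = msg
        while part.is_multipart() and part.get_payload():
            part = part.get_payload()[0]
        # items() rather than keys() so repeated headers such as Received survive.
        for header, value in msg.items():
            new_msg[header] = value
        # Replace the MIME headers so they describe the body that is kept.
        for header in ("Content-Type", "Content-Transfer-Encoding"):
            del new_msg[header]
            if header in part:
                new_msg[header] = part[header]
        new_msg.set_payload("" if part.is_multipart() else part.get_payload())
        new_mb.add(new_msg)
    new_mb.flush()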
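
The script above keeps whichever part comes first in a multipart message, which is usually but not always the text body. A slightly more defensive variant, offered only as a sketch, walks the MIME tree and prefers the first `text/plain` part that carries no filename. The function names (`pick_text_part`, `strip_attachments`, `strip_mbox`) and the placeholder body are assumptions made for illustration, not part of the original post, and the sketch does not lock the mailbox files while writing.

```python
import mailbox
from email.message import Message
from typing import Optional

# MIME headers that describe the old body and must not be copied verbatim.
DROPPED = {"content-type", "content-transfer-encoding", "content-disposition"}


def pick_text_part(msg: Message) -> Optional[Message]:
    """Return the first text/plain part without a filename, else any text part."""
    candidates = [
        part for part in msg.walk()
        if not part.is_multipart()
        and part.get_content_maintype() == "text"
        and part.get_filename() is None
    ]
    for part in candidates:
        if part.get_content_subtype() == "plain":
            return part
    return candidates[0] if candidates else None


def strip_attachments(msg: Message) -> mailbox.mboxMessage:
    """Build a single-part copy of msg that keeps only its text body."""
    new_msg = mailbox.mboxMessage()
    if isinstance(msg, mailbox.mboxMessage):
        new_msg.set_from(msg.get_from())  # preserve the "From " envelope line
    for name, value in msg.items():
        if name.lower() not in DROPPED:
            new_msg[name] = value
    part = pick_text_part(msg)
    if part is None:
        # Nothing textual to keep; leave a marker instead of an empty body.
        new_msg["Content-Type"] = 'text/plain; charset="us-ascii"'
        new_msg.set_payload("[attachment removed]\n")
        return new_msg
    for name in ("Content-Type", "Content-Transfer-Encoding"):
        if name in part:
            new_msg[name] = part[name]
    new_msg.set_payload(part.get_payload())  # still in its transfer encoding
    return new_msg


def strip_mbox(src_path: str, dst_path: str) -> int:
    """Copy every message from src_path to dst_path without attachments."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    for msg in src:
        dst.add(strip_attachments(msg))
        count += 1
    dst.close()  # close() flushes pending writes
    src.close()
    return count
```

Calling `strip_mbox("path/to/orig/mbox", "path/to/new/mbox")` would write the stripped copy to a new file and return the number of messages processed, leaving the original archive untouched.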

  • Re:Why bother? (Score:3, Informative)

    by Anonymous Coward on Sunday December 04, 2011 @07:12PM (#38260800)

    That's easy. (Old school) Eudora uses the mbx format, but separates the attachments from the mails.
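
The same idea, keeping attachments outside the mailbox, can be approximated for an existing message with Python's standard `email` package. This is only a rough sketch and does not reproduce Eudora's actual on-disk layout; the function name `save_attachments` and the numbered filename scheme are assumptions made for illustration.

```python
import os
from email.message import Message
from typing import List


def save_attachments(msg: Message, directory: str) -> List[str]:
    """Write every part that has a filename into directory; return the paths."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.walk()):
        filename = part.get_filename()
        if part.is_multipart() or filename is None:
            continue  # containers and inline bodies are not attachments
        data = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if data is None:
            continue
        # Drop any directory components so a hostile name cannot escape.
        safe = os.path.basename(filename.replace("\\", "/")) or "unnamed"
        # Prefix with the part index so equal names do not overwrite each other.
        path = os.path.join(directory, f"{index:03d}-{safe}")
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved
```

Run before stripping a message, this keeps the baby pictures on disk while the archived mail itself holds only text.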
