Unix Operating Systems Software

What Mailbox Format Do You Use And Why?

RossyB asks: "What format for my mailbox is best? The University of Washington IMAP server only supports mbox, and claims that maildir is slow and dangerous. Qmail only supports maildir, and claims that mbox is slow and dangerous! Who is right? Why?" I think one of the biggest problems with the adoption of maildir is the lack of MUAs that support it.

"I currently store all of my e-mail in a local mbox-style IMAP store in ~/mail/, so that I am not tied to any particular mail client. However, I plan to sync my mail across multiple machines (home, work, and soon a laptop), so I need my mail in a form that can be synced easily. mbox is bad for this: if I fetch mail on one machine, later delete some messages from the same folder on another machine, and then sync, the new messages will be lost. This is where maildir is good, since each message is a separate file. But why do so many people hate it? If I do switch over to maildir, what IMAP/SMTP servers should I use? A hacked sendmail/UW IMAP? Courier-IMAP + qmail? Something else? How do other people keep their mailstores synced across many machines, and what software do they use?"

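On the poster's "if I do switch over" question: Python's standard `mailbox` module reads and writes both formats, so a one-time conversion is short. A minimal sketch, assuming a single mbox file and a destination Maildir that may not exist yet (the function name and paths are hypothetical). Wrapping each message in `MaildirMessage` carries the mbox `Status` flags over to Maildir flags, and the mbox is locked while it is read in case a delivery agent is appending to it.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; return the count.

    Hypothetical helper: the source mbox is left untouched.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)  # creates tmp/, new/, cur/
    src.lock()  # fcntl + dot-lock, so a concurrent MDA cannot append mid-read
    try:
        copied = 0
        for msg in src:  # yields mboxMessage objects
            # MaildirMessage(msg) maps mbox Status flags (R, O, D, ...) onto
            # Maildir info flags and chooses new/ or cur/ accordingly.
            dst.add(mailbox.MaildirMessage(msg))
            copied += 1
        return copied
    finally:
        src.unlock()
        src.close()
```

After the copy, each message is an independent file under `new/` or `cur/`, which is what makes file-level syncing practical.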
  • by Anonymous Coward
    ls | xargs grep "search string"

    xargs is cool.
  • If the poster is looking for a nice, cross-platform, easily synchronized mailbox format, I don't think Exchange is going to be right for him, especially since he apparently runs (gasp) Linux.
  • I think you forgot to read the message. The poster wants a mailbox format that he can download onto a laptop, PDA, or some other device not connected to the net, read and respond to email, and then synchronize back up when he gets back online (i.e., copy over the mbox or maildir).

    Exchange didn't handle that the last time I used it; in fact it acts more or less like mbox in that respect, although it depends on how your mail is stored locally. Plus, Exchange is pretty expensive for people who don't have large expense accounts and have to support a large base of users.

    Finally, how does supporting POP and IMAP make you a "lot more cross platform than UNIX mail"? Especially since UNIX based mail systems can do the same thing (and can share mailboxes between other non-Exchange servers if need be). My biggest beef with Exchange is the binary message format. Just try to resurrect a slightly damaged file, or search/modify something without having to fire up your mail client or web browser.
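The offline-sync scenario above is easier with Maildir because each message keeps a unique base name for life; only the info suffix after the colon changes when flags are set. A minimal sketch, assuming two local copies of one Maildir (the function names are hypothetical), that works out which messages each side lacks. It deliberately ignores flag changes and deletions, which a real sync tool also has to reconcile.

```python
import os


def message_ids(maildir: str) -> dict[str, str]:
    """Map each message's unique base name to its path relative to the Maildir."""
    ids = {}
    for sub in ("new", "cur"):
        for name in os.listdir(os.path.join(maildir, sub)):
            if name.startswith("."):
                continue  # Maildir readers ignore dot files
            base = name.split(":", 1)[0]  # drop the ":2,FLAGS" info suffix
            ids[base] = os.path.join(sub, name)
    return ids


def missing_on_each_side(a: str, b: str) -> tuple[list[str], list[str]]:
    """Return (paths in a that b lacks, paths in b that a lacks)."""
    ids_a, ids_b = message_ids(a), message_ids(b)
    only_a = sorted(ids_a[k] for k in ids_a.keys() - ids_b.keys())
    only_b = sorted(ids_b[k] for k in ids_b.keys() - ids_a.keys())
    return only_a, only_b
```

Because a message read on the laptop only gains a suffix such as `:2,S`, it is still recognized as the same message on the desktop, unlike an mbox where any change rewrites the one shared file.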
  • maildir's speed is far better under ReiserFS. I can't say that there's no slowdown, but it's certainly much smaller. You can also do a significant amount of filtering based on filename, unlike mbox, where you have no choice but to grep through every message in the file.

    Re the wildcard expansion limit, xargs can handle that.
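Filename filtering works because Maildir readers record a message's flags in the file name itself, after a `:2,` separator (`S` for seen, `R` for replied, `T` for trashed, and so on), while mail that no client has picked up yet sits in `new/`. A minimal sketch that lists unread messages from directory listings alone, without opening any file; the function name is hypothetical:

```python
import os


def unseen_messages(maildir: str) -> list[str]:
    """List unread messages using only directory listings, never file contents."""
    unseen = []
    # Anything still in new/ has not been picked up by an MUA yet.
    for name in sorted(os.listdir(os.path.join(maildir, "new"))):
        if not name.startswith("."):
            unseen.append(os.path.join("new", name))
    for name in sorted(os.listdir(os.path.join(maildir, "cur"))):
        if name.startswith("."):
            continue
        # "1234.5678.host:2,RS" -> flags "RS"; a missing suffix means no flags.
        _, _, flags = name.partition(":2,")
        if "S" not in flags:
            unseen.append(os.path.join("cur", name))
    return unseen
```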
  • ReiserFS has been designed to be easily portable. That Linux is its {current,primary} target means little -- throw $20K (or whatever they charge) at Reiser and his team and a port will get done.

    Given that some organizations spend that much on a single server, this is pretty damn reasonable.

    And he's entirely right -- my experience confirms that using ReiserFS makes maildir handling much faster than under ext2.
  • ...you reinvented a wheel. In particular, the MAPI architecture.

    While much M$ software is poorly designed, MAPI is an exception. MAPI is a pretty flexible, intelligent architecture for all things messaging.

    MAPI allows you to do things like substitute message stores, address book stores, etc., by treating them as abstract components. That's exactly what you're claiming to have done with your "data store API".

    I don't want to be too critical, but I hope you folks looked at MAPI before you went out inventing another API...
  • I just moved my qmail/courier-imap mail system from FreeBSD to Linux (FFS to ext2), and performance is not so good when dealing with large directories. I'd definitely recommend using an advanced filesystem on your mail server, like SGI's XFS, IBM's JFS, or ReiserFS. XFS is especially cool, because if the file is small enough, the file itself is stored in the inode and no space is allocated. That sounds great for storing messages to me! SGI has also created a Red Hat 7/i386 installer CD, which allows for XFS-only systems (with 2.4) from the get-go. Tried it out last night; works like a champ.

    As far as MTAs go, does anyone know if qmail supports secure sendmail (using SASL)? I'm running an old version of Postfix on my relays; time to update.

    cheers,
    -o
  • ...Which means that you can't leverage your already existing tools (e.g. grep) to extract data.

    Actually, what it means is that you can use a high-performance, industry standard query language like SQL to extract data, instead of having to kluge together a patchwork of file and stream manipulation tools.


  • 1) It needs to be *trivial* to use standard unix tools on the mailbox to find things.

    e.g.,

    rmm `scan .-last| grep -e badthing1 -e badthing2|f4`

    should remove all messages with badthing[12] in the heading, where f4 is an alias for

    sed -e 's/\(....\).*/\1/' |tr '\n' ' '

    [I'll admit that I was briefly worried the first time that this was my reaction to a bunch of messages from a mailer gone nuts . . . ]

    2) It would be nice for the system to be hostile to abusive mailings, not by content, but toward the idiots who send plain-text messages as HTML and MIME. That's not a user preference; it's *wrong*.

    3) Must be command line friendly. MUA's are for sissies. Real men read from the command line :)

    hawk
  • Actually, the Cyrus IMAP Server [cmu.edu] is open source and takes a similar approach. It's been deployed and in use at Carnegie Mellon for quite some time, and I believe that the older version of the engine was the basis for a number of commercial mail servers out there, including the iPlanet mail server.
  • No, I don't; a quick web search actually picks up a number of Qmail-related packages that apparently now use the GPL, so it may be that his opinion changed.

    It used to be that qmail was only allowed to be distributed in source code form, and the CDB database system (it's a cool thing well worth looking at) used a license that was somewhat incompatible with the GPL. There seemed to be some rancor to the effect that the GPL wasn't a "free" license; seemingly an independent recreation of the "BSD bigot" approach to software licensing.

  • Even with ReiserFS, there is some overhead of space and time involved in managing files. Each file has a directory entry and an inode or two; while ReiserFS may unambiguously improve the time efficiency, that does not result in the space overhead falling to zero. At the very least, you've got, for each file:

    struct o_stat {
        o_dev_t st_dev;
        o_ino_t st_ino;
        o_mode_t st_mode;
        o_nlink_t st_nlink;
        o_uid_t st_uid;
        o_gid_t st_gid;
        o_dev_t st_rdev;
        off_t st_size;
        time_t st_atime;
        time_t st_mtime;
        time_t st_ctime;
    };

    which adds up to around 48 bytes, and add to that the size of the directory entry that points to the inode.

    It's not necessarily "ludicrously big," but it's space overhead nonetheless.

    As for "flaming," it's somewhat unfortunate that Dan basically doesn't care what anyone else thinks.

    If he tried to find some places for agreement, his software would probably get used more. Some of it's really very neat, cdb and the microscopic DNS server being particular examples...

    The fact that he comes from a pretty strongly "pure math" background means that he comes up with substantially different ideas than most people. The PM factor adds in two particularly useful things:

    • He thinks about the notion of proving correct behaviour;
    • He looks to the "kernel" of the ideas, and implements "mathematically small" systems that are easier to verify to be correct than much of the rubbish that others produce.
  • The crucial technical differences between mbox and Maildir are that:
    • By keeping all the data in one file, mbox should use disk space more efficiently as there will be no overhead for directory or inode entries.

      This may be important on a big mail server where inodes or disk space may wind up being scarce commodities.

    • By keeping all the data in one file, mbox keeps all the eggs in "one basket;" there is correspondingly less risk of corruption with Maildir.
    • Since the mbox format is a longstanding "tradition," there are lots of libraries that work with it. There are fewer to manipulate Maildir.

    There are also nontechnical issues.

    The creator of Maildir [cr.yp.to], Dan Bernstein [cr.yp.to], is a, um, "somewhat prickly character." Take a look at his criticisms of Postfix [cr.yp.to] for some mild material. Comparisons of Postfix and qmail have turned into extremely inflammatory discussions, and Bernstein's attitudes toward the GPL seem similarly "inflammatory." This appears to have put some people off his software, whether rightly or wrongly.

    Personally, I use Postfix as my MTA, and push messages through Maildir as an interim step before pushing them into MH, which is only a fairly small step removed from Maildir...
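The disk-space point in the first bullet can be checked on a real Maildir. This sketch (Unix only, since it relies on `st_blocks`; the function name is hypothetical) compares the bytes of message data with the space the filesystem actually allocated to hold them. It does not count inode or directory-entry space, so the true per-file overhead is somewhat higher than what it reports.

```python
import os


def maildir_space(maildir: str) -> tuple[int, int, int]:
    """Return (message count, bytes of message data, bytes of allocated blocks)."""
    count = payload = allocated = 0
    for sub in ("new", "cur"):
        with os.scandir(os.path.join(maildir, sub)) as entries:
            for entry in entries:
                if entry.name.startswith(".") or not entry.is_file():
                    continue
                st = entry.stat()
                count += 1
                payload += st.st_size
                # Python documents st_blocks as a count of 512-byte blocks.
                allocated += st.st_blocks * 512
    return count, payload, allocated
```

A large gap between the second and third numbers is the block-rounding cost of storing one file per message.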

  • Exchange has its good points, this is true, but the biggest problem I have with it is that it holds my data hostage. I can't get at the mail spools if something dies and, if it does, you're fucked unless you also bought support contracts.

    We've been running qmail + vpopmail for over 1500 people with Maildir formatted message stores without a problem for over two years now. When something breaks, I can fix it. Data is stored either in the database or in regular old files. It seems to work very well on a mediocre P2 and has all the good stuff: (A)POP, IMAP (courier-IMAP), selective relaying (relaying is allowed after a successful POP or IMAP authentication), user-run mailing lists (ezmlm) and web configuration (vpopmail has a web client). Oh yes and Squirrelmail for the web based mail reading folk.

    There's one thing I learned early on, and that's that I don't like having my data held hostage. My rule for the companies I advise is that pretty much any software is all right, so long as either a) it has an open API, b) it's open source, or c) I get copies (and updates) of the data formats. Surprisingly few companies balk at this.

  • Well, DoS attacks are easy to harden against:
    • Replace the pole with 6 inch steel pipe embedded in 3 feet of concrete footing.
    • Encase mailbox in welded frame of 1-inch rebar.
    • Cover rebar with wooden shingles, and optionally cover the pole with wood for looks.
    • Hook your security camera to your VCR so you can watch them rattling their teeth out whenever you want.
  • <pedant>

    Your commandline does not solve the problem that the original invocation of xargs was intended to solve - passing a *huge* number of files to grep on the commandline (grep * in a directory with a ton of files) causes it to break.

    xargs works two different ways depending on how you invoke it.


    $ /bin/ls | xargs grep "foo"

    is equivalent to

    $ grep "foo" `/bin/ls`

    or

    $ grep "foo" 1 10 11 12 13 14 2 3 4 5 6 7 8 9


    Whereas invoking xargs like this :

    $ /bin/ls | xargs -I {} grep "foo" {}

    is congruent to :

    $ grep "foo" 1
    $ grep "foo" 2
    $ grep "foo" 3
    $ grep "foo" 4
    $ grep "foo" 5
    $ grep "foo" 6
    .
    .
    .


    So, umm, there, and such.

    </pedant>
    --
  • Filesystem-level tools work well with maildir. You don't need special "formail"-type tools to work with them; bash scripting is capable of doing it all by itself.

    Yeah, being able to grep for a particular message is really handy, as is being able to kill all the messages containing 'University Diploma' with just find, grep, and rm...

    The other thing I've found in the past with mbox is that if you're really unlucky, the POP3 server will make a temporary copy of your (whole!) mailbox before doing a UIDL/LIST. qpopper used to do this, at least, and you really knew about it when someone had a 30 MB mailbox. Maildir has a minimum of file shuffling and reading/rewriting.
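The find/grep/rm trick works because deleting a Maildir message is just unlinking its file; nothing else in the folder is rewritten. A rough Python equivalent, with a hypothetical function name. Like grep, it matches raw bytes, so it will miss text hidden inside base64 or quoted-printable parts.

```python
import os


def delete_matching(maildir: str, needle: bytes) -> list[str]:
    """Unlink every message in new/ and cur/ whose raw bytes contain needle."""
    removed = []
    for sub in ("new", "cur"):
        directory = os.path.join(maildir, sub)
        for name in os.listdir(directory):
            if name.startswith("."):
                continue
            path = os.path.join(directory, name)
            with open(path, "rb") as fh:
                hit = needle in fh.read()
            if hit:
                os.unlink(path)  # no other message file is touched
                removed.append(os.path.join(sub, name))
    return sorted(removed)
```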
  • The Citadel/UX project is developing a robust communications server that will compete with products like OpenMail, Groupwise, and Exchange.

    On the face of it, this statement makes no sense at all. The big mail communications servers these days are the Internet MTAs, which in all the major ISPs handle typically many millions of messages per day on behalf of millions of customers per ISP. As others on this thread have mentioned, Exchange runs out of steam if you push it beyond some 2000 users per server -- it just doesn't scale, so it's not "Enterprise Grade" by any stretch of the imagination, it's out by 2-3 orders of magnitude. You've got to stop believing manufacturer's propaganda.

    You should compare Citadel/UX to qmail or Exim installations in large ISPs, not against toy systems. Server farms with dozens of hierarchically-organized, multi-CPU MTAs which provide the massive underpinning to the world's Internet mail traffic, those are the "Enterprise grade" systems of today, not the relatively puny corporate systems of yesteryear being portrayed as "Enterprise grade" by manufacturers of personal computer software with more money than experience.

    I feel I must also comment on your novel use of the word "robust". If one compares the reliability, availability, and robustness of a flat file to that of even the simplest database system, the mind boggles that anyone could consider the database system anything but the less reliable of the two, by a colossal amount.

    We run massive database systems here from the best regarded RDBMS manufacturer in the industry and configured with their help, yet even our DBAs will admit that the reliability of their databases is not brilliant. In contrast, the reliability of Exim is, er, well, it has never failed, so I guess the reliability is infinite. And I hear that qmail is likewise excellent in that respect. How the hell is a database going to improve on that kind of reliability and robustness?

    Even the best databases crash and corrupt data every once in a while, and a new database could easily be less stable rather than more. But I've never had a flat file crash on me.

    If it makes you feel any better, Unix is a sort of combined I/O multiplexer and storage mechanism, which inevitably makes it a particular kind of database too. To get the most out of it you should leverage its capabilities instead of trying to impose a totally different semantic on top of it. You'll never gain robustness by adding complexity.
  • Why the Berkeley DB? Why not a whole bunch of databases? This seems really short-sighted to me. I personally would like to store my email in mysql, since I already have a MySQL db. I'm sure many others would like to use mSQL, Oracle, Sybase, or even M$ SQL Server.
    Quite simply: ease of installation. While the very big installations have lots of DBAs on hand, your typical "plug it in and go" shops don't. Installation really needs to be easy: just install the software, plug in a few variables like your domain name, and start running.

    However, you'll be happy to know that we've wrapped all of the database calls into a data store API. Recently we made the transition from GDBM to Berkeley DB without having to rewrite everything -- just drop in a new data store module and re-import the data (yes, there's an import/export utility). It would be quite straightforward for someone to write a data store module that uses MySQL, Oracle, or whatever.
    --
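The data store API described above is the familiar pluggable-backend pattern. A minimal sketch of the idea, not Citadel/UX's actual interface: callers depend only on a small protocol, and either an in-memory backend or one built on Python's `dbm` module can be dropped in. The `MessageStore` protocol, both classes, and `export_all` are hypothetical names.

```python
import dbm
from typing import Optional, Protocol


class MessageStore(Protocol):
    """Hypothetical interface the rest of the server would code against."""

    def put(self, msg_id: str, raw: bytes) -> None: ...
    def get(self, msg_id: str) -> Optional[bytes]: ...


class MemoryStore:
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, msg_id: str, raw: bytes) -> None:
        self._data[msg_id] = raw

    def get(self, msg_id: str) -> Optional[bytes]:
        return self._data.get(msg_id)


class DbmStore:
    """Backend on whichever dbm implementation this Python has available."""

    def __init__(self, path: str) -> None:
        self._db = dbm.open(path, "c")  # "c": create the database if missing

    def put(self, msg_id: str, raw: bytes) -> None:
        self._db[msg_id.encode()] = raw

    def get(self, msg_id: str) -> Optional[bytes]:
        key = msg_id.encode()
        return self._db[key] if key in self._db else None

    def close(self) -> None:
        self._db.close()


def export_all(src: MessageStore, dst: MessageStore, ids: list[str]) -> None:
    """The import/export step: copy messages between any two backends."""
    for msg_id in ids:
        raw = src.get(msg_id)
        if raw is not None:
            dst.put(msg_id, raw)
```

A MySQL or Oracle module would be one more class with the same two methods; nothing that calls `put` and `get` has to change.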
  • I use the same setup (Qmail, maildirs, Courier-IMAP, and SquirrelMail). I've been really happy with it because I can access my email from Gnus in Emacs (using IMAP), other MUAs that support either maildir or IMAP, and SquirrelMail (webmail) when I'm using a remote computer.

    I'm very concerned about security, so I configured Courier-IMAP to provide ONLY SSL/TLS-secured POP and IMAP. I set it up to provide insecure (non-SSL) service only on localhost (127.0.0.1), where it isn't visible over the network. That way SquirrelMail or MUAs running on my server can get to it without SSL, which is OK because there's no way for someone else on the wire to eavesdrop. Of course, I also have the .htaccess file for SquirrelMail set up to serve only over SSL/TLS (see below), and I don't allow telnet, rlogin, or non-SSL'd FTP into my server.

    I'm somewhat interested in developing a database back end for the IMAP server, so that old archived email can be stored more efficiently than in either maildir or mbox, but still be readily accessible.

    # .htaccess for SSL-only services
    # Options -Indexes
    <IfDefine HAVE_SSL>
    SSLRequireSSL
    # insert the https: URL of the service in the next line
    # for automatic redirect if the user attempts a non-SSL connection
    ErrorDocument 403 https://host/webmail/
    </IfDefine>
    <IfDefine !HAVE_SSL>
    # this is to make sure that if the web server is accidentally started without
    # mod_ssl, the web pages won't be served up insecurely
    Deny from all
    </IfDefine>

  • mh is king.

    refile +foobar `pick -from foobar`
    will move all messages from "foobar" into my foobar folder in about 15 keystrokes (with autocompletion).

    refile -link -src +foobar `pick +foobar -search project6` +project6
    will refile messages in my foobar folder containing the text "project6" to my project6 folder using hard links. Now the messages exist in both folders.

    I can type inc, show, next, comp, etc. in any terminal window at home or at work, and the right thing happens (with a few ssh tricks and gnuclient). No fumbling for some icon to click on, or waiting for the gui to come up, or finding the window running my mail agent...

    The only drawback is that after a few hundred thousand messages scattered in hundreds of folders indexing the files for backup can take a bit of time, "what do you think I'm running here, a news server?"
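`refile -link` can put one message in two folders without copying it because an MH folder is just a directory of files named by message number, so the second folder can hold a hard link to the same inode. A minimal sketch of that idea with a hypothetical function name; it skips the `.mh_sequences` bookkeeping and locking that nmh itself performs.

```python
import os


def link_into_folder(msg_path: str, folder: str) -> str:
    """Hard-link msg_path into an MH folder under the next free message number."""
    numbers = [int(n) for n in os.listdir(folder) if n.isdigit()]
    dest = os.path.join(folder, str(max(numbers, default=0) + 1))
    os.link(msg_path, dest)  # same inode: one copy on disk, two folder entries
    return dest
```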

  • "You have to type your password into the new client--maybe we should store that on the server too?"

    Yes, you do have to store the password (or a derivative thereof) on the server. Otherwise, the server would never know whether you typed in the correct password. But I think you're trying, poorly, to make the point that not all data should be stored on the server.

    It's true; not all data should be stored on the server. Like certain subscriptions. Of course, the client doesn't have to use the server's capabilities to manage subscriptions.

    I would like to have a client that allows me to choose server-based or client-based management of subscriptions and recent messages. That way, I could say "I always want this subscription, but this other subscription should only show up when I'm using balsa from home" or something. That would not be possible if the server could not store subscriptions, but the ability to store subscriptions does not prevent the client from doing its own management.

    And race conditions in the spec should be fixed. They're not excuses to throw away the idea entirely.
    -Dave
  • That's okay. UW supports several formats other than mbox, too (including mh and several others where performance doesn't actually suck), but the poster apparently hasn't read any documentation.

    maildir format does not scale well to large mailboxes on large servers because it has no sort of overview cache information. Mark Crispin (author of UW imapd) correctly deduced that MH format sucked for the same reason that qmail format sucks, and refused to implement it. Without a way to do overview information, getting headers to do the message list is excessively slow.
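An overview cache can be bolted on outside the format. Because a delivered Maildir message never changes except for its flag suffix, headers parsed once can be cached under the file's unique base name. A minimal sketch, with a plain dict standing in for whatever persistent cache a server would keep (the function name and cache layout are assumptions):

```python
import os
from email.parser import BytesHeaderParser


def overview(maildir: str, cache: dict) -> list[tuple[str, str, str]]:
    """Return (unique name, From, Subject) per message, parsing each file at most once."""
    parser = BytesHeaderParser()
    rows = []
    for sub in ("new", "cur"):
        directory = os.path.join(maildir, sub)
        for name in sorted(os.listdir(directory)):
            if name.startswith("."):
                continue
            uid = name.split(":", 1)[0]  # stable across flag changes and new/ -> cur/
            if uid not in cache:
                with open(os.path.join(directory, name), "rb") as fh:
                    headers = parser.parse(fh)  # headers only; body left unparsed
                cache[uid] = (headers.get("From", ""), headers.get("Subject", ""))
            rows.append((uid, *cache[uid]))
    return rows
```

On the second and later calls only new arrivals are opened, which is the cost the message list would otherwise pay for every message.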

  • ...you know, you can do a: grep "stuff here" * and search through all the files in the directory.

    Yes, I'm aware of that. The problem is that it's dog-slow. Opening and scanning 2000 files for one mailbox alone is just darned painful. Even if the mailbox is hundreds of megabytes in size, 'grep' will operate on it faster if it's a single file than if it's zillions of separate files.

    Also, when your mailbox grows to thousands of messages, the wildcard expansion in the shell ('*' in your example) may overflow or truncate, and you may not actually scan all the messages. Yes, you can resort to foreach, but then not only are you opening zillions of files, you're discretely launching 'grep' a zillion times as well.

    Like I said, I admire 'maildir's reliability, and it's certainly more flexible in certain ways, and if I could get the same or similar search speed out of 'maildir', I'd switch. But for the moment, 'mbox' serves my purposes.

    Schwab
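The wildcard-expansion problem goes away if the search walks the directory itself instead of asking the shell to expand `*`. This does nothing about the cost of opening thousands of files, which is the real complaint here. A minimal sketch with a hypothetical function name; like grep, it does a case-sensitive match on raw bytes.

```python
import os
from collections.abc import Iterator


def grep_maildir(maildir: str, needle: bytes) -> Iterator[str]:
    """Yield paths of messages containing needle, with no argv-length limit."""
    for sub in ("new", "cur"):
        # os.scandir streams entries, so no command line or glob is ever built.
        with os.scandir(os.path.join(maildir, sub)) as entries:
            for entry in entries:
                if entry.name.startswith(".") or not entry.is_file():
                    continue
                with open(entry.path, "rb") as fh:
                    if needle in fh.read():
                        yield entry.path
```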

  • I have experience working for a company that hosted millions of users with a maildir format. There are some problems with it. First, some filesystems are just not built for having zillions of inodes and tiny files. WAFL, used by Network Appliance, can fail under this sort of load. Secondly, maildir file names can be quite long. There was a bug in a version of Solaris where the operating system would not cache file contents of an NFS-mounted file whose name was longer than 31 characters. This can result in very poor performance.
  • You're right that the problems I describe aren't really problems with maildir. They are problems in other products that are tickled by maildir. But if you are trying to run a large operation, you quickly decide that maildir is the problem, since you don't have the ability to change Solaris or WAFL :)
  • Two more reasons why mbox sucks:
    • most UNIX file systems are optimized to work faster with a large number of small files, rather than one huge file.
    • you've just gotta hate a format where the common English word "From" at the beginning of a line is used as a delimiter.
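Writers of the "mboxrd" variant cope with that delimiter by adding one `>` to any body line matching `>*From ` and stripping one `>` on read, which makes the quoting exactly reversible. The older "mboxo" variant quotes only bare `From ` lines, so a reader cannot tell which `>From ` lines were quoted. A minimal sketch of the mboxrd rule; the function names are hypothetical.

```python
import re

# mboxrd: a line of zero or more '>' followed by "From " gets one more '>'.
_ESCAPE = re.compile(r"^(>*From )", re.MULTILINE)
# On read, strip exactly one '>' from lines of one or more '>' plus "From ".
_UNESCAPE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_escape(body: str) -> str:
    return _ESCAPE.sub(r">\1", body)


def mboxrd_unescape(body: str) -> str:
    return _UNESCAPE.sub(r"\1", body)
```

A body line such as "Fromage" is left alone, since the rule requires "From" followed by a space.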
  • IMHO you seem to mix up a few things. You shouldn't compare Exchange and qmail, since the latter is written with the "do-one-thing-but-do-it-right" paradigm in mind. It is only an MTA, which means it delivers mail to your mailbox, while Exchange does a whole lot more. Using the right tools (e.g. courier-imap, procmail, fetchmail, etc.) you can get most (if not all) of the functionality you get with Exchange, and each of those tools follows the DOTBDIR paradigm. You just need to combine the appropriate tools, and voila...
    IMAP? courier-imap.
    Security? You have to make a tradeoff since you refuse to use proper products; there's no tradeoff using qmail-courier-imap-ssl-mutt-whatever.
    Workgroup facilities? There are a lot (Evolution, many web-based ones), so that's a moot point. You get everything.
    Resources? qmail is a lot more resource-friendly than Exchange...
    Still, there's a tradeoff using OS tools: you need somebody to put all this together. A smart guy...
  • This combines the best of both worlds. This also means that while it's easy to corrupt your database with a single bug in your code, you can always re-build it from the on-disk messages.

    Yes, it is great until the two get out of sync. If you can limit access to the raw filesystem, then that'll eliminate most of the problems, and most of the advantages.

    Besides, databases are a lot better (these days) at storing large hunks of arbitrary data, so I'd just stick everything in the database.

    That or use a future version of reiserfs, which could give you a database-like view of your filesystem.

  • Yes. mbox just sucks for POP3 and IMAP; the server has to lock the file, copy it, and rewrite the entire damn thing just to delete a message or stick a flag on one. It's just painful to see the amount of work an IMAP or POP3 server has to do when handling one of these obscene 30 MB mboxes with lots of Word attachments.

    What I do is configure maildirs for everyone on the mail server, using either qmail or postfix (both can deliver to maildir; qmail is more minimalistic but a bit confusing, postfix is about as good and a lot more understandable), and then setup qmail's pop3 daemon (even if using postfix to deliver). This combination has worked so well for me that I use it both on server and on my desktop computer (getting mail from pop3 with fetchmail, delivering into maildirs, reading with mutt).

    The only thing to make sure with maildir is that you have enough inodes. But that's easy to handle when formatting the partition, and (even better) you could use reiserfs, which has dynamic inode allocation and handles large directories of small files very well.
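Checking the inode headroom mentioned above needs nothing exotic: `os.statvfs` (Unix only) reports total and free inodes alongside free blocks. A minimal sketch with a hypothetical function name. Filesystems that allocate inodes dynamically may report a total of zero, so the sketch returns `None` in that case rather than dividing by it.

```python
import os
from typing import Optional


def inode_usage(path: str) -> Optional[float]:
    """Return the fraction of inodes in use on path's filesystem, or None if unknown."""
    st = os.statvfs(path)
    if st.f_files == 0:  # no fixed inode table reported
        return None
    return 1.0 - st.f_ffree / st.f_files
```

A mail server could, for example, raise an alert once this passes a threshold the admin picks.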

  • Replace the pole with 6 inch steel pipe embedded in 3 feet of concrete footing.
    An even better solution is a length of 132 lbs (to the yard) rail. That does wonders as a truck bumper, imagine what it will do as a mailbox post!

    --

  • A couple of considerations have to be made regarding the choice of mailbox format. Here are my thoughts:

    Flat mbox file:
    pros: easy to set up, accessible
    cons: subject to locking issues, not scalable, limited to local fs

    Maildir format:
    pros: fast, highly scalable, good performance, very few locking issues, reliable
    cons: limited user access to directory

    Proprietary db format:
    pros: transactions, scalable
    cons: expensive, corrupts easily

    Word of warning: back up frequently if you are using MS Exchange.
  • Not FUD: The Exchange guys at my old job were sharp, loved Microsoft products, and generally kept Exchange up.

    Their points:
    1) No version of Exchange had a stable message store until 5.5SP1. According to them, that's at least 3 years on the market, corrupting mail all along! But it does work fine now, and Ex2000 solves the '1 big database' problem.

    2) They had weekly maintenance downtime to handle the database issues. That meant they took turns coming in on Sunday mornings. Whoop for them.

    3) Even so, they still occasionally had niggling database consistency problems which they never could quite work out. When these things were happening, people would get nervous because basically the server could crash anytime. Many times they had to go offline and restore the entire message store from tape to solve these things.

    Meanwhile, I used to do some Notes stuff. Notes has its own problems, but at least you could back up and restore mailboxes with the COPY command, as well as solve DB corruption and whitespace issues (which cropped up rarely) with the server online. I never had to come in on the weekends, at least. But to prove this isn't FUD, I'd take the Outlook interface over Notes or Netscape any day of the week.
    --
  • Courier-IMAP works with it. So every IMAP client works with it. Of course, mutt works fine with it, too.
  • It is RECENT for that particular client but not for the end user. And end-users are the ultimate target of email systems. Clients just help make reading the e-mail less painful.

    I'm still scratching my head trying to come up with a scenario where a user would want all of his mail to suddenly be marked UNSEEN behind his back. On the other hand, every user I've ever met likes the scenario where switching to a different client maintains the state of his email world.

    But you don't have that feature now.

    There is a vast difference between a race condition that might erroneously flag some mail and a design that always erroneously flags all mail. In the four years I've been using IMAP, I've never had this race condition hit me. Despite your claim, I do have this feature now.
  • by crow (16139)
    I use exmh as my mail client. The mh tools use a separate file per message. Here are the issues with it as I see them:

    Advantages:
    * Easy to access any message with standard Unix text utilities (grep, more, and such).
    * No worry about corrupting the entire mailbox if one message gets clobbered by a broken client (or broken file system or whatnot).
    * Incremental backups and synchronization are easier.

    Disadvantages:
    * Uses lots of storage. [Oh wait, I work for a storage company, so this is an advantage.]
    * With one file per message, you can get more files in a directory than your shell will allow you to use as command-line arguments (e.g., `grep important *` may fail).

    I guess the big safety issue is how well it behaves if you have more than one mail client accessing your email at a time. I don't see this as a very likely situation, but still something that should work.
  • I used to work in a place that stored about 20 terabytes of certain documents it worked with, which varied in size from 1K to 5G each. Median size was about 40M. All the meta data, like what customer it applied to, dates of processing, and so forth, were stored in a database. But the actual document file never was. The network path to the document was in the database, but the documents were stored on hundreds of Novell (ick) file servers. The database was still the major bottleneck of the whole operation. All these wonderful database facilities like SQL don't mean squat when the main functionality was to get the document, process it, and store it back, which is what happened most of the time. Of course it was nice to have the SQL when you needed to manually check on things or do some odd searches. But I would never store bulk data in a database; only the pointer to it would go in there. Databases are faster at complex searching, but not at bulk delivery of data.

  • Why are they concentrating so much in a single box like that anyway? Why not a few separate smaller boxes?

  • You have got to be fucking kidding me!
    I haven't used Exchange 2000, but Exchange 5.5's mailbox format is a piece of shit! It's one huge flat file. And I mean HUGE. Plus the "Jet" database format it uses is slow as balls. And to top it all off, you have to take the service offline to defrag it! Unless you love getting up several Saturday mornings a month because your users can't check their email, Exchange isn't for anyone.

    -Lee

  • That or use a future version of reiserfs, which could give you a database-like view of your filesystem.

    I find future versions of trendy software to be pretty impossible to use in building a workable solution...
  • Three different string encapsulations. That should be enough to tag it as bletcherous.
    -russ
  • You're welcome. :)
    -russ
  • Pardon if this answer is a slight bit distracted, I'm doing this and an OS install at the same time on two machines.
    First of all, databases can handle large amounts of arbitrary data, such as BLOBs or big chunks of text.
    Try:
    grep -i 'contact address' /var/lib/mysql/maildb
    As it turns out, this is not as good an idea as it might sound.... :-(
    Secondly, I think you missed the fact that maintaining two separate data stores (database for headers, filesystem for message content) will certainly be more work than just using one or the other.
    insert into headers values ('foo@bar.com', 'Re: This is just a test', '/var/spool/joeuser/inbox/13948')

    Doesn't seem too hard to me. As far as consistency checking goes, you can ignore the on-disk text except for displaying the message. If you want to use the headers from the file to refresh the database in the event of corruption, fine, but it's not a big requirement.

    Lastly, a filesystem and a database are both stored on a magnetic disk (for the most part). How is it easier to corrupt one than the other?
    Any backups of my data that I keep are also stored in the same physical universe, but I don't use this as an excuse not to keep backups. Having the headers lying in a plain text file to sanity-check against can only help.
    So, what do you do in the reverse case from what you described? Suppose the portion of the filesystem containing the mail is lost. Can you rebuild those lost messages from the database, in which you would have stored only the headers?
    Generally when one assesses risk, one works with cost/benefit tradeoffs. What you propose is very costly in terms of database resources, whereas duplicating headers on disk is very cheap. This cost comes in terms of disk space, time used to duplicate the data (which in a very large system could be staggering for every message body), etc.

    I think you will find that the benefits of storing headers twice will far outweigh the cost of having done so. I can't say the same for storing open-ended (in terms of size) message bodies in a relational database.

    I say do one or the other, and then BACKUP OFTEN!
    Nice idea, but we're talking about software design here, not system administration procedures. Clearly a sysadmin should be backing the data up, but to tell the user, "something looks odd here, go chase down a sysadmin and make him restore a backup," is a lot less friendly than, "I found some corrupt headers in message 501719, correcting..."
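A minimal sketch of the header-index idea follows. It assumes Python's standard sqlite3 module stands in for "the database" and that each message is a file on disk. The table and the names `open_index`, `index_message` and `rebuild` are hypothetical. The files remain the source of truth, and `rebuild` regenerates the index from their headers, which is the recovery path argued for above.

```python
import email.parser
import os
import sqlite3
import tempfile


def open_index(db_path=":memory:"):
    """Open (or create) a hypothetical header index: one row per message file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        "path TEXT PRIMARY KEY, sender TEXT, subject TEXT)"
    )
    return conn


def index_message(conn, path):
    """Parse only the header block of one message file and upsert its row."""
    with open(path, "rb") as fh:
        hdrs = email.parser.BytesHeaderParser().parse(fh)
    conn.execute(
        "INSERT OR REPLACE INTO headers (path, sender, subject) VALUES (?, ?, ?)",
        (path, str(hdrs.get("From", "")), str(hdrs.get("Subject", ""))),
    )


def rebuild(conn, folder):
    """Throw the index away and regenerate it from the files on disk."""
    with conn:  # one transaction: committed on success, rolled back on error
        conn.execute("DELETE FROM headers")
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                index_message(conn, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "13948"), "wb") as fh:
            fh.write(b"From: foo@bar.com\nSubject: Re: This is just a test\n\nHi\n")
        conn = open_index()
        rebuild(conn, folder)
        print(conn.execute("SELECT sender, subject FROM headers").fetchall())
```

Lookups such as `SELECT path FROM headers WHERE sender = ?` touch only the index. The message file itself is opened only when the message is displayed.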
  • Because it makes it harder for root to read everyone else's email? Actually, as a sysadmin myself, I'd love to have mail stored in a way where only the recipient can read it.

    Unless the "binary" is encrypted data then it's hardly going to make a difference. Also the encryption key had better not be stored anywhere. Otherwise "su -l \" will do the trick anyway.
    Not to mention that in many environments, encrypting mail so that only the user could read it would be a very bad idea.
  • Maildir seems elegant at first, but it has one problem: our filesystems suck. We need a filesystem that is good and fast at creating, opening, and deleting files, even when there are 20000 files in a single directory.

    What filesystem do you have /var/spool/news under?
  • I balked when one of the sysadmins at my work suggested trying out our experimental Exchange 2000 server. But it supports IMAP and Webmail, so I checked it out. Man, is that great! The webmail is really what puts it above plain IMAP, although I guess an IMAP server could have a webmail client as well.

    Try having a look at www.courier-mta.org
  • What about current implementations of mbox that need to be converted into maildir? Can this even be done in an orderly fashion, or is it just slash and burn /var/spool/mail?

    Quite trivial, since it's simply a matter of cutting up files into smaller bits. Can't have anything else accessing the mbox file at the time, but once the MTAs and MUAs have been switched to maildir then nothing else should be looking at it.

    If maildir is indeed the great thing that some people make it out to be, you'd think that there would be more people switching

    The problem is MUA writers tending to ignore maildir. Even though they will happily put the effort into more complex or redundant ways of accessing email. e.g. kmail has inbuilt POP3 support, but every machine it can run on can also run fetchmail.
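As described above, converting mbox to maildir is mostly a matter of cutting a file into pieces. Below is a minimal sketch using Python's standard `mailbox` module. The name `mbox_to_maildir` is hypothetical, and the sketch assumes the MTA and MUAs have already been switched to maildir, so nothing else touches the mbox.

```python
import mailbox
import os
import tempfile


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message of an mbox file into a Maildir; return the count.

    The mbox is locked while it is read as a precaution, but nothing else
    should be writing to it at this point anyway.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    src.lock()
    try:
        count = 0
        for msg in src:
            # Converting maps mbox Status/X-Status flags (R, O, A, F, ...)
            # onto maildir flags and picks new/ or cur/ accordingly.
            dst.add(mailbox.MaildirMessage(msg))
            count += 1
        return count
    finally:
        src.unlock()
        src.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mbox_path = os.path.join(d, "inbox")
        with open(mbox_path, "w") as fh:
            fh.write("From a@b Mon Jan 29 10:00:00 2001\nSubject: one\n\nhello\n\n")
        print(mbox_to_maildir(mbox_path, os.path.join(d, "Maildir")))
```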
  • I can just imagine what went on in his head: "hmmm....I have to find some format for storing RFC 822 messages together. Oh, I know! I'll just throw them in one big file. Wait, but how will I know where e-mails end? What's one of the popular words in English? From! I'll use "From " to distinguish e-mails, and let people quote from-lines"

    There is always MMDF which does the same thing, except for using ^A as a message separator. Other than this it has all the same "features" as mbox.
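The "quote from-lines" problem has a common workaround, often called the mboxrd convention. When writing, the program adds one `>` to every body line matching `>*From `. When reading, it removes one. Because already-quoted lines get quoted again, the round trip is exact. Below is a sketch in Python with hypothetical function names. Other mbox variants handle this differently.

```python
import re

# mboxrd convention: on write, add one ">" to any line matching ^>*From ;
# on read, remove one. Quoting already-quoted lines keeps it reversible.
_NEEDS_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_QUOTED = re.compile(r"^>(>*From )", re.MULTILINE)


def quote_from(body: str) -> str:
    """Escape body lines that would otherwise look like a message separator."""
    return _NEEDS_QUOTE.sub(r">\1", body)


def unquote_from(body: str) -> str:
    """Undo quote_from()."""
    return _QUOTED.sub(r"\1", body)


def split_mbox(text: str) -> list:
    """Split mbox text on "From " separator lines; return unquoted messages."""
    messages, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):  # only an unquoted "From " separates
            if current is not None:
                messages.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return [unquote_from(m) for m in messages]


if __name__ == "__main__":
    body = "Hi\nFrom here on, things change.\n"
    mbox = "From a@b Mon Jan 29 10:00:00 2001\n" + quote_from(body)
    print(split_mbox(mbox) == [body])  # True
```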
  • UW IMAP has different requirements. It doesn't just place mail on the system and leave it at that; it needs to read all of the mail that is there. Maildir will suck for that task. With maildir, the IMAP server will need to open every single file, read some info, and then close the file again. If you have a folder with a couple hundred emails in it, it will very quickly thrash your system (hundreds of file opens and closes for just a couple of bytes of data from each one). With mbox, on the other hand, you just open one file per folder and parse through that. For delivery this can cause problems, because you need to open the file and append to it. For reading it is much easier on the system, but you take up a lot more memory and have the possibility of corruption (if you are careful, that is a pretty low possibility).

    Except that mail "readers" don't just read. They also add metadata, delete, move mail around, etc. With mbox, metadata is commonly stored by adding extra headers to the existing file. Inserting data into the middle of a file is expensive, and it means that anything other than exclusive access probably isn't possible. With maildir it's simply a matter of renaming the file. To delete a message with mbox, you either have to leave holes in the middle of the file (and "compress" it later) or rewrite the file as you go. With maildir you simply delete the file. To move a message with mbox, you append it to another file and then delete it from the first. With maildir it's simply a rename.
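Here is a sketch in Python of the three maildir operations, to make the rename-versus-rewrite comparison concrete. It assumes the usual maildir naming convention, where flags are kept in the filename after `:2,` in ASCII order (for example `R` replied, `S` seen). It also assumes source and destination are on the same filesystem, so `os.rename` is atomic. The helper names are hypothetical.

```python
import os
import tempfile


def make_maildir(path):
    """Create the tmp/new/cur layout (hypothetical helper)."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)
    return path


def set_flags(msg_path, flags):
    """Record flags in a maildir filename and move the file into cur/.

    Per-message state lives in the name: "<unique>:2,<FLAGS>", flags in
    ASCII order (e.g. R = replied, S = seen, T = trashed).
    """
    maildir = os.path.dirname(os.path.dirname(msg_path))
    unique = os.path.basename(msg_path).split(":", 1)[0]
    new_name = unique + ":2," + "".join(sorted(set(flags)))
    new_path = os.path.join(maildir, "cur", new_name)
    os.rename(msg_path, new_path)  # one rename; no other message is touched
    return new_path


def move_message(msg_path, dest_maildir):
    """Move a message (already carrying its :2, suffix) to another maildir."""
    dest = os.path.join(dest_maildir, "cur", os.path.basename(msg_path))
    os.rename(msg_path, dest)
    return dest


def delete_message(msg_path):
    """Delete a message: just unlink its file."""
    os.unlink(msg_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        inbox = make_maildir(os.path.join(d, "inbox"))
        archive = make_maildir(os.path.join(d, "archive"))
        path = os.path.join(inbox, "new", "980000000.123.host")
        open(path, "w").close()
        path = set_flags(path, "SR")  # inbox/cur/980000000.123.host:2,RS
        path = move_message(path, archive)
        delete_message(path)
```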
  • For maildirs, you would do a mv(1) into the tmp subdir, which is essentially free, rather than qpopper's copy to a different filesystem (/tmp), which is slow and expensive.

    Actually the latter is probably even more expensive since it isn't a simple matter of copying the data a chunk at a time from one file to another. The code doing the copying needs to look at the data being copied, either generating an index or verifying an index... As well as adding metadata by inserting extra data into the file (or the copy).
  • It abuses the filesystem with one file per message in the same way that mh folders do.

    Guess /var/spool/news must really "abuse the filesystem" then. Odd that in nearly 20 years no one has come up with an alternative.

    Unless you're running a decent btree structured filesystem like XFS, ReiserFS or JFS, expect a performance hit if you get thousands of messages in a single mailbox.

    Expect an even bigger performance hit if you have lots of messages in the same file. You must use lots of locking (and it must be reliable, otherwise the whole thing will get corrupted). Things such as index files must be understood by every piece of software that does anything with the file, etc. Effectively, you will end up trying to emulate a filesystem in user-space software.
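For contrast, here is a sketch of what even a simple mbox append has to do: take an exclusive lock, write the separator line and the message, then release the lock. It uses Python's Unix-only `fcntl.lockf`, and the function name is hypothetical. It leaves out the dot-locking, From-quoting and error recovery that a real delivery agent would need.

```python
import fcntl
import os
import tempfile
import time


def append_to_mbox(mbox_path, envelope_sender, message_text):
    """Append one message to an mbox file while holding an exclusive lock.

    Unix-only sketch. Many delivery agents also take a dot-lock, quote body
    "From " lines, and truncate back if a write fails halfway.
    """
    with open(mbox_path, "a") as fh:
        # Every writer, and every reader that rewrites the file, must honour
        # the same lock; otherwise two deliveries can interleave.
        fcntl.lockf(fh, fcntl.LOCK_EX)  # blocks until we are the only writer
        try:
            stamp = time.asctime(time.gmtime())
            fh.write(f"From {envelope_sender} {stamp}\n")
            fh.write(message_text.rstrip("\n") + "\n\n")  # blank line ends it
            fh.flush()
            os.fsync(fh.fileno())
        finally:
            fcntl.lockf(fh, fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mbox")
        append_to_mbox(path, "a@b", "Subject: one\n\nhi")
        append_to_mbox(path, "c@d", "Subject: two\n\nthere\n")
        with open(path) as fh:
            print(fh.read().count("\nSubject: "))  # 2
```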
  • 1) it is more reliable over nfs. Maildir is designed to not need file-level locking, which sucks over nfs.

    There is another consequence of this: maildir supports an arbitrary number of processes reading and writing at the same time. The mailbox format requires complex locking, and even then adding new messages has to be strictly serial.
    Maildir is also a better analogy with paper mail. Mailbox would be something like you have a scroll of all the messages pasted together which you periodically have to hand to the postman for more bits to be stuck on the end...
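The lock-free behaviour described above comes from the delivery protocol. The message is written to `tmp/` under a unique name and then moved into `new/`. Readers never look in `tmp/`, so they cannot see a half-written message, and concurrent deliveries never share a file. Below is a sketch in Python. The unique-name scheme is simplified, and the function name is hypothetical.

```python
import os
import socket
import tempfile
import time

_deliveries = 0  # per-process counter, part of the unique filename


def deliver(maildir, data: bytes) -> str:
    """Deliver one message into `maildir` without taking any lock.

    Write the whole message under a unique name in tmp/, fsync it, then
    move it into new/. Readers only scan new/ and cur/.
    """
    global _deliveries
    _deliveries += 1
    # Simplified unique name: time.pid_counter.host, with "/" and ":" in the
    # hostname escaped. Real MDAs add more entropy than this.
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    unique = f"{int(time.time())}.{os.getpid()}_{_deliveries}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as fh:  # "x" fails rather than overwrite
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    os.rename(tmp_path, new_path)  # many implementations use link() + unlink()
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as md:
        for sub in ("tmp", "new", "cur"):
            os.mkdir(os.path.join(md, sub))
        deliver(md, b"Subject: hello\n\nbody\n")
        print(os.listdir(os.path.join(md, "new")))
```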
  • The other thing I've found in the past with mbox is that if you're really unlucky, the POP3 server will make a temporary copy of your (whole!) mailbox before doing a UIDL/LIST.

    It's not just POP3 servers that do this; indeed, it's almost the standard way of processing an mbox file.
  • You CAN use NFS, if you want -- without getting real paranoid.

    You could even use SMB to access the mailbox from a Windows workstation. (Or at least you could if the software existed.)
    A point which hasn't been mentioned is that accessing email from a workstation using file sharing (the same file sharing that is in use anyway) means there is no need for additional password entry (or for storing passwords in plain text or reversible encryption formats). The user simply needs to log in and their mail is there. If they log in on more than one machine, everything still works fine too...
  • Actually, what it means is that you can use a high-performance, industry standard query language like SQL to extract data

    SQL being so "standard" that software vendors demand a specific implementation... Pull the other one, it's got bells on!
  • The only objection I can see the Linux camp having is qmail is released under a "non-free" (as in freedom) license.

    The licence is likely to upset both GPL and BSD diehards. Also, who the author is may be an issue too...
  • Both formats have problems. A true enterprise-grade message store will use an embedded database with transactions support.

    Sounds like big iron propaganda... Both mailbox and maildir have the advantage of being conceptually simple. The database solution is complex, probably more complex than is needed for storing email in the first place.
    Arguing for a database looks to me quite similar to the arguments as to why the Windows registry is better than .INI files. With the same problems, it's an "all eggs in one basket approach" and difficult to deal with when things go wrong.
    Using mailbox means that a problem with John's mailbox probably won't affect Jane's. Using maildir means that a problem with one of John's messages probably won't affect the rest of them. Using some kind of DB could easily mean, John has a problem with mail, everyone has the same problem.
  • Coz if you put everything in one or two big proprietary boxes people can charge you a lot for consultancy and support.

    As well as these boxes being expensive, since they need to be reliable and hot swappable redundant. Not that even that will help when something external such as a router, cable, etc fails.

    Whereas if you distribute the mail load to 10x the number of boxes (albeit cheap off-the-shelf boxes), you just need maybe one or two decent (backup/redundancy ;) ) Unix admins.

    Unfortunately RAIC or RAIB doesn't quite have the ring of RAID.

    With mail the load can be distributed. In fact I believe people don't really mind having their email addresses being user@tag.domain.com. It's the marketing/PR guys who'd complain. Heck market it as user@neighbourhood.domain.com

    Assuming the distribution needs to be that obvious in the first place...
  • We had issues with databases becoming corrupt (and hey, 150000 users like it when they lose all their mail), the database being overly bogged down (guess what, fopen is faster than going through a database) amongst other things.

    As opposed to one user out of 150,000 losing their mail with mailbox or one user out of 150,000 losing some of their mail with maildir.

    While granted I'm sure that bugs such as these can be worked around

    What's the point of applying a workaround to get a complex system to work when you could simply apply a KISS approach?
  • By keeping all the data in one file, mbox should use disk space more efficiently as there will be no overhead for directory or inode entries.

    This might be an issue when storing things as /var/mail/\ but qmail defaults to storing mail in either mbox or maildir format in the home directory. If lack of free inodes is an issue, then more than just mail will have problems...
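The space argument is easy to put numbers on. Below is a rough estimate in Python. It assumes 4096-byte blocks, 128-byte inodes and no tail packing (ReiserFS's tail packing would change the picture). It also ignores directory entries and mbox separator lines. All of these parameters are assumptions, not measurements.

```python
def blocks_for(size, block_size):
    """Whole filesystem blocks needed to hold `size` bytes."""
    return -(-size // block_size)  # ceiling division


def storage_estimate(sizes, block_size=4096, inode_size=128):
    """Rough on-disk bytes for the same messages as maildir vs one mbox file.

    Assumes no tail packing and ignores directory entries, indirect blocks
    and the mbox "From " separator lines. On ext2-style filesystems inode
    space is preallocated, so there the real cost is using up the inode pool.
    """
    maildir = sum(blocks_for(s, block_size) * block_size + inode_size for s in sizes)
    mbox = blocks_for(sum(sizes), block_size) * block_size + inode_size
    return maildir, mbox


if __name__ == "__main__":
    # 1000 hypothetical 3000-byte messages on 4 KiB blocks:
    print(storage_estimate([3000] * 1000))  # (4224000, 3002496)
```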
  • I have experience working for a company that hosted millions of users with a maildir format. There are some problems with it. First, some filesystems are just not built for having zillions of inodes and tiny files.

    So how's your newsserver holding up...

    WAFL, used by Network Appliance, can fail under this sort of load. Secondly, maildir file names can be quite long. There was a bug in a version of Solaris where the operating system would not cache file contents of an NFS-mounted file whose name was longer than 31 characters. This can result in very poor performance.

    Sounds more like these are problems with your specific platform (Solaris), though. Indeed, you identify the NFS issue specifically as a "bug".
  • Maildir format:
    cons: limited user access to directory


    No more limited than any other directory, it's just a directory with files in. If people really want it's trivial to read their email with a text editor.
    The "limited access" comes from the lack of software (especially GUI software) that can handle maildir, even though maildir is probably simpler, in terms of programmer effort, than the formats more commonly supported.
  • I believe mbx format is better than both maildir and mbox. I think part of it is in binary, which makes it faster.

    It may make it faster but it also means that you can easily be tied to specific hardware/software combinations in order to be able to read your email.
  • Sorry to tell you, but this doesn't work.
    (storing email in a database that is)

    I worked as lead programmer at a mail provider and was in charge of the system's design from the start. The ingenious idea to store email in a database, while it sounds good...is rather horrific. We had issues with databases becoming corrupt (and hey, 150000 users like it when they lose all their mail), the database being overly bogged down (guess what, fopen is faster than going through a database) amongst other things.

    While granted I'm sure that bugs such as these can be worked around, databases were meant for holding fields of data, not whole files, especially binary ones (and before you say that email is ASCII, think of other languages, which use multibyte encodings, etc.)
  • this is basically X.400, upon which OpenMail is based. But instead of having a JDBC/ODBC/Whatever link to a relational database, the architecture is to have a mail-specific "mail store" which stores the mail in SOME way, and then client tools which just talk to the mail store (and client tools in this case are things like an IMAP daemon). It's basically this model, assuming that you are dealing with your mail store as a Mail-specific interface rather than raw SQL.
  • Wasn't there a recent article on a MySQL file system? Wouldn't this be the best of both worlds?
  • If you're using POP, typically the MUA downloads and removes all mail from the server. If that's the case with you folks, then you're taking advantage of the client's CPU and storage, rather than the server. That's swell if you already have heavy-duty clients with good backups and your people don't move around much. IMAP makes more sense if you need a central message store, though.

    Both are reasonable choices, but it's unfair to compare them in an apples-to-oranges fashion.
  • I'd agree that "prickly" is a good word to describe Bernstein; a while back I wrote a chapter of a book on email and while I was writing it I had nightmares about him reaming me for a minor error. Luckily, the book seems to have escaped his notice.

    Having used qmail for a few years, I can indeed say that it is a safe and reliable product. But I wouldn't recommend it for a novice sysadmin; DJB is a really smart guy, and he seems to have little patience for those who aren't.

    As to his views on licensing, here is the distribution policy for his software. He strictly forbids distribution of qmail except in forms approved by him:

    http://cr.yp.to/qmail/dist.html [cr.yp.to]
  • Exchange defrags itself online, but deleted items become white space. The only way to remove that white space is by an offline defrag. But if you have your act together from the start (planning, esp. mailbox limits), tons of whitespace shouldn't be an issue.

    Stop spreading FUD.

    ostiguy
  • Your statement was true until about 12 months ago.

    MS is going to enterprise style per CPU licenses with its xxxxx 2000 products. Exchange 2k will be pushed heavily at ISPs and especially ASPs. MS will expect companies to scale highly, with x,000's of users per box.

    ostiguy
  • I can think of two advantages right off:

    1) If your MAILBOX file gets trashed, you're out your entire e-mail directory. If one MAILDIR file gets messed up, you've only lost one e-mail.

    2) If you get a messed up e-mail that you can't read in a mail program (and this DOES happen) you only have to delete the corresponding message in the /Maildir/ instead of messing around with the MAILBOX file.

    I know that UofW claims Maildir takes a performance hit, but I've not noticed one. There are all sorts of web resources on tweaking UofW to pump out e-mails faster. I'm currently using Qmail + IMAP-2000 (with the Maildir patch on Qmail's site) on a P100 w/ 32MB of RAM, and I've got it pumping out IMAP as fast as my work's commercial server does.
  • A few points:

    Tune your filesystem for what it's used for. Your /home directories (where the mail will be stored by default) should be set up with a relatively large number of inodes, because of the tendency toward small files there.

    Read the docs on updatedb, and set the exclusions to include "/home/*/Maildir" if you wish.

    Maildir also allows for multiple processes accessing a 'mailbox' because it uses per-file locking on per-message files, not a lock on an entire mbox itself. This allows for situations where 6 people all have the same IMAP shared folders for shared incoming mail (like an accounting office, or tech support) without locking problems for the MUA or IMAP server.

  • Use rgrep or GNU grep's -r option to do a recursive search:

    grep -ri "slashdot" ~/Maildir/*
  • I think UW recommends the "mbx" format for most situations - fast, safe in concurrent-access situations, etc. Clearly unlike either UNIX mbox or Qmail maildir.

    Does someone want to explain how mbox is better for concurrent access than Maildir? If you do some good coding, they're equal. For Maildir though, you just do read locks on individual files in your Maildir when opening them to present them to the user, and you create new files to write new messages, which doesn't have any effect on (eg 25) other processes accessing that Maildir.

  • Take a look at Courier-IMAP. It handles Maildir quite efficiently.
  • mbox is only more efficient in terms of disk space if your filesystem works that way. If you use ReiserFS, you'd probably get better efficiency out of Maildir, since it's tuned for lots of small files.

    On the DJB note, Dan and I have gotten into our flames on his lists, but some of his software ideas are still very good. The fact that he basically doesn't care what anyone else thinks most of the time seems to me to be why he's succeeded in just writing software that goes against the status quo from the ground up. Anyone else would've crumbled at the criticism.
  • Even with ReiserFS, there is some overhead of space and time involved in managing files. Each file has a directory entry and an inode or two; while ReiserFS may unambiguously improve the time efficiency, that does not result in the space overhead falling to zero.

    This is very true, although I was more concerned with time than space at the current price of a few hundred GB of disk space.

    If he tried to find some places for agreement, his software would probably get used more.

    This, and providing "--help" options to his programs I suggested as being helpful ... right before the deluge of hate-mail ...

    He thinks about the notion of proving correct behaviour

    He never did reply to my philosophical statement that his famous statement, "profile, don't speculate" was incorrect since speculation is scientifically required for eventual proofs to happen.

    Qmail and Dnscache are still personal favorite pieces of software for servers, although there are many things they could do much better than they do. Luckily, Dan seems to attract a large number of patch-writers and individuals who kindly host useful websites like qmail.org [qmail.org] and djbdns.org [djbdns.org].

  • You know Russ, if I could've remembered your name at the time of posting (yeah, yeah, I could've loaded the sites), I would've credited you.

    See you on the lists ...
  • I don't believe _he_ has changed his views on the GPL -- but many of the people writing 'djbware' compatible software use the GPL or BSD licenses.
  • What he may not realise is that distributors and VARs are the ones who (usually) get the first calls from their clients. If those people (like RedHat) can modify the software (a la GPL) to behave like the rest of their software, they'll find it easier to support themselves. RedHat distributes a version of the Linux kernel with several patches added so that their customers will be happy (based on their own presuppositions). Does the kernel mailing list still get questions from those users? Of course. Does it take much to tell them to contact their distributor instead? No.
  • Actually, I believe this is one of the things that ReiserFS excels at.

    I have very limited experience with Reiser myself, so perhaps someone else can provide more details, but as I understand it ReiserFS is capable of dealing with thousands of small files extremely efficiently (Through the use of tree structures to hold the filesystem). From what I've read, it would be a fairly ideal file system for things like maildir storage.

    In fact, now that the 2.4.1 kernel is out, with included stable ReiserFS support, I might just give this a shot. ;-)

    -- Toph

  • by thule ( 9041 ) on Tuesday January 30, 2001 @09:39AM (#469982) Homepage
    Cyrus http://asg.web.cmu.edu/cyrus/ [cmu.edu] seems to use a hybrid approach. Messages are stored in individual files, but the envelope information is stored in dbm format. So opening up a mailbox and listing messages is very fast. So is searching unless you want to do a full body search on all emails. Give it a try. It supports IMAP, POP, and LMTP.
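Below is a sketch of the hybrid approach in Python: one file per message, plus a dbm file that holds each message's envelope fields, so a folder can be listed without opening every message. It follows the spirit of the comment above and is not Cyrus's actual on-disk format. The function names and the JSON record layout are hypothetical. `dbm.dumb` is used only because it is always available.

```python
import dbm.dumb
import email.parser
import json
import os
import tempfile

FIELDS = ("From", "Subject", "Date")


def build_cache(folder_dir, cache_path):
    """Store the envelope fields of every message file in a dbm file.

    Keyed by message filename; values are small JSON records.
    """
    parser = email.parser.BytesHeaderParser()
    with dbm.dumb.open(cache_path, "c") as db:
        for name in sorted(os.listdir(folder_dir)):
            path = os.path.join(folder_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                hdrs = parser.parse(fh)  # parses the header block only
            db[name] = json.dumps({f: str(hdrs.get(f, "")) for f in FIELDS})


def list_folder(cache_path):
    """List a folder from the cache alone, without opening any message file."""
    with dbm.dumb.open(cache_path, "r") as db:
        return {key.decode(): json.loads(db[key]) for key in db.keys()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        folder = os.path.join(d, "INBOX")
        os.mkdir(folder)
        with open(os.path.join(folder, "1."), "wb") as fh:
            fh.write(b"From: a@b\nSubject: hi\nDate: x\n\nbody\n")
        cache = os.path.join(d, "envelopes")  # kept outside the folder itself
        build_cache(folder, cache)
        print(list_folder(cache))
```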
  • by Outland Traveller ( 12138 ) on Tuesday January 30, 2001 @09:32AM (#469983)
    If you look under the hood of the UW Imap server, you will see that it supports many more formats than straight mbox. I don't think that maildir is one of them, unfortunately, but there are a few (mbx comes to mind) that overcome some of the more blatant shortcomings of mbox.

    Is UW Imap free software? If so, someone should feel free to give it maildir, db, sql, or other mailbox support. For some reason I seem to remember that UW Imap was not free software, even though the source is available (some weird academic license hostile to commercial use?). The author is a good programmer and active in the standards process, but can be abrasive to work with.
  • by bkeeler ( 29897 ) on Tuesday January 30, 2001 @04:01PM (#469984)
    Your commandline does not solve the problem that the original invocation of xargs was intended to solve - passing a *huge* number of files to grep on the commandline (grep * in a directory with a ton of files) causes it to break.
    Yes, it does solve that problem. xargs knows the system-specific limit on how long a command line can be, and will invoke the given command multiple times if necessary.

    Thus

    $ /bin/ls | xargs grep "foo"
    might end up invoking, if you have thousands of files, something like
    $ grep "foo" 1 2 3 ... 467 468
    $ grep "foo" 469 470 ... 876 877
    and so on. Using the -i flag to xargs just means it has to create a separate process for each grep, taking a lot of extra time.
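The batching that xargs does can be sketched in a few lines of Python. The byte accounting (argument length plus a terminating NUL) only approximates the real limit, which typically also counts the environment and pointer overhead. The function name and the tiny `max_bytes` value are illustrative assumptions.

```python
import os


def batch_args(command, args, max_bytes):
    """Split `args` into successive command lines of at most `max_bytes`.

    Each argument counts as its UTF-8 length plus a terminating NUL. An
    argument too long on its own still gets a batch of its own (xargs would
    typically report an error instead).
    """
    base = sum(len(a.encode()) + 1 for a in command)
    batches, current, size = [], [], base
    for arg in args:
        cost = len(arg.encode()) + 1
        if current and size + cost > max_bytes:
            batches.append(command + current)
            current, size = [], base
        current.append(arg)
        size += cost
    if current:
        batches.append(command + current)
    return batches


if __name__ == "__main__":
    names = [str(n) for n in range(1, 11)]
    # Prints "grep foo 1 2 3 4 5" then "grep foo 6 7 8 9 10".
    for argv in batch_args(["grep", "foo"], names, max_bytes=20):
        print(" ".join(argv))
    if hasattr(os, "sysconf"):  # the real limit, where the platform exposes it
        print(os.sysconf("SC_ARG_MAX"))
```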

    --

  • by iceT ( 68610 ) on Tuesday January 30, 2001 @10:04AM (#469985)
    Ok. First off, Outlook is the client. Not the mail server. The mail server is called Exchange. Try not to mix the two. I can use Outlook against MANY back ends, including HP's Openmail, (almost) any IMAP/POP3 server, or no backend at all.

    Second, you cite three 'benefits' to Exchange:

    Fast: Define fast. The Exchange/Outlook RPC is great over a 100Mb network, but try it over a dial-up line, or some line with high latency. The performance goes right down the crapper, because the protocol is very 'chatty': the client and server communicate back and forth repeatedly to get a task done. IMAP/POP3 are infinitely better in adverse environments, because their protocols are 'batch' oriented. A couple of commands, and you have data streaming to the client. Another example: over that same high-latency connection, try forwarding a message with an attachment. The attachment has to be uploaded to the server before you can COMPOSE YOUR MESSAGE. On the server side alone, every internet message has to be 'decoded' into MAPI body parts for storage in the database. If it pukes on a body part, it'll crash your information store. The IMAP servers do/can parse the messages based on MIME body parts, but only when necessary. Exchange parses EVERY internet message, and at a lower level than the MIME body parts.

    Second, you cite 'scalability'. I ran a 7000-mailbox UofW POP3 server on a dual 166MHz Solaris box with 256MB of RAM. The concurrency was about 25%, and the server ran with a load average of about 1.2. My previous employer is having trouble running 2500 users on a quad PII-450 with 1GB of RAM at 50% concurrency. How is that scalability?

    Third, you mention 'workgroup features'. True, Exchange includes a fairly decent calendar service, but this discussion is about e-mail. If you want to talk about workgroup functions, we can do that... (btw, voting is a client function, as is task management. There is no true 'workflow' in that, because there is no central process tracking the work. It's all source-routing/message updates.)

    You also said that Qmail is technically correct, but it's not going to do my company's productivity any good. This may be true. But talk to me when your company starts to interact with OTHER companies, and tell me how well Exchange does. Internet software is designed for interoperability, and when you're dealing with other companies, THAT'S what will make your company productive.

    As for security, I'll leave that to the rest of these guys. I already like the comment about the 5 days w/out mail due to the I Love You virus.
  • by KMitchell ( 223623 ) on Tuesday January 30, 2001 @09:24AM (#469986)
    I ran into the "sync" mail issue a while back and came up with the following criteria:

    1) I want to be able to read mail both from a GUI-based mail prog (Outlook, Eudora, Netscape, whatever) **AND** from a shell

    2) I want to be able to access live and "older" mail anytime from (at least) home and work, preferably both my home and work email accounts.

    3) I do not want to send any cleartext passwords

    What I came up with is the following:

    At home I run the UW-IMAP server, and store my incoming mail in MH folders. Stunnel does a fine job of adding SSL support to IMAP.

    At work we run Netscape's Mail server which actively supports SIMAP.

    Either at home or at work, both servers (and all the mail in all the folders) are available.

    Just about the only thing missing is the ability to read my work mail from a shell, but that's where most of the big ugly attachments are, anyway...
  • by OlympicSponsor ( 236309 ) on Tuesday January 30, 2001 @10:18AM (#469987)
    "...if I had to resubscribe every time I use a new client."

    You have to type your password into the new client--maybe we should store that on the server too?

    "What if there was no last session for the client?"

    Then everything is RECENT. I realize this loses you a feature, namely that you can't see only those messages in client B that you didn't see in client A. But you don't have that feature now. Why not? Because there is a race condition in the spec: if a message comes in AFTER the last time you check your mail (in client A) but BEFORE you logout (with client A) that message won't be RECENT in client B.
    --
    MailOne [openone.com]
  • by ewhac ( 5844 ) on Tuesday January 30, 2001 @12:33PM (#469988) Homepage Journal

    I've been using 'mbox' for -- gawd, can I say this? -- fifteen years, and it's served me well. 'mbox's advantages for me are that it is efficient with disk space (you don't eat an inode per message), and that it is quick to search.

    9 times out of 10, when I'm searching my mail, typically with 'grep', I'm looking for something in the body, not the headers. With 'maildir', you have to open each message and search it. This is preposterously slow. There is also the danger that the shell's wildcard expansion limits may be exceeded if you have a lot of messages. With 'mbox', 'grep' opens the one file and slurps through it quickly.

    Remote synchronization is not an issue for me. All my email resides on my laptop, which follows me everywhere.

    However, I'm hip to 'maildir's increased reliability. I have over 2000 messages in my outgoing box alone, and I'd hate to have a system hiccup destroy any of it. If I could search the bodies of a 'maildir' spool as quickly as an 'mbox' spool, I could be convinced to switch.
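On the searching point: a per-message search cannot avoid opening each maildir file, so the sketch below does not make maildir search any faster. What it shows is a search that reports which message matched and decodes MIME parts before matching. Through Python's standard `mailbox` module, the same function runs on either format. The function name is hypothetical.

```python
import mailbox
import os
import re
import tempfile


def search_bodies(box, pattern):
    """Yield (key, subject) for each message whose text/plain part matches.

    Works on any mailbox.Mailbox: Maildir, mbox, MH, ...
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for key, msg in box.iteritems():
        for part in msg.walk():
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True) or b""  # undoes QP/base64
            try:
                text = payload.decode(part.get_content_charset() or "latin-1", "replace")
            except LookupError:  # unknown charset name in the header
                text = payload.decode("latin-1")
            if rx.search(text):
                yield key, msg.get("Subject", "")
                break


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        md = mailbox.Maildir(os.path.join(d, "Maildir"), create=True)
        md.add("Subject: one\n\nThe contact address is here\n")
        md.add("Subject: two\n\nnothing to see\n")
        print([subject for _, subject in search_bodies(md, "contact address")])
```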

    Schwab

  • by MSG ( 12810 ) on Tuesday January 30, 2001 @10:52AM (#469989)
    Originally, the reason we switched to maildir was that even without NFS, mbox was corrupting our filesystems. Not just the files, mind you, but the filesystems themselves. It was a total pain in the ass, and we damn near left Linux for FreeBSD. This was using 2.0.36 and Sendmail. We had to put /var/spool/mail on its own partition so we could unmount and fsck it until we found a solution. Between that and problems with files > 500MB, my opinion of Linux 2.0 is very bad.

    Our solution was moving to qmail and using Maildir mailboxes for our users. We never saw the problem again. :)

    Recently, I've switched to courier mail server (http://www.courier-mta.org/) on all my non-production machines to evaluate it. I'm really, really happy with it. Courier is a complete mail system, not just an IMAP server, so you might take a look at the whole package. The whole thing is RFC compliant, which causes trouble for software that isn't, but that's a fault in the other software.

    As a final rant against UW-IMAP: I hate it. It loads the whole damn mailbox being checked into memory (regardless of the type), which creates a huge load every time someone with a large mailbox checks their mail. This problem affects the POP3 server as well, since that also uses the c-client code.
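Loading the whole mailbox is not inherent to mbox. The sketch below, in Python, streams through the file once to record byte offsets, after which any single message can be fetched with a seek. It is an illustration, not how c-client works internally. The function names are hypothetical, and the sketch assumes body "From " lines were quoted on delivery.

```python
import os
import tempfile


def index_mbox(path):
    """Return (start, end) byte offsets of each message in an mbox file.

    Streams the file line by line, so memory grows with the number of
    messages rather than with the size of the mailbox.
    """
    spans, start, pos = [], None, 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"From "):
                if start is not None:
                    spans.append((start, pos))
                start = pos
            pos += len(line)
    if start is not None:
        spans.append((start, pos))
    return spans


def read_message(path, span):
    """Fetch one message by seeking straight to its offsets."""
    start, end = span
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(end - start)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mbox")
        with open(path, "wb") as fh:
            fh.write(b"From a@b x\nSubject: one\n\nhi\n\nFrom c@d y\nSubject: two\n\nthere\n")
        print(index_mbox(path))  # [(0, 29), (29, 60)]
```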
  • by scm ( 21828 ) <scm@despamme d . c om> on Tuesday January 30, 2001 @09:14AM (#469990) Homepage
    "Qmail only supports maildir..."

    That's just plain wrong. Qmail supports both maildir and mbox. I've been using qmail with only mbox files for years...

  • by iceT ( 68610 ) on Tuesday January 30, 2001 @09:30AM (#469991)
    And at only $87/user client access license (courtesy of Shopper.com [shopper.com]), it's a STEAL...

    (oh, plus Win2000)...

    (oh, plus a machine with at LEAST 256-512MB RAM)...

    (oh, plus a backup solution to backup the DB live)...

    (oh, plus some sort of a firewall/gateway... you wouldn't want this DIRECTLY on the 'NET..!)

  • by FattMattP ( 86246 ) on Tuesday January 30, 2001 @11:34AM (#469992) Homepage
    ArsDigita has a great article [arsdigita.com] on using Oracle as a backend for your mail and ACS as a front end.
  • by Wraithlyn ( 133796 ) on Tuesday January 30, 2001 @09:41AM (#469993)
    Unfortunately, "analog snailmail boxes" are highly susceptible to quite a few undesirable things:
    • High latency
    • Address spoofing
    • Packet flooding (AOL CDs)
    • Denial of Service attacks (rednecks driving by in pickups with baseball bats)
  • by Rudeboy777 ( 214749 ) on Tuesday January 30, 2001 @09:04AM (#469994)
    My mailbox works just fine, and it hasn't changed in over 20 years! It sits at shoulder height just to the right of my front door. Here's the advantages:

    -No encryption techniques necessary

    -rarely have to waste time with forwarded jokes

    -Best of all, the spam it collects is occasionally useful (I know all the pizza deals available in town).
  • by OlympicSponsor ( 236309 ) on Tuesday January 30, 2001 @09:09AM (#469995)
    As someone who is, as we speak, supposed to be implementing an IMAP server, let me say this: If the person who dreamed up RFC2060 says that X is "slow and dangerous" run, DO NOT WALK, to leap onto the X bandwagon--it'll be the wave of the future.
    --
    MailOne [openone.com]
  • by anewsome ( 58 ) on Tuesday January 30, 2001 @09:02AM (#469996)
    I think the guys who wrote the Cyrus IMAP server got it right. I have been using Cyrus for about 4 years now and I rarely delete mail. The server is still responsive and full body text searches are pretty speedy, even on the P133 server that it is running on. I think keeping each mail in a separate file, and making a directory for each folder, is the way to go. It also makes it very simple to restore a lost mail message and to index the whole mailbox. Anyway, that's my two cents.
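    With one file per message and one directory per folder, a full-text search is just a directory walk. A minimal Python sketch, assuming a Cyrus-like spool where messages are files such as `1.` and `2.` and each folder also holds `cyrus.*` bookkeeping files (header, index, cache) that should be skipped. It does a naive substring scan of every file, and `search_spool` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterator


def search_spool(root: str, needle: bytes) -> Iterator[Path]:
    """Yield message files under `root` whose raw bytes contain `needle`.

    Assumes one directory per folder and one file per message. Files
    whose names start with "cyrus." (per-folder bookkeeping files) are
    skipped.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.startswith("cyrus."):
                continue
            path = Path(dirpath) / name
            if needle in path.read_bytes():
                yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        inbox = Path(spool, "INBOX")
        inbox.mkdir()
        (inbox / "1.").write_bytes(b"Subject: hi\n\nmaildir rocks\n")
        (inbox / "2.").write_bytes(b"Subject: yo\n\nmbox rocks\n")
        for hit in search_spool(spool, b"maildir"):
            print(hit.relative_to(spool))
```

    The same layout is what makes restoring a single lost message easy: it is one file to copy back from backup.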
  • by Chaostrophy ( 925 ) <ronaldpottol&gmail,com> on Tuesday January 30, 2001 @09:12AM (#469997) Homepage Journal
    http://www.jwz.org/doc/
    has a number of essays about mail on Unix systems, including problems with mail box formats.

    I use XEmacs/Gnus/nnml so all my mail is stored as individual files, which is handy (as other posters have said) and has its downsides, as they have said too (grep now bitches if passed all files in my main mailbox). Still, I like it; it's the best system I've used. Not so great for the multiple hosts thing, though.

    Or you could run your mail and XEmacs on one machine, and either read your mail in a terminal, or open X windows on your local display. Look up gnuserv to do that, I think.

  • Both formats have problems. A true enterprise-grade message store will use an embedded database with transaction support.

    Fortunately, a solution to this problem is being developed right now. The Citadel/UX project [citadel.org] is developing a robust communications server that will compete with products like OpenMail, Groupwise, and Exchange. SMTP and POP3 are already in place; IMAP will be available by the end of the year. Web-based access works as well. After that's done we'll be writing plug-ins for both Evolution and Outlook, in order to facilitate all of the 'shiny things' working as well: calendars, address books, etc.

    So, you might ask, what mailbox format does it use? None of the above. Messages are stored in a database, like they should be. The Berkeley DB [sleepycat.com] package from Sleepycat Software (yes, it's open source) is used for robust back-end storage, including transaction and logging support.
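    The comment doesn't show Citadel/UX's schema, so as a stand-in here is a minimal Python sketch of what transaction support buys a message store. It swaps in the stdlib `sqlite3` module in place of Berkeley DB, and the tables and the `open_store`/`deliver` helpers are hypothetical. Storing a message and updating its folder's counter happen in one transaction, so a crash between the two writes leaves the store exactly as it was before the call.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a tiny hypothetical message-store schema."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS folders (
            name  TEXT PRIMARY KEY,
            count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS messages (
            id     INTEGER PRIMARY KEY,
            folder TEXT NOT NULL,
            body   BLOB NOT NULL
        );
        """
    )
    return conn


def deliver(conn: sqlite3.Connection, folder: str, body: bytes,
            crash_midway: bool = False) -> None:
    """Store a message and bump its folder's count as one transaction.

    `with conn` commits if the block finishes and rolls back if it
    raises, so a failure partway through undoes the earlier writes.
    """
    with conn:
        conn.execute("INSERT OR IGNORE INTO folders(name) VALUES (?)", (folder,))
        conn.execute("INSERT INTO messages(folder, body) VALUES (?, ?)",
                     (folder, body))
        if crash_midway:
            raise RuntimeError("simulated crash between writes")
        conn.execute("UPDATE folders SET count = count + 1 WHERE name = ?",
                     (folder,))


if __name__ == "__main__":
    store = open_store()
    deliver(store, "INBOX", b"Subject: kept\n\nhello\n")
    try:
        deliver(store, "INBOX", b"Subject: lost\n\nbye\n", crash_midway=True)
    except RuntimeError:
        pass
    print(store.execute("SELECT COUNT(*) FROM messages").fetchone()[0])
    print(store.execute("SELECT count FROM folders").fetchone()[0])
```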

    I'd encourage any developers who are looking for the open source world's "Exchange Killer" to get involved in this project.
    --
  • by ajs ( 35943 ) <{ajs} {at} {ajs.com}> on Tuesday January 30, 2001 @09:41AM (#469999) Homepage Journal
    Email messages are a particularly interesting topic. They're (for the most part) text, and tend to be larger than database fields want to be (on the order of 1 kB or more each, ranging all the way up to many megabytes in common practice).

    This makes most mail messages poor choices for database storage (for example, you want to be able to use "grep" on mail or compress it in place). Headers, on the other hand, are a major win in a database ("select messageid from headers where user = 'me' and date > yesterday and fromaddr = 'taco@slashdot.org'" should be fast even if I have tens of thousands of messages).

    The easy solution is to keep the headers in the database and the original messages in ordinary maildirs on the filesystem, with each message's filename stored in the database alongside its headers (something like message.headerid => headers.id, with message.text being a path to the maildir entry for this message).

    This combines the best of both worlds. It also means that while it's easy to corrupt your database with a single bug in your code, you can always rebuild it from the on-disk messages.
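    The hybrid layout can be sketched in Python with the stdlib `sqlite3` and `email` modules. This is a minimal illustration under assumptions: the table and column names, the choice of headers to index, and the helpers `open_index`, `rebuild_index` and `find` are all hypothetical. Message bodies stay as ordinary maildir files; only their paths and a few headers go into the database, and `rebuild_index` re-derives the whole table from the files on disk.

```python
import os
import sqlite3
import tempfile
from datetime import timezone
from email.parser import BytesHeaderParser
from email.utils import parseaddr, parsedate_to_datetime
from typing import List, Optional

SCHEMA = """CREATE TABLE IF NOT EXISTS headers (
    id         INTEGER PRIMARY KEY,
    path       TEXT UNIQUE NOT NULL,  -- maildir file with the full message
    message_id TEXT,
    from_addr  TEXT,                  -- bare address, lower-cased
    subject    TEXT,
    date_utc   TEXT                   -- ISO 8601 in UTC, so text order = time order
)"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _utc_iso(value: Optional[str]) -> Optional[str]:
    if not value:
        return None
    return parsedate_to_datetime(value).astimezone(timezone.utc).isoformat()


def rebuild_index(conn: sqlite3.Connection, maildir: str) -> None:
    """Throw away the header rows and re-derive them from the files on disk."""
    parser = BytesHeaderParser()
    with conn:
        conn.execute("DELETE FROM headers")
        for sub in ("new", "cur"):
            folder = os.path.join(maildir, sub)
            for name in sorted(os.listdir(folder)):
                path = os.path.join(folder, name)
                with open(path, "rb") as f:
                    hdrs = parser.parse(f)  # only the header block is parsed
                conn.execute(
                    "INSERT INTO headers (path, message_id, from_addr, subject, date_utc)"
                    " VALUES (?, ?, ?, ?, ?)",
                    (path, hdrs.get("Message-ID"),
                     parseaddr(hdrs.get("From", ""))[1].lower(),
                     hdrs.get("Subject"), _utc_iso(hdrs.get("Date"))),
                )


def find(conn: sqlite3.Connection, from_addr: str, since_utc: str) -> List[str]:
    """Paths of messages from `from_addr` dated after `since_utc`."""
    rows = conn.execute(
        "SELECT path FROM headers WHERE from_addr = ? AND date_utc > ?"
        " ORDER BY date_utc",
        (from_addr.lower(), since_utc),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as md:
        for sub in ("tmp", "new", "cur"):
            os.mkdir(os.path.join(md, sub))
        with open(os.path.join(md, "new", "980866860.M1.host"), "wb") as f:
            f.write(b"From: Taco <taco@slashdot.org>\n"
                    b"Date: Tue, 30 Jan 2001 09:41:00 -0500\n"
                    b"Subject: hi\n\nbody\n")
        idx = open_index()
        rebuild_index(idx, md)
        print(find(idx, "taco@slashdot.org", "2001-01-29T00:00:00+00:00"))
```

    Because the database only holds derived data, losing or corrupting it costs a rebuild, not mail.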
  • by benploni ( 125649 ) on Tuesday January 30, 2001 @09:03AM (#470000) Journal
    Maildir is better because:
    1) it is more reliable over NFS. Maildir is designed not to need file-level locking, which sucks over NFS.

    2) maildir is more resistant to catastrophic corruption since each email is a separate file.

    3) maildir keeps metadata about the email in the email's filename, rather than in a separate index file. This helps prevent the metadata, such as "replied-to" and "forwarded this", from getting out of sync.

    4) filesystem-level tools work well with maildir. You don't need special "formail"-type tools to work with it; bash scripting is capable of doing it all by itself.

    5) maildir is better positioned to take advantage of advanced new filesystems like reiserfs. When reiserfs has a plugin for file-level transparent compression, maildir will be able to selectively and invisibly compress emails on disk without requiring other programs/scripts to decompress them before use.

    Study maildir, it's just plain better.
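    Points 1 and 3 can be sketched in Python. This is a minimal, simplified illustration, not qmail's code: `deliver`, `set_flags` and `flags_of` are hypothetical names, and the unique-name scheme below is a simplification of the one the maildir spec describes. Delivery writes into tmp/ and then renames into new/, so a reader never sees a half-written message and no lock is taken; flags such as replied (R) and seen (S) live in the ":2," suffix of the filename once the message is in cur/.

```python
import os
import socket
import tempfile
import time
import uuid
from typing import Set

# Maildir info flags (the letters after ":2," in a cur/ filename).
FLAG_NAMES = {"D": "draft", "F": "flagged", "P": "passed",
              "R": "replied", "S": "seen", "T": "trashed"}


def deliver(maildir: str, data: bytes) -> str:
    """Lock-free delivery: write into tmp/, then rename into new/."""
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    # Simplified unique name (time.random.host), not the spec's exact recipe.
    name = f"{int(time.time())}.M{uuid.uuid4().hex}.{host}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "xb") as f:  # "x": refuse to reuse an existing name
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before it becomes visible
    os.rename(tmp_path, new_path)  # atomic on POSIX within one filesystem
    return new_path


def set_flags(maildir: str, path: str, letters: str) -> str:
    """Move a message into cur/ with the given flags encoded in its name."""
    unique = os.path.basename(path).split(":", 1)[0]
    info = "".join(sorted(set(letters)))  # flags are kept in ASCII order
    new_path = os.path.join(maildir, "cur", f"{unique}:2,{info}")
    os.rename(path, new_path)
    return new_path


def flags_of(filename: str) -> Set[str]:
    """Read the flags back out of a maildir filename."""
    _, sep, info = filename.rpartition(":2,")
    return {FLAG_NAMES[c] for c in info if c in FLAG_NAMES} if sep else set()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as md:
        for sub in ("tmp", "new", "cur"):
            os.mkdir(os.path.join(md, sub))
        msg = deliver(md, b"Subject: hi\n\nbody\n")
        seen = set_flags(md, msg, "SR")
        print(os.path.basename(seen), sorted(flags_of(seen)))
```

    Since the flags travel with the filename, syncing a maildir between machines also syncs its read/replied state, with no separate index to reconcile.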
