Communications Software

Infrastructure for One Million Email Accounts?

cfsmp3 asks: "I have been asked to define the infrastructure for the email system for a huge company which, fed up with Exchange, wants to replace its entire system with something non-Microsoft. I have done this before, but not for anything of this scale. Suppose you are given a chance to build from scratch an email system that has to support around one million accounts. Some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, 99.9% uptime is expected... where would you start?"
  • cyrus (Score:1, Interesting)

    by Anonymous Coward on Thursday September 08, 2005 @07:44PM (#13514129)
    i believe that cyrus imap was designed specifically for large scalable systems. it can scale to multiple servers and uses a database for hashing the email... (afaik)
  • For starters... (Score:3, Interesting)

    by cached ( 801963 ) on Thursday September 08, 2005 @07:45PM (#13514136)
    For starters, uptime should usually be higher than 99.9% for this large a site. 99.9% uptime means 40-45 minutes of downtime a month. Try going for 99.99% at least, though this usually increases the cost by about 250% according to what I have seen a few years back.
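
    To put numbers on the uptime targets: a minimal sketch, assuming a 30-day month, that converts an availability percentage into allowed downtime. For 99.9% it gives 43.2 minutes, which is where the 40-45 minute figure comes from. The function name is made up for illustration.

```python
def downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Allowed downtime, in minutes, over a period of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


# Compare the targets mentioned in this thread.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes(target):.2f} min per 30 days")
```

    Each extra nine cuts the budget by a factor of ten, which is a large part of why the cost climbs so quickly.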
  • earthlink's setup (Score:2, Interesting)

    by Triumph The Insult C ( 586706 ) on Thursday September 08, 2005 @07:47PM (#13514165) Homepage Journal
    earthlink's mail server complex has come up on freebsd-isp a few times

    this guy [jetcafe.org] used to work at both sendmail and earthlink and he has links to some good resources
  • Vendors (Score:5, Interesting)

    by XorNand ( 517466 ) * on Thursday September 08, 2005 @07:48PM (#13514170)
    I'd start with talking to vendors. Consult with some sendmail gurus, Notes guys, etc. Any of these people/companies would salivate at the thought of being a part of a project this large. First, talk to the client and hammer out the real needs with solid performance requirements, timeframes, growth expectations (meaning real numbers), etc. Put together a well thought-out Request For Proposal and send it out to as many applicable vendors as interest you. Then just stand back and play the role of ringmaster. The vendors will give you all the ideas you need.

    Just do one thing, please: make sure that the client is honest-to-goodness serious about this. I absolutely hate getting pie-in-the-sky RFPs from people who are just kicking the tires. It's a good way to burn bridges by not looking professional.
  • New Google Appliance (Score:3, Interesting)

    by Anonymous Coward on Thursday September 08, 2005 @07:49PM (#13514189)
    I agree. The google appliance should implement gmail and a web front end for administration. Like the Cobalt machines of yore, only better. Google-ified.

    It really is the best email.
  • by DavidDPD ( 885638 ) on Thursday September 08, 2005 @07:54PM (#13514234) Journal
    I'm not sure that there is any commercial solution that can support 1 million email accounts well. Hence why Yahoo and Google have built their own custom systems. Some engineering may be required.

    For pop3 & imap4rev1, look at:
    http://www.dbmail.org/index.php?page=overview [dbmail.org]

    You still need an MTA. I think qmail is the fastest and best, but I'd use exim, as it's easier.

    Database - not sure if MySQL and PostgreSQL will scale with dbmail.

    I'd say use FreeBSD, because of the ports collection (Don't linux Flame me). However, something like Solaris 10 x86 (or Solaris+Sun Hardware) might provide a bit better scaling, and HA hardware, SAN support, support in general, etc. Though, a bit tougher on the OSS software installs (In My Experience)

  • by Lost+Found ( 844289 ) on Thursday September 08, 2005 @08:08PM (#13514360)
    qmail-ldap is best suited to this task. Reasons:

    1. You can sleep at night knowing that you're running the only MTA in widespread deployment that has never once had its security compromised; in fact, qmail's author Dan Bernstein still offers cash to the first one to be successful...

    2. You can sleep at night knowing that the core MTA, qmail, has reliably handled some of the largest e-mail operations in the history of the internet. Its design is such that on a properly configured system, you'll never lose a single e-mail. Hotmail actually used qmail for a long time, even after Microsoft bought them - Microsoft repeatedly tried to replace it with Exchange, which kept buckling under the load.

    3. Qmail is very modular, allowing you to pick and choose your components wisely.

    4. Qmail uses the Maildir format its author pioneered. Maildir is NFS safe, not proprietary/complicated (often binary formats like PST are subject to corruption), etc.

    5. LDAP makes it easy to manage massive amounts of accounts.
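
    For point 4, a rough sketch of why Maildir avoids locking: each message is written under tmp/ and only renamed into new/ once it is complete, so a reader never sees a half-written file. This is a simplified illustration, not qmail's actual code; the unique-name scheme and the deliver() helper are assumptions made for the example.

```python
import os
import socket
import tempfile
import time


def deliver(maildir: str, message: bytes) -> str:
    """Write a message into tmp/, then rename it into new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Unique name built from timestamp, process id and hostname (simplified).
    unique = f"{time.time_ns()}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    with open(tmp_path, "xb") as handle:  # "x": fail if the name already exists
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to disk before it becomes visible

    os.rename(tmp_path, new_path)  # readers see a complete file or nothing
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = deliver(os.path.join(root, "Maildir"), b"Subject: hi\n\nhello\n")
        print(os.path.basename(os.path.dirname(path)))  # new
```

    Because every message is its own file with its own name, two delivery processes never write to the same file, which is the property that matters on NFS.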

    In any case... qmail-ldap is already running large sites with millions of users. Info:

    http://www.qmail-ldap.org/wiki/Documentation [qmail-ldap.org]

    I've set one of these systems up on an IT cluster at my current office, and I must say that it is not only very robust but also really easy to manage.
  • Re:POP? (Score:5, Interesting)

    by mre5565 ( 305546 ) on Thursday September 08, 2005 @08:10PM (#13514371)
    A million users and they want POP3? Add a gun and a single bullet to your administration requirements.
    No doubt a well-deserved +5 for humor, but for those of us less in the know (and a chance at another +5 for informative), what is so bad about POP3? Thx.
  • Re:Obviously (Score:5, Interesting)

    by kryonD ( 163018 ) on Thursday September 08, 2005 @08:11PM (#13514379) Homepage Journal
    Or maybe this is a legitimate cry for help from EDS who duped the US Navy into thinking they could actually outsource IT on the exact scale that the poster is talking about. Mind you, no one has ever provided ubiquitous support for an organization as large as the Department of the Navy, but they somehow convinced Congress that they could do it for $6B.

    Just so you know. Most of us out in South East Asia refer to NMCI (Navy-Marine Corps Intranet) as the Not Mission Capable Intranet.
  • by UndeadDude ( 155075 ) on Thursday September 08, 2005 @08:12PM (#13514389)
    Having dealt with sendmail at scale, I would definitely say no. And if you think that it is the most configurable, sounds like there are some MTAs you still need to check out. I recommend Exim.

    I agree that you want to split things up-- make farms of large numbers of servers to make horizontal scaling easy. Store your user info in LDAP (OpenLDAP works very well, with very good data replication in 2.3.x). Most common server software will support LDAP and it scales very well.

    You need "layer-4 switching" to load balance across machines, and automatically disable systems/services that are down. You need something that will cluster. I recommend Foundry ServerIron switches. F5 BigIP is another common alternative.
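
    The "disable systems that are down" part is the piece people tend to skip, so here is a toy sketch of it: round-robin selection that skips any backend failing a health check. A real layer-4 switch does this in hardware on connections, not in Python; the backend names and the is_up callback are placeholders.

```python
import itertools
from typing import Callable, Iterable, Iterator, Optional


def healthy_round_robin(
    backends: Iterable[str], is_up: Callable[[str], bool]
) -> Iterator[Optional[str]]:
    """Yield backends in rotation, skipping any that fail the health check."""
    pool = list(backends)
    rotation = itertools.cycle(pool)
    while True:
        # Try each backend at most once per pick; None means all are down.
        chosen = None
        for _ in range(len(pool)):
            candidate = next(rotation)
            if is_up(candidate):
                chosen = candidate
                break
        yield chosen


down = {"imap2"}  # pretend this server failed its health check
picker = healthy_round_robin(["imap1", "imap2", "imap3"], lambda b: b not in down)
print([next(picker) for _ in range(4)])  # ['imap1', 'imap3', 'imap1', 'imap3']
```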
  • Re:Obviously (Score:5, Interesting)

    by EnderWiggnz ( 39214 ) on Thursday September 08, 2005 @08:19PM (#13514436)
    WalMart runs the world's biggest Exchange install. They and msft are quite proud of it, actually...

    The Navy may want to take a page out of Walmart's book, if they're having that much trouble.
  • Re:NO GMAIL (Score:5, Interesting)

    by Alan Hicks ( 660661 ) on Thursday September 08, 2005 @08:36PM (#13514568) Homepage
    I would have to say use Qmail

    My God no! Friends don't let friends use qmail. Want reasons why?

    1) It's a bitch to install. Won't even compile on modern Linux distributions. You have to patch it to compile it and the patch isn't even hosted on qmail's site.
    2) It's a bitch to configure. Rather than parsing a single configuration file, qmail relies heavily on the presence of individual files in a directory.
    3) Not not not not scalable! That's a myth. Doesn't properly batch jobs together. Hell! qmail was originally designed to be run from inetd!
    4) Heavy reliance on other daemontools.
    5) Breaks well-known and understood UNIX standards.
    6) Security through lack-of-functionality.
    7) Not really secure despite the claims.
    8) No longer maintained.
    9) No features. Adding them requires patching, and patching, and more patching.

    Serious sysadmins don't use qmail and for damn good reason. I don't give a damn if Yahoo did manage to string it together and make it work well. In short, qmail isn't particularly suited for deployment in any capacity.

  • Qmail!! (Score:1, Interesting)

    by mnmn ( 145599 ) on Thursday September 08, 2005 @08:58PM (#13514691) Homepage
    Qmail is best. Preferably on a FreeBSD server. So hard to kill it in any way.

    Get a server with RAIDed SCSI disks preferably hot-pluggable. Install FreeBSD, Qmail and other packages you might need as you go.

    Ideally keep the emails in a Maildir format.

    I don't know where the Novell idea came from.
  • More specific? (Score:3, Interesting)

    by Grendel Drago ( 41496 ) on Thursday September 08, 2005 @09:00PM (#13514706) Homepage
    Could you be a bit more specific on the following items?

    5) Breaks well-known and understood UNIX standards.

    Which standards are these? Are you talking about the errno [tesco.net] fiasco?

    6) Security through lack-of-functionality.

    What sort of functionality is provided by, say, postfix, that qmail simply won't do?

    7) Not really secure despite the claims.

    How's that? Do you have $500 [cr.yp.to]? If not, what's the security vulnerability that the author refuses to acknowledge?

    Which of these problems that you enumerate are not addressed by netqmail [qmail.org]?

    --grendel drago
  • by Anonymous Coward on Thursday September 08, 2005 @09:16PM (#13514789)
    Have you found courier imap scalable? I have found that when a user gets 1000+ messages in a box, certain imap queries (select, sort) become very slow. This is because courier can't index any data and has to open each file on these queries. This was a real deal breaker for us; I would be interested if you found a way around this (without forcing users to divide their folders).

  • by Matt Perry ( 793115 ) <perry.matt54@ya[ ].com ['hoo' in gap]> on Thursday September 08, 2005 @09:18PM (#13514802)
    split the domains up with a 2 level deep hashing algorithm
    Could you please elaborate on this point and why you do it?
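
    One common reading of "2 level deep hashing" (an assumption, since the quoted post doesn't spell it out): hash the domain name and use the first characters of the digest as two nested directory names, so no single directory ends up holding hundreds of thousands of entries. The root path and function name below are invented for the sketch.

```python
import hashlib
import os


def hashed_path(root: str, domain: str) -> str:
    """Map a domain to root/<xx>/<yy>/<domain> using its MD5 digest."""
    name = domain.lower()
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # First two hex characters pick the top level, the next two the second:
    # 256 * 256 = 65,536 buckets.
    return os.path.join(root, digest[:2], digest[2:4], name)


print(hashed_path("/var/mail/domains", "example.com"))
```

    The usual reason to do it is that many filesystems slow down badly on very large directories, and an even hash spreads the load without any lookup table.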
  • Re:Obviously (Score:5, Interesting)

    by jrockway ( 229604 ) * <jon-nospam@jrock.us> on Thursday September 08, 2005 @09:25PM (#13514841) Homepage Journal
    > you could do the entire thing with MySQL if you REALLY wanted to

    I am so tired of people shoving everything into relational databases. What queries are you going to run against your database, anyway? SELECT * FROM messages WHERE read=0? Try "ls new" in your maildir. The reason things never scale right is because people design things to be "new" and "cool" like putting their e-mail into a relational database. No. Just use the filesystem. It, and its supporting tools, have been around for 30 years! It Just Works! It doesn't use any userspace memory! There are no permissions issues, because the kernel controls the permissions. It's the optimal solution.

    The filesystem is really really efficient (for e-mail) and really really reliable.

    Please, don't use a database!
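
    For anyone who hasn't looked inside a maildir, the "ls new" trick amounts to a directory listing: anything sitting in new/ has been delivered but not yet picked up by a mail client. A small sketch, with made-up file names:

```python
import os
import tempfile
from typing import List


def unread_messages(maildir: str) -> List[str]:
    """Return the file names in new/, the equivalent of `ls new`."""
    return sorted(os.listdir(os.path.join(maildir, "new")))


with tempfile.TemporaryDirectory() as maildir:
    os.makedirs(os.path.join(maildir, "new"))
    # Two hypothetical delivered messages.
    for name in ("1126229100.1.host", "1126229160.2.host"):
        open(os.path.join(maildir, "new", name), "w").close()
    print(unread_messages(maildir))
```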
  • Re:Obviously (Score:5, Interesting)

    by superpulpsicle ( 533373 ) on Thursday September 08, 2005 @09:26PM (#13514847)
    The Walmart exchange site was not properly backed up for "years". Mostly because Exchange was not 3rd party software friendly at all, and M$ didn't have much of their own backup software to offer. Veritas and Legato couldn't bend over enough for a million users.

    Walmart invited countless consulting firms and data backup experts. They deployed Exchange strictly because M$ was willing to "support" them. To say they were vulnerable to a major IT disaster was an understatement. The Navy wants nothing to do with Walmart's IT.

  • Re:NO GMAIL (Score:5, Interesting)

    by Denis Lemire ( 27713 ) on Thursday September 08, 2005 @09:34PM (#13514899) Homepage
    Definitely agree on point 9. I maintain a mail server of over 2,000 users. Currently running Qmail with the following patches:

    chkuser-2.0.8b-release.tar.gz
    doublebounce-trim.patch
    netqmail-1.05-tls-20050329.patch
    outgoingip.patch
    qmail-smtpd-auth-0.31.tar.gz
    qmail-smtpd-auth-close3.patch
    qmail-smtpd_gmfcheck.patch
    qmail-spf-rc5.patch

    Most of these patches require hand editing the sources and Makefiles to successfully merge them all into the stock qmail or netqmail base. Lots of manually reading through *.rej files to make it all work.

    In order to simplify new installations I've created my own personal CVS repository for my Qmail sources. I commit changes to the tree whenever a new patch comes out with functionality I need. Hence on a new install I simply check out my custom tree and compile.

    The initial work was a royal pain in the ass, however, once it is all up and running the stability and performance has been excellent.
  • Re:Obviously (Score:2, Interesting)

    by AKAImBatman ( 238306 ) * <akaimbatman AT gmail DOT com> on Thursday September 08, 2005 @09:50PM (#13514983) Homepage Journal
    And a database file system [blogspot.com] would give you the best of both worlds. One message per file, yet the ability to quickly query for messages, and organize with a label system similar to GMail.
  • by Anonymous Coward on Thursday September 08, 2005 @09:54PM (#13515003)
    (This is not a troll, all the following questions are honest.).

    > OpenLDAP

    IIRC, the replication feature was pretty buggy in some versions of OpenLDAP (2.2.x). Has it been really fixed in the latest versions ?

    > Exim

    What about qmail ? Have you ever tried it ?

    > MD4 [is] more balanced than MD5.

    Do you have evidence to back up this claim ?

    > NFS mount the maildirs from a fast NFS device like a Netapp.

    How do you provide data redundancy with such devices ? Do you replicate data on different NFS servers ? Why not use FreeBSD or Linux boxes as NFS servers ?

    > Hardware load balancers are pretty much a necessity.

    Why not use standard software load-balancing facilities provided by Linux and BSD systems ?
  • Re:Obviously (Score:3, Interesting)

    by cecil_turtle ( 820519 ) on Thursday September 08, 2005 @10:17PM (#13515123)
    I don't know if you actually have experience running a mail server or not or if you just wanted to go off on your relational db rant, but mail data tends to be created and deleted A LOT with varying size files, and file-based structures on a mail server create serious fragmentation problems. If you do decide to go this way, allow plenty of free drive space - well above normal recommendations - like 80% free or more.

    Also many people have their mail clients set with ridiculously frequent mail check times (like every minute), and on a file based system each check requires a trip to the drive and back. Even with the data on a RAID array with a decent read/write cache, you're still going through the disk subsystem, whereas with a database it would all be in memory.

    What's wrong with SELECT * FROM messages WHERE userid=xyz and read=0? That is a cakewalk for a properly indexed dbms. On a medium sized server (say, quad processor w/ 8-16GB RAM) there is more userspace memory than os memory space.
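
    A small sketch of what "properly indexed" means for that query, using SQLite purely as a stand-in for whatever DBMS you would really run. The table layout is invented, and the flag column is called is_read here instead of read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY, userid TEXT NOT NULL,"
    " is_read INTEGER NOT NULL DEFAULT 0, subject TEXT)"
)
# Composite index so the lookup does not have to scan every user's mail.
conn.execute("CREATE INDEX idx_user_unread ON messages (userid, is_read)")

conn.executemany(
    "INSERT INTO messages (userid, is_read, subject) VALUES (?, ?, ?)",
    [("xyz", 0, "hello"), ("xyz", 1, "old news"), ("abc", 0, "other user")],
)

unread = conn.execute(
    "SELECT subject FROM messages WHERE userid = ? AND is_read = 0", ("xyz",)
).fetchall()
print(unread)  # [('hello',)]
```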
  • by Zak3056 ( 69287 ) * on Thursday September 08, 2005 @10:21PM (#13515152) Journal
    OpenLDAP

    You need a central configuration repository to store the email accounts, their passwords, etc. OpenLDAP is perfect for this, and you can replicate it out for scalability. Be prepared to learn about LDAP schemas.


    I know this won't be a popular opinion, but given that he's migrating from Exchange, it's fairly likely that they're already an Active Directory shop... it doesn't make sense to abandon it for OpenLDAP, especially given that they're almost certainly windows only on the desktop and will still need AD even if they ditch Exchange.

  • My vote is for Notes (Score:2, Interesting)

    by mferrare ( 65039 ) on Thursday September 08, 2005 @10:27PM (#13515184)
    I'd put my vote in for Notes also. Its architecture should scale to meet your requirements, what with distributing your setup across many servers and using replication. Granted the client isn't the best by any means (more on this later) but the application itself is quite good. Your laptop users can replicate their e-mail locally which is a simple procedure. I replicate my notes locally just so I can index my mailbox on my local drive.

    But the real advantage of Notes is as a distributed applications platform. If you want to expand past e-mail and start writing applications such as leave management or room booking or technical documentation databases, then this is where Notes really shines. And they're all databases and they can all be replicated so they take advantage of the same redundancy that your e-mail will use. And if you need to travel then you just replicate the databases you want onto your notebook and take them with you. It's fantastic.

    Ah, the mail client
    Why oh why does the client suck SO MUCH!! At my previous company the management were looking at moving to exchange simply because Outlook is so much a better client than what Notes (even R6) is. It's a big fat piece of bloatware (as has been discussed many times here). My main peeve is that if you edit an attachment inside an e-mail you can't save it back into the e-mail! eg: here's a typical scenario:
    Not using Notes (outlook, thunderbird, mail.app all let you do this)

    • Receive e-mail with an attachment
    • dbl-click on the attachment, edit it, save it
    • forward the e-mail, including the saved attachment, to someone else
    Simple huh?
    With Notes:
    • Receive e-mail with an attachment
    • Detach the attachment from the e-mail message. Save it somewhere
    • Use windows explorer (or whatever) to find the attachment, edit it and save it
    • Forward the message
    • before sending, delete the original attachment and replace it with the copy you have saved on your hard drive somewhere
    • send the message
    • delete your copy of the attachment
    Sigh!!!

    WHY!?!?!?!?

    But despite all that crap I still think it's an excellent platform and one you should consider. It has support for encryption and also supports IMAP (although not very well I hear). A lot of large corporations run it. I've worked for 2 large investment banks, both of whom run it. You can also integrate IM into it (with sametime) and remote meetings also (with sametime meeting). Also, IBM PS are good at setting it up. For something this scale you'll be up for $$$ anyway so I'd be looking at having someone come in to help you and they're pretty good (I don't work for IBM!).

  • by thogard ( 43403 ) on Thursday September 08, 2005 @10:28PM (#13515192) Homepage
    All of these systems will be running sendmail.

    That would be an absolute nightmare. Postfix is just as functional and orders of magnitude easier to administer.


    If it's a million seats, it's not going to be easy to admin at all. It will require several people that know MTAs inside and out and sendmail has a track record in very large systems.

    Remember that in this case, the job will be 100% running an email system so the best tool for the job should be used, not the best tool for the admin.
  • Re:Obviously (Score:4, Interesting)

    by the real darkskye ( 723822 ) on Thursday September 08, 2005 @10:48PM (#13515300) Homepage
    The mods are on crack, the meta-mods are on pot
  • Re:More specific? (Score:3, Interesting)

    by killjoe ( 766577 ) on Thursday September 08, 2005 @10:58PM (#13515361)
    "What sort of functionality is provided by, say, postfix, that qmail simply won't do?"

    Qmail has almost no features out of the box. It can't talk to LDAP, it can't handle multiple domains, it does not reject mail for unknown users (instead it queues up a bounce message which means each spam message generates one outgoing message).

    In order to get qmail to do what exim and postfix do, you have to apply half a dozen patches and recompile.

    Of course unless the guy who did the compile took very careful notes you have no idea what your particular installation of qmail is capable of either.

    I inherited a qmail install one time and it was a nightmare to maintain. When somebody decided to start sending me 100 thousand emails a day to unkownuser@mydomain.com and my message queue got to be hours long I only had two options.

    1) Gather all the patches used to build the original qmail (again no real way of knowing) and then add yet another patch and recompile.

    2) Install postfix.

    Guess what I did?
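
    For anyone wondering why the unknown-user behaviour matters so much: if the server answers the RCPT TO command with a 5xx code, the sending side owns the failure and nothing is queued locally. Accept first and bounce later, and every spam message to a bogus address costs you an outgoing message. A toy sketch of the check (the account list and function name are made up; a real MTA would look the address up in LDAP or similar):

```python
KNOWN_USERS = {"alice@example.com", "bob@example.com"}  # hypothetical accounts


def rcpt_reply(recipient: str, known_users: set) -> str:
    """Return the SMTP reply to a RCPT TO command."""
    if recipient.lower() in known_users:
        return "250 2.1.5 OK"
    # Rejecting here leaves the sending server responsible for the failure,
    # so no bounce message is ever queued locally.
    return "550 5.1.1 User unknown"


print(rcpt_reply("alice@example.com", KNOWN_USERS))    # 250 2.1.5 OK
print(rcpt_reply("nobody@example.com", KNOWN_USERS))   # 550 5.1.1 User unknown
```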
  • by dougnet ( 913517 ) on Thursday September 08, 2005 @10:59PM (#13515368)
    I ran an InterMail MX system for about 3 years for a national ISP. The company that sells InterMail was called Software.com at the time... and then they merged with phone.com and the combined entity was renamed Openwave. They provide many of the browsers used on cell phones... check an old phone and it probably says "phone.com" and a newer one will say "openwave". I used version 4.x of their InterMail Mx product primarily and had a little experience with version 5.0. It is a fairly complex system but is obviously very powerful.

    The system used an Oracle database for all user information (LDAP on the front-end, with the data stored in an Oracle DB on the back-end) and also used an Oracle database for each Message Store server. For example, if an E-Mail message was sent to 2000 users on your system, one instance of the message was saved to disk (in a hashed directory structure) and 2000 "links" were stored in the Oracle DB. Once all 2000 links were deleted (i.e. all users deleted the message) then a garbage collection process would remove the message file. This can obviously save a lot of space on a busy system.

    The server scaled by adding Message Store Servers (MSS) and front-end POP/IMAP/Web servers. The front-end servers are typically set up for load-balancing with F5 BigIPs or the like. The back end servers (directory/ldap server, MSS servers) are less redundant and require a cluster/HA solution. We had a 3 to 1 fail-over for our directory server and two MSS servers to one stand-by system. This was at least US $2M of hardware by the time you added an EMC Symmetrix for multiple TB of storage. This was a while ago and you may not need to use a tier 1 storage vendor... but when you're talking 1 million users and 99.9% uptime, you can't just throw something together and cross your fingers.

    OpenWave also offered an InterMail Kx solution (thousands of users rather than millions of users) that was less complicated. Below that was post.office. The price at the time was negotiable and was generally based on the number of users. Their support was generally quite good. They appear to call the product Email MX now: http://www.openwave.com/us/products/wireline/email_mx/index.htm [openwave.com]

    The main reason companies choose (or stay with) MS Exchange really comes down to these two things:

    1) Integration of the Windows Domain with the E-Mail account (often single sign on).
    2) Integrated Calendar

    I'm not sure if Openwave offers something comparable now with their product, but I'd much rather run a system with that many users on a Unix platform than on a ton of Windoze systems. As other posters have mentioned, if it is properly architected... many different options are possible.
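
    The one-copy-plus-links scheme described in this comment is easier to see in code. This is an in-memory sketch only: the real product kept the links in Oracle and the bodies on disk, and the class and method names here are invented.

```python
import hashlib


class SingleInstanceStore:
    """Store each distinct message body once; mailboxes hold links to it."""

    def __init__(self) -> None:
        self.bodies = {}  # message id -> body bytes
        self.links = {}   # message id -> set of users still holding a link

    def deliver(self, body: bytes, recipients) -> str:
        message_id = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(message_id, body)  # stored only the first time
        self.links.setdefault(message_id, set()).update(recipients)
        return message_id

    def delete(self, user: str, message_id: str) -> None:
        self.links[message_id].discard(user)

    def collect_garbage(self) -> int:
        """Remove bodies that no mailbox links to; return how many were removed."""
        dead = [mid for mid, users in self.links.items() if not users]
        for mid in dead:
            del self.links[mid]
            del self.bodies[mid]
        return len(dead)


store = SingleInstanceStore()
users = [f"user{i}" for i in range(2000)]
mid = store.deliver(b"Subject: all hands\n\nMeeting at 10.\n", users)
print(len(store.bodies), len(store.links[mid]))  # 1 2000

for user in users:
    store.delete(user, mid)
print(store.collect_garbage(), len(store.bodies))  # 1 0
```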
  • by Doktor Memory ( 237313 ) on Thursday September 08, 2005 @11:13PM (#13515487) Journal
    All of these systems will be running sendmail.

    You're high. Building a massive production email system on Sendmail 9 is slow-motion suicide. If the security holes don't get you, the terrible configuration methods and complete lack of scalability will, never mind the fact that Sendmail Inc is trying desperately to replace the product.

    "Most manageable with [...] heavy customization?" I'd laugh if I wasn't crying. And I'm crying because I used to work for a company that deployed a massively customized sendmail infrastructure -- and I was one of the poor bastards who had to maintain it. Trust me, you don't want to do this. Ever.

    Yes, milter is cool. No, it's not cool enough to justify burning CPU cycles on sendmail in 2005.

    Even Sendmail Inc tacitly admits that Sendmail's design is garbage: take a look at the design document [sendmail.org] for Sendmail X, and note carefully how much it resembles Postfix and Qmail. There are very good reasons for this.
  • by Russ Nelson ( 33911 ) <slashdot@russnelson.com> on Friday September 09, 2005 @12:17AM (#13515893) Homepage
    Yer blowin' smoke, of course. Everybody loves to claim that they've found a vulnerability in djb's code, but when it comes down to details, there are none.
    -russ
  • by Anonymous Coward on Friday September 09, 2005 @12:21AM (#13515911)
    I think your comments are true to a point. I have been very happy feeding mail to 4000+ users on a single server using FFS+softupdates. However, there has been a study on this and although it's one study, it is data:

    http://www.google.com/url?sa=t&ct=res&cd=2&url=http%3A//www.usenix.org/events/lisa03/tech/full_papers/elprin/elprin_html/&ei=rQghQ4zwNLPyYLHzgaIN [google.com]
  • by eh2o ( 471262 ) on Friday September 09, 2005 @12:46AM (#13516067)
    As I said before, all of those things add up to a constant overhead. (but maybe you never took a class on algorithms so you don't know what I'm talking about...)

    In order to say that an RDBMS is an order of magnitude slower, one must show that as load increases the overhead of the DB grows faster than that of a FS doing the same task. (and, generally, to say that this difference is "an order of magnitude" the spread between them should increase at least linearly).

    Doing a trace on a DB for a simple query tells you absolutely nothing about its scalability.
  • Re:Obviously (Score:3, Interesting)

    by Fulcrum of Evil ( 560260 ) on Friday September 09, 2005 @01:08AM (#13516192)

    Exchange/Outlook will let you modify the attachment in place and keep it in your mailbox.

    Are you saying that I can send a file to 100 people, then edit it after I send it and leave the 100 people with no audit trail? That's horrible!

  • Re:Qmail!! (Score:1, Interesting)

    by MyEyesTheyBurn ( 908621 ) on Friday September 09, 2005 @01:47AM (#13516403) Homepage
    I have to agree with QMail, I've seen it scale nicely - but not on one machine. You would really need a large cluster of machines - Perhaps the following:

    - 4-5 core machines all running heartbeat, and DRBD or NFS
    - Then several Machines for POP, IMAP, and Webmail (NFS the maildirs)
    - Then several SMTP servers.

    Something similar, but greatly scaled, like this: http://shupp.org/maps/ispcluster.html [shupp.org]

  • Re:Obviously (Score:5, Interesting)

    by raynet ( 51803 ) on Friday September 09, 2005 @03:20AM (#13516763) Homepage
    Plan 9 OS has a filesystem that does just this. I think it was called Venti. Basically it hashes the datablocks on the filesystem and only stores each unique block once. There was (is?) a project where the filesystem was being ported to Linux.
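
    The idea of hashing blocks and storing each unique one once fits in a few lines. This is a toy sketch of content-addressed storage in general, not of Venti's actual on-disk format; the class name, the fixed block size and the choice of SHA-1 are all assumptions for the example.

```python
import hashlib


class BlockStore:
    """Keep each unique block once, addressed by the SHA-1 of its contents."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks = {}  # hex digest -> block bytes

    def write(self, data: bytes) -> list:
        """Store data and return the list of block hashes that describes it."""
        hashes = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha1(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            hashes.append(digest)
        return hashes

    def read(self, hashes: list) -> bytes:
        """Reassemble data from its list of block hashes."""
        return b"".join(self.blocks[h] for h in hashes)


store = BlockStore(block_size=4)
first = store.write(b"AAAABBBB")
second = store.write(b"AAAACCCC")  # shares its first block with `first`
print(len(store.blocks))           # 3 unique blocks for 4 written
print(store.read(second))          # b'AAAACCCC'
```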
  • Re:Obviously (Score:3, Interesting)

    by sco08y ( 615665 ) on Friday September 09, 2005 @07:11AM (#13517501)
    I am so tired of people shoving everything into relational databases.

    What relational DBMSs? All I've heard discussed are SQL products.

    The filesystem is really really efficient (for e-mail) and really really reliable.

    I'm tired of everyone shoveling everything into a filesystem.

    How are you going to run queries against your contacts? Or your appointments?

    How does a filesystem guarantee referential integrity? Can a filesystem guarantee an appointment doesn't exist for a bogus contact?

    *Any* kind of integrity? Can a filesystem guarantee that a message is well formed?
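
    To make the referential integrity point concrete, here is a small sketch with SQLite standing in for any SQL product: the database itself refuses an appointment that points at a contact who doesn't exist. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE appointments ("
    " id INTEGER PRIMARY KEY,"
    " contact_id INTEGER NOT NULL REFERENCES contacts(id),"
    " starts_at TEXT NOT NULL)"
)
conn.execute("INSERT INTO contacts (id, name) VALUES (1, 'Alice')")


def add_appointment(contact_id: int, starts_at: str) -> bool:
    """Insert an appointment; return False if the contact does not exist."""
    try:
        conn.execute(
            "INSERT INTO appointments (contact_id, starts_at) VALUES (?, ?)",
            (contact_id, starts_at),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_appointment(1, "2005-09-09 10:00"))   # True
print(add_appointment(42, "2005-09-09 11:00"))  # False: no contact with id 42
```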
  • by chewitt ( 913600 ) on Friday September 09, 2005 @09:05AM (#13518004)
    My 2p, based on experience of designing, managing and being commercially responsible for large scale messaging systems for the last 6-8 years (where large scale covers 500k users to 9m users) is that you don't want to use OSS as the core for projects this size. This may sound somewhat heretical to the /. audience, but if you're serious about the uptime constraints (99.9% is light - 99.999% is where you need to be and 100% is what you should be aiming at) and weighing in that someone's business somewhere is going to heavily depend on the success of this system, you *need* the focussed support and SLAs that you will only get from a commercial vendor. You're still going to glue the system together with a number of open technologies and there will be substantial customisation to meet your needs, but the core of the system needs to be rock-solid. In general my experience has been that much OSS Mail componentry is fantastic at lower scales both technically and commercially; however, the admin burden rises unacceptably when the collective sum of all those components needs maintaining - even when in the hands of highly skilled administrators. Mail platforms at these scales constantly have problems/issues in them somewhere due to the unpredictability of a million users alone, so one of your biggest concerns is how you overcome them. Being dependent upon the OSS community or internal resources to perform a root cause analysis and fix a code bug when your system is running live is not a situation you can afford to be in.

    Some things to consider: MS Exchange is a lot more than just mail. If Calendaring and other forms of group-working are involved then the task at hand is substantially more complex than for a mail only system. Also, these days with virus and spam being endemic the platform needs to incorporate a framework that handles them as well as policy driven content management controls at its core rather than have them as bolt-ins or bolt-ons. Are you bound by any regulatory requirements? Geography is a major influence, and if this is a business platform how does this affect your strategies for resilience, disaster recovery and backup of the platform? In a perverse way most of the decisions you have to make when building systems of this size are about business decisions (what's the cost of retraining users to use new mail clients is a favourite of CTOs) and it's not specifically about the products/technologies involved.

    So, exactly what type of hardware/software and surrounding infrastructure you need to assemble to create 'the whole' is a somewhat open-ended question without going into a decent level of detail on your requirements and the drivers behind them. However, once you go north of about 500k users the number of commercial vendors tails off dramatically. If you include group-working as a factor it reduces further. I'll not start suggesting names (I currently work for a vendor in this space and self-plugging's not in the spirit that /. operates on), but I'd recommend starting out by talking to some of the analyst groups that have staff researching this end of the messaging market (Radicati, Gartner, Butler Group) and then opening dialogue with vendors appropriately.

  • Notes/Domino (Score:4, Interesting)

    by hey! ( 33014 ) on Friday September 09, 2005 @09:35AM (#13518183) Homepage Journal
    Of course nearly everyone who uses it hates it, because it seems unnecessarily complicated. But this is precisely the kind of situation Domino was designed to handle: scaling. If you can get by with Sendmail, you don't need or want Domino, but if you want to manage a million email accounts, this is one of the first places I'd look.

    This is exactly what Notes was designed to do: scale. People have been building systems on this scale with Notes for nearly twenty years. Not only can you scale it by moving parts of your email system onto mainframe-class iron, but you can also distribute it and build all kinds of flexibility and redundancy into your system to meet virtually any messaging requirement (e.g. choose an alternate MTA for high priority traffic when there are Internet disruptions). Naturally there's some complexity involved, but if you can get by with sendmail you probably shouldn't be using Notes.

    What's more important is the management of accounts and identity, which is distributed, delegatable, and backed up by robust cryptographic certificate management. You can let a subsidiary manage its own accounts; they can subdelegate that to a division, and the division can subdelegate that to the IT staff on site. At each level, policies can be set, enforced, and changed for lower levels.

  • by diegocgteleline.es ( 653730 ) on Friday September 09, 2005 @10:15AM (#13518471)
    You have a flawed assumption in that the file is read only. Exchange/Outlook will let you modify the attachment in place and keep it in your mailbox.

    ....and then, Exchange WILL have to write a new copy of the data, because you just modified it and the data is not the same as before - you can't use the same copy. If the 1000 users keep the same file it's fine; if they all modify it you need 1000 copies of it.

    Sharing something with people (which for some reason database people call "single instance store", I've learned today) can be done both in a filesystem and in a database. Databases are "one-size-fits-all" kinds of tools: not always the "best" solution, but one that you have a good chance of making work even if it's not the best solution. Linus said something similar when it was suggested that he develop Git on top of MySQL... if you really know what you're going to do with the data, and you KNOW that a filesystem is enough, why use a database? It's like buying a 900HP car for your mother - STUPID. "Let's do it just because we can" is a good approach if what you want is to write overengineered, bloated software.

    Because a filesystem IS a database. Except that instead of having a SQL-ish interface, you've a "read(), write(), readdir()" kind of interface. Which happens to be really fast (filesystems are implemented inside the kernel, they're reliable, they're much simpler, easy to manage, etc).
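
    To make the "filesystem IS a database" point concrete, here is a minimal sketch of a key/value store built on nothing but file operations. The names put, get and keys are made up for illustration, and the write-to-temp-then-rename step is an assumption about how you might want updates to behave, not something any particular mail server is known to do.

```python
import os

def put(root, key, value):
    """Store bytes under a key: write a temp file, then rename it into place."""
    tmp = os.path.join(root, "." + key + ".tmp")
    with open(tmp, "wb") as f:
        f.write(value)
    # os.replace swaps the file in as one step, so readers see old or new data
    os.replace(tmp, os.path.join(root, key))

def get(root, key):
    """Look a key up: this is just read()."""
    with open(os.path.join(root, key), "rb") as f:
        return f.read()

def keys(root):
    """List the keys: this is just readdir(), skipping temp files."""
    return sorted(n for n in os.listdir(root) if not n.startswith("."))
```

    No query language, no server process: the directory is the table and the file name is the primary key.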

    When you use a database like MySQL, you're just using a database on top of, uh, another database (the filesystem). Which makes no sense. It WILL work, but that doesn't mean it's the "best possible solution".

    Despite all this, BTW, hardlinks are NOT necessarily the solution for the "share a file between 1000 users" problem. They can be, but remember that you can't make hardlinks between different filesystems. I have no idea whether you can use LVM to solve this, or whether ACLs + symbolic links can be used to implement this in a delivery agent. And if you can't (I don't really know), someone really should think about adding something to filesystems to allow it like plan9 did, because it makes sense.
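
    A rough sketch of the hardlink idea, including the "modify it and you need your own copy" case from above. The function names deliver_shared and modify_private are hypothetical; this is not how Exchange or any specific delivery agent does it, just the plain POSIX mechanism exposed through Python's os module.

```python
import os

def deliver_shared(master_path, mailbox_dirs, name):
    """Hard-link one stored attachment into every mailbox directory."""
    paths = []
    for d in mailbox_dirs:
        dest = os.path.join(d, name)
        # Fails with OSError (errno EXDEV) if dest is on another filesystem
        os.link(master_path, dest)
        paths.append(dest)
    return paths

def modify_private(path, new_data):
    """Give one user a private copy instead of changing the shared data."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(new_data)
    # Renaming over the link replaces only this name; the other
    # mailboxes keep pointing at the original, unmodified inode
    os.replace(tmp, path)
```

    As long as nobody edits the attachment there is one copy on disk no matter how many mailboxes list it; each edit costs exactly one new file.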
  • by JacobKreutzfeld ( 614589 ) on Friday September 09, 2005 @10:40AM (#13518632)
    I used qmail-ldap to build a service which has had zero downtime in over a year, planned or unplanned. I had a handful of 1U servers offering SMTP(S), IMAP(S), POP(S), WebMail, and local DNS and LDAP caches. They stored mail on a backend NetApp accessible to all servers via NFS. One master LDAP server was where accounts were added, and it replicated to the cache slaves on each 1U server. I can add capacity to the NetApp, and add servers to handle load with no downtime. The 1U servers are fronted by a redundant pair of F5 load balancers.

    We were able to apply OS patches box-by-box, taking them out of service individually, but without any downtime to the service. Very nice.
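
    The box-by-box patching boils down to something like the loop below. This is only a toy model: rolling_patch, the patch callback and the min_in_service threshold are all hypothetical names, and a real setup would drain and re-add servers through the load balancer rather than a Python set.

```python
def rolling_patch(servers, patch, min_in_service=1):
    """Patch servers one at a time; return the pool size seen during each patch."""
    in_service = set(servers)
    observed = []
    for s in servers:
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError("not enough servers to keep the service up")
        in_service.remove(s)   # take the box out of the load-balanced pool
        observed.append(len(in_service))
        patch(s)               # hypothetical patch step for this box
        in_service.add(s)      # put it back before touching the next one
    return observed
```

    With a handful of servers the pool never drops by more than one box, which is why the service as a whole sees no downtime.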

    Others are using qmail-ldap for large ISPs, of the size you are asking about. Check out their mailing list.
