A Programmatically Accessible Email Archive?
JohnnyConatus asks: "Does anyone know of a service that offers corporate email archiving and also provides a read-only interface for accessing the archived emails programmatically? Ideally this would be in the form of a database connection or a web service. My current employer is required by the SEC to archive all email communication with clients, and we would like to incorporate the archived emails into our internal applications. I have called just about every email archive service I could find via Google, and while most offer a web application to search the emails, none so far have a solution for doing so programmatically. For various reasons, archiving the emails ourselves is considered the last resort. If we had to implement archiving locally, a program that archived by acting as a mail gateway would be ideal, since we'll be supporting a wide range of mail servers."
i remember seeing something like this once (Score:2)
The mail was POP'd in and went straight into an SQL database somewhere; it was actually very fast and very reliable.
Perhaps you could do something similar. SQL is pretty ubiquitous.
Re:i remember seeing something like this once (Score:2)
Totally. It would be so simple to do. Just hook a script into your mail server that writes each email (broken down into its various fields) and its attachments to a relational database. Super easy. Use a robust database and prioritise write speed; it would probably have to be pretty massive size-wise, as you'd be writing every single email for 7 years (I think that's what the SEC requires, but don't cite me in court!). In fact, it coul
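A minimal sketch of the "script that writes each email to a relational database" idea, assuming Python with the standard-library email parser and SQLite standing in for the "robust database". The table layout and the archive_message name are hypothetical, chosen only for illustration; the raw message is stored alongside the parsed fields so nothing is lost in the breakdown.

```python
import sqlite3
from email import message_from_bytes, policy

# Hypothetical schema: one row per message, one row per attachment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    message_id TEXT,
    sender TEXT,
    recipients TEXT,
    subject TEXT,
    sent_date TEXT,
    body TEXT,
    raw BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS attachments (
    id INTEGER PRIMARY KEY,
    message_pk INTEGER NOT NULL REFERENCES messages(id),
    filename TEXT,
    content_type TEXT,
    payload BLOB
);
"""


def _text(value):
    """Convert a header object to a plain str (or keep None)."""
    return None if value is None else str(value)


def archive_message(db, raw):
    """Parse one raw RFC 822 message and store its fields and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    cur = db.execute(
        "INSERT INTO messages (message_id, sender, recipients, subject,"
        " sent_date, body, raw) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            _text(msg["Message-ID"]),
            _text(msg["From"]),
            _text(msg["To"]),
            _text(msg["Subject"]),
            _text(msg["Date"]),
            body,
            raw,
        ),
    )
    pk = cur.lastrowid
    for part in msg.iter_attachments():
        db.execute(
            "INSERT INTO attachments (message_pk, filename, content_type,"
            " payload) VALUES (?, ?, ?, ?)",
            (pk, part.get_filename(), part.get_content_type(),
             part.get_payload(decode=True)),
        )
    db.commit()
    return pk
```

A mail server hook would call db.executescript(SCHEMA) once and then archive_message(db, raw_bytes) for each message piped to it; how that hook is wired up depends on the mail server and is not shown here.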
Re:i remember seeing something like this once (Score:2)
Re:i remember seeing something like this once (Score:3, Informative)
Exchange4Linux [exchange4linux.org] does exactly this. Works pretty well, we've got a shitload of email (videos too), 5000+ contacts and all manner of data sitting in a PostgreSQL database.
It's NICE being able to execute SQL queries on your aggregate communications data. Perfect example: Our Asterisk head-end system knows which of our customer service people is on pager duty with an SQL query which looks at their service calendar. :-)
Re:i remember seeing something like this once (Score:1)
Ah, it only knows if they should be on duty. You really need to fix a GPS scanner and electrode to their spine to ping them (and get a unique response) to really know if they're on duty. And if that works, to send them a few dud messages from the customers they're there to support, e.g. "my scanner says it has 0xF4C83D, I'm running NetBSD-experimental.0.03.01."
Re:i remember seeing something like this once (Score:2)
geocities, archive.org and public key cryptography (Score:1)
Re:geocities, archive.org and public key cryptogra (Score:3, Funny)
IMAP (Score:4, Insightful)
If all of the emails are stored in an IMAP account, then you could access them programmatically using PHP's IMAP functions. I do the same thing on my site: a cron job checks an email account every 5 minutes, and if there's a new mail with an image attachment, it automatically posts the image online for me.
Information about PHP's IMAP functions can be found at http://uk.php.net/imap [php.net].
I'm not entirely sure if this is the kind of thing you are looking for, but this is probably how I would deal with the problem.
Regards,
Grant
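The polling approach described above can be sketched as follows. This is an illustration in Python's standard imaplib rather than the PHP functions the poster uses; the function names, the "INBOX" folder, and the host and credentials in the comments are all assumptions.

```python
import imaplib
from email import message_from_bytes, policy


def image_attachments(raw):
    """Return (filename, bytes) for every image attachment in a raw message."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
        if part.get_content_maintype() == "image"
    ]


def poll_unseen(conn, folder="INBOX"):
    """Fetch unseen messages from one folder and collect image attachments."""
    conn.select(folder)
    typ, data = conn.search(None, "UNSEEN")
    found = []
    for num in data[0].split():
        typ, parts = conn.fetch(num, "(RFC822)")
        raw = parts[0][1]  # the message body is the second item of the tuple
        found.extend(image_attachments(raw))
    return found


# Hypothetical usage from a cron job (placeholder host and credentials):
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("archive-user", "password")
#   for name, data in poll_unseen(conn):
#       ...  # publish or store the image
#   conn.logout()
```

For an archive, the mailbox would normally be opened read-only so that polling does not change message flags.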
Re:IMAP (Score:3, Funny)
You missed the obvious (Score:2, Insightful)
They seem to have something of a specialty in archiving e-mail and search technology, and usually have some kind of API.
IMAP as the API (Score:3, Interesting)
You could very easily implement this as a simple forwarding daemon or as a plugin to your existing MTA: just store all mail going anywhere in a separate, append-only mailbox, then use IMAP to access it remotely.
IMAP is an industry-proven protocol with many open-source implementations, and it was specifically developed for situations where the mail remains on the server. It provides searching and tagging, and you can organize the mail store as you see fit (e.g. each year's mail in a separate folder while still being able to search all of them at once, or known spam sorted into a separate folder while keeping it around). Granted, I'm not aware of any IMAP server that uses an SQL back-end, so this may become a bottleneck for you.
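As a rough illustration of using IMAP as the read-only API, here is a sketch with Python's imaplib. Plain IMAP SEARCH runs against the currently selected mailbox, so "searching all folders" here means looping over them; the function name and the folder names in the comments are hypothetical.

```python
import imaplib


def search_archive(conn: imaplib.IMAP4, folders, *criteria):
    """Run the same IMAP SEARCH in each folder, opened read-only.

    Returns a dict mapping folder name to a list of matching message
    numbers (as bytes). Folders that cannot be selected are skipped.
    """
    hits = {}
    for folder in folders:
        typ, _ = conn.select(folder, readonly=True)
        if typ != "OK":
            continue
        typ, data = conn.search(None, *criteria)
        if typ == "OK" and data and data[0]:
            hits[folder] = data[0].split()
        else:
            hits[folder] = []
    return hits


# Hypothetical usage (placeholder host, account and folder names):
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("archive-reader", "password")
#   search_archive(conn, ["2005", "2006"],
#                  "FROM", '"client@example.com"', "SINCE", "01-Jan-2006")
```

Opening each folder with readonly=True keeps the client from altering flags on archived mail, which fits the append-only mailbox idea above.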
Re:IMAP as the API (Score:3, Informative)
Re:IMAP as the API (Score:1)
Re:IMAP as the API (Score:2)
Why SQL? (Score:2)
Re:Why SQL? (Score:2)
mbox and maildir have been around for a long time; they were designed by experts to store mail, which is a very different problem from storing hierarchical column data. You could make the argument that all email conversations are hierarchical, but...
Personally, I have every email for about the last ten years in maildirs; I know that so long as my data is kept safe and secure [thefilehighclub.com], I will be able to read the emails in ten years' time.
There was even an application featured in a Linux magazine a couple of
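For what reading and writing a maildir programmatically can look like, here is a small sketch using Python's standard mailbox module. The function names and the archive path are assumptions made for illustration, not anything from the comment above.

```python
import mailbox
from email.message import EmailMessage


def store(maildir_path, msg):
    """Append one message to a maildir, creating the maildir if needed."""
    box = mailbox.Maildir(maildir_path, create=True)
    key = box.add(msg)
    box.flush()
    return key


def subjects(maildir_path):
    """Return the sorted Subject headers of every message in a maildir."""
    box = mailbox.Maildir(maildir_path, create=False)
    return sorted(str(m["Subject"] or "") for m in box)


def make_message(sender, subject, body):
    """Build a simple text message (helper for the example only)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

Because each message is one plain file on disk, the same directory stays readable by ordinary mail clients and by scripts like this one.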
Re:Why SQL? (Score:2)
MySQL for example has full text search capability.
Re:IMAP as the API (Score:2)
HTH. HAND.
Cheers.
this sounds like it would work... (Score:1)
It sounds like you need all of your mail servers configured to dump incoming and outgoing messages to a database.
I don't do much mucking around with mail servers, so if they don't have any easy integration with databases, I'm sure you could have them log to a file, with a scheduled script that loads any logged messages into the databa
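A rough sketch of the scheduled loader idea, assuming the mail server logs messages to an mbox file and that SQLite stands in for the database. The table name, column layout, and load_mbox function are hypothetical. Using the Message-ID as the primary key with INSERT OR IGNORE means the script can be re-run over the same file without creating duplicates.

```python
import mailbox
import sqlite3


def load_mbox(db, mbox_path):
    """Load every message from an mbox file; return how many rows were new."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        " message_id TEXT PRIMARY KEY,"
        " sender TEXT,"
        " subject TEXT,"
        " raw BLOB NOT NULL)"
    )
    added = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            cur = db.execute(
                "INSERT OR IGNORE INTO archive VALUES (?, ?, ?, ?)",
                (msg["Message-ID"], msg["From"], msg["Subject"],
                 msg.as_bytes()),
            )
            added += cur.rowcount  # 1 if inserted, 0 if already present
    finally:
        box.close()
    db.commit()
    return added
```

Run from cron, this would be called with a long-lived database file and the path the mail server logs to; both are deployment details not covered here.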
Java (Score:2, Interesting)
Perl, LWP (Score:2)
You're talking about SQL storage of messages (Score:3, Informative)
is one starting point, but there are a few others.
You're basically replacing
Re:You're talking about SQL storage of messages (Score:1)
Assentor (Score:2, Informative)
Re:You already have the solution in-house. (Score:2)
1 - Exchange 2003 has a DB interface
2 - There are 3rd party products that integrate with Exchange and provide everything you need for compliance in a nice pretty interface.
SEC will not allow exactly what you want (Score:2, Informative)
Disclaimer: I work for a company that makes SOX compliance appliances.
The SEC requires you to keep all email in house. As far as we can tell that means your storage must be in house, not at a service provider.
We don't provide such an interface in our products. We want as few possibilities as we can for bugs that let you delete or alter email. By sticking to our interface, we have a better chance of keeping you from doing something illegal (which could reflect on us). However we do provide a web inte
Re:SEC will not allow exactly what you want (Score:2)
Object DataBase (Score:2)
Will retain records pursuant to a number of different gov requirements for reporting.
Use Kasten Chase for encrypting, if needed (we have an object shim -- and this is a plug). That will give you your data security.
Maybe other solutions... but that's the one I am familiar with.
Ratboy.
Re:Object DataBase (Score:1)
MessageRite (Score:2)
www.messagerite.com
YMMV, but... (Score:2)
Paul B.
Java or PHP + SQL Database (Score:2)
[1] Sourceforge is owned by the same folks that own slashdot.
[2] I'm not affiliated with either, except as a user.
you mean... (Score:1)
I built one. (Score:2)
For what it'
I doubt you'll find what you're looking for (Score:2)
If you'll need to search them, forget a database and use a Lucene [apache.org] index. You could also store all the text verbatim in Lucene and forget the database.
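To show why a search index suits this better than row-by-row scanning, here is a toy inverted index in Python. It is not Lucene and does not use Lucene's API; the class and method names are made up, and it only illustrates the underlying idea of mapping each term to the set of messages that contain it.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids (not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        """Store the text verbatim and index its lowercase word terms."""
        self.docs[doc_id] = text
        for term in set(re.findall(r"\w+", text.lower())):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        matches = set.intersection(
            *(self.postings.get(term, set()) for term in terms)
        )
        return sorted(matches)
```

A real index such as Lucene adds tokenization rules, ranking, and on-disk storage on top of this basic structure.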
A quick search on Google turns up... (Score:1)
Looks like that does exactly what you're looking for.
Do some research ... (Score:4, Informative)
You'll still need to provide security as to who can view messages. Search for legal purposes. You have document retention schedules you'll need to adhere to. You'll potentially have a freakin' huge volume of data to look at.
I'm seeing a lot of references to PHP and Java classes -- something as important as SEC regulations for e-mail archiving shouldn't just be thrown together willy-nilly. Failure to get it right could cause *huge* legal problems downstream.
Mail archiving for SEC/SOX is an utterly non-trivial undertaking.
Cheers
If not now, soon. (Score:2)
Google (Score:2)
They have an indexing appliance as well as a Google API? That way your company can also keep all its indexed email in its own data center.
http://www.google.com/enterprise/gsa/features.htm
Microsoft Access reads from Microsoft Outlook (Score:2)
If your employer doesn't use Microsoft Exchange servers, this advice won't really apply to you.
If you're a moderator and y
Courier (Score:3, Informative)
I Almost Hate To Ask This (Score:2)
Re:I Almost Hate To Ask This (Score:1)
"Programmatically accessible" means accessible by a computer program.
I.e., software that not only has a user interface (GUI or CLI) but also has a function library, or TCP protocol suite, or web services, or RPC, or REST, or some other way to access the data and functionality of the software from other software.
Re:I Almost Hate To Ask This (Score:2)
TFS Gateway (Score:2)