Ask Slashdot: Speeding Up Personal Anti-Spam Filters?
New submitter hmilz writes "I've been using procmail for years to filter my incoming mail, and over time I've built up a long list of spam patterns. The good thing about the patterns is that there are practically no false positives and practically no false negatives, i.e. I see each new spam exactly once, and lose no legit mail. This works by using an external spam-patterns file, containing one pattern per line, and running an 'egrep -F' against it. As simple as this is, with a long pattern list it becomes rather slow and CPU-intensive. An average mail currently takes about 15 seconds to grep. In other words, this has become quite clumsy over time, and I would like to replace it with a more (CPU, hence energy) efficient method. I was thinking about a small indexed database or something. What would you recommend and use if you were me? Is sqlite something to look at?"
spamassassin (Score:5, Insightful)
Database? (Score:3, Insightful)
Re:Or... (Score:4, Insightful)
Which pretty much defeats the whole point of hosting your own email...
Problem spotted. (Score:5, Insightful)
The problem is that you're using egrep in the first place. Here's the thing -- the overwhelming majority of your cycles are getting sucked up loading, initializing, executing, then unloading that process. It's not that using regular expressions is processor-intensive... it's that repeatedly launching the same executable is.
Use something that can load once, read in the patterns, check all the e-mails that are queued, sort them, then exit. Your execution time will go from 15 seconds to 150 milliseconds.
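A rough sketch of that "load once, check everything, exit" idea in Python. The function names and the (name, text) message format are made up for illustration, and it assumes the patterns in the file are also valid Python regular expressions, which is not guaranteed for every egrep pattern. All patterns are joined into one compiled alternation, so each mail is scanned by a single regex object instead of a freshly started grep.

```python
import re
from pathlib import Path


def load_patterns(path):
    """Read one regex per line and compile them into a single alternation.

    Assumption: each line is valid Python `re` syntax (egrep's POSIX ERE
    differs in some details, e.g. bracket classes like [[:alpha:]]).
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    patterns = [line.strip() for line in text.splitlines() if line.strip()]
    if not patterns:
        # An empty alternation would match everything; use a regex that never matches.
        return re.compile(r"(?!)")
    # Non-capturing groups keep each pattern's own alternations self-contained.
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def classify(messages, matcher):
    """Split an iterable of (name, text) pairs into spam and ham name lists."""
    spam, ham = [], []
    for name, text in messages:
        (spam if matcher.search(text) else ham).append(name)
    return spam, ham
```

You'd call load_patterns() once per run, then feed classify() every queued message. Whether that gets anywhere near the 150 ms figure depends on the patterns and the mail volume, so measure before trusting it.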
Re:spamassassin (Score:5, Insightful)
Maybe the software is pretty much finished? In that case there's not much more to do - no new features to add, and sooner or later you'll run out of bugs to fix.