
 



Software Spam Databases IT

Ask Slashdot: Speeding Up Personal Anti-Spam Filters? 190

New submitter hmilz writes "I've been using procmail for years to filter my incoming mail, and over time a long list of spam patterns was created. The good thing about the patterns is that there are practically no false positives and practically no false negatives, i.e. I see each new spam exactly once and lose no legit mail. This works by using an external spam-patterns file, containing one pattern per line, and running an 'egrep -F' against it. As simple as this is, with a long pattern list it becomes rather slow and CPU-consuming. An average mail currently needs about 15 seconds to be grepped. In other words, this has become quite clumsy over time, and I would like to replace it with a more (CPU, hence energy) efficient method. I was thinking about a small indexed database or something. What would you recommend and use if you were me? Is sqlite something to look at?"
This discussion has been archived. No new comments can be posted.

  • spamassassin (Score:5, Insightful)

    by mdaitc ( 619734 ) on Friday August 30, 2013 @08:56PM (#44721401)
    Have you tried SpamAssassin?
  • Database? (Score:3, Insightful)

    by K. S. Kyosuke ( 729550 ) on Friday August 30, 2013 @09:01PM (#44721421)
    What would the database achieve? I'm not sure what the exact nature of the patterns is (an example would really help here), but perhaps writing a compiler from the patterns into some decision procedure, in something reasonably efficient that also starts quickly, such as SBCL or Gambit, could help.
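
    A minimal sketch of the "compile the patterns" idea, written in Python rather than SBCL or Gambit. It assumes the patterns are fixed strings, one per line, as the 'egrep -F' in the question suggests; the function names are made up for illustration.

```python
import re


def compile_patterns(lines):
    """Combine fixed-string patterns into one compiled regex (or None)."""
    patterns = [line.strip() for line in lines if line.strip()]
    if not patterns:
        return None
    # re.escape makes each pattern match literally, like a fixed-string grep.
    return re.compile("|".join(re.escape(p) for p in patterns))


def is_spam(matcher, message_text):
    """True if any pattern occurs anywhere in the message text."""
    return matcher is not None and matcher.search(message_text) is not None
```

    The matcher is built once and can then be reused for any number of messages. Whether this beats grep on a given pattern list would have to be measured.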
  • Re:Or... (Score:4, Insightful)

    by asmkm22 ( 1902712 ) on Friday August 30, 2013 @09:33PM (#44721559)

    Which pretty much defeats the whole point of hosting your own email...

  • Problem spotted. (Score:5, Insightful)

    by girlintraining ( 1395911 ) on Friday August 30, 2013 @10:15PM (#44721735)

    The problem is that you're using egrep in the first place. Here's the thing -- the overwhelming majority of your cycles are getting sucked up loading, initializing, executing, then unloading that process. It's not that using regular expressions is processor-intensive... it's that repeatedly launching the same executable is.

    Use something that can load once, read in the patterns, check all the e-mails that are queued, sort them, then exit. Your execution time will go from 15 seconds to 150 milliseconds.
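
    A rough sketch of that load-once approach in Python. It assumes the queued mails sit as one file each in a directory and that the patterns are plain substrings, one per line; the function names and the directory layout are hypothetical, not something procmail provides by itself.

```python
from pathlib import Path


def load_patterns(pattern_file):
    """Read the pattern file once; one fixed-string pattern per line."""
    text = Path(pattern_file).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if line]


def sort_queue(queue_dir, patterns):
    """Return (spam, ham) lists of message paths found in queue_dir."""
    spam, ham = [], []
    for path in sorted(Path(queue_dir).iterdir()):
        if not path.is_file():
            continue
        body = path.read_text(encoding="utf-8", errors="replace")
        # A message is treated as spam if any pattern occurs in it.
        if any(p in body for p in patterns):
            spam.append(path)
        else:
            ham.append(path)
    return spam, ham
```

    Calling load_patterns once and then sort_queue over the whole directory pays the startup and pattern-loading cost a single time per batch instead of once per mail.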

  • Re:spamassassin (Score:5, Insightful)

    by wvmarle ( 1070040 ) on Friday August 30, 2013 @10:39PM (#44721829)

    Maybe the software is pretty much finished? In that case there's not much more to do - no new features to add, and sooner or later you'll run out of bugs to fix.
