The Internet IT

Smart Spam Filtering For Forums and Blogs? 183

phorm writes "While filtering for spam in email and related media seems to be fairly productive, there is a growing issue with spam on forums, message boards, blogs, and other such sites. In many cases, sites use prevention methods such as captchas or question-and-answer challenges to try to restrict input to human-only visitors. However, even with such safeguards (and especially with most forms of captcha being cracked fairly often these days), it seems that spammers are becoming an increasing nuisance in this regard. While searching for plugins or extensions to SpamAssassin and the like, I have had little luck finding anything not tied into the email framework. Google searches for PHP-based spam filtering tend to come up with mostly commercial and/or email-related filters. Does anyone know of a good system for filtering spam in general messages? Preferably such a system would be FOSS, and something with a daemon component (accessible by port or socket) to offer quick response times."
This discussion has been archived. No new comments can be posted.

  • Akismet (Score:5, Informative)

    by seifried ( 12921 ) on Sunday December 28, 2008 @05:49PM (#26252239) Homepage
    Akismet
  • I always thought (Score:2, Informative)

    by davebarnes ( 158106 ) on Sunday December 28, 2008 @05:52PM (#26252263)

    reCAPTCHA was fairly effective, easy to install, and useful.

  • Second that! (Score:5, Informative)

    by _merlin ( 160982 ) on Sunday December 28, 2008 @06:03PM (#26252349) Homepage Journal

    Akismet [akismet.com] is the best thing for blog spam prevention ever. I can't believe you've never stumbled across it before. It uses statistical analysis to identify spam, and the more people use it, the better it gets. If everyone used it, the blog spammers would just disappear because their attacks would be completely ineffective.

  • Re:Second that! (Score:5, Informative)

    by seifried ( 12921 ) on Sunday December 28, 2008 @06:09PM (#26252417) Homepage
    Add to which it has an API/etc. It really is what you should be using.
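
For readers wondering what "it has an API" looks like in practice, here is a rough sketch of preparing a comment-check call. The endpoint shape and parameter names are written from memory of Akismet's published REST interface and should be checked against the current documentation before use; the function names are made up for this example, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_comment_check_request(api_key: str, blog_url: str, user_ip: str,
                                user_agent: str, comment_text: str) -> Request:
    """Prepare (but do not send) a POST asking the service to classify a comment.

    Assumption: the key-as-subdomain URL and these form fields match the
    service's documented comment-check call. Verify before relying on it.
    """
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    body = urlencode({
        "blog": blog_url,            # front page of the site the comment was posted on
        "user_ip": user_ip,          # commenter's IP address
        "user_agent": user_agent,    # commenter's browser User-Agent string
        "comment_type": "comment",
        "comment_content": comment_text,
    }).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def is_spam(response_body: str) -> bool:
    """Assumption: the service answers with the literal text "true" for spam."""
    return response_body.strip() == "true"
```

Sending the prepared request with urllib.request.urlopen and passing the decoded body to is_spam would complete the round trip. Because the check is a network call, a site would probably want a timeout and a fallback (for example, holding the comment for moderation) when the service is unreachable.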
  • YAWASP for wordpress (Score:3, Informative)

    by zimtmaxl ( 667919 ) on Sunday December 28, 2008 @06:22PM (#26252509) Homepage
    There is a semi-dynamic plugin for WordPress that works well. It has served me well. It is called YAWASP and you can find it here: http://wordpress.org/extend/plugins/yawasp/ [wordpress.org]. The author also describes the common problems & shortfalls with traditional captcha-like methods.
  • "I am a robot" field (Score:5, Informative)

    by casualsax3 ( 875131 ) on Sunday December 28, 2008 @06:23PM (#26252511)
    The ZSNES boards employ a neat trick: http://board.zsnes.com/phpBB2/profile.php?mode=register&agreed=true [zsnes.com]

    It's got a field that says "I am a robot" checked off by default. A human should obviously see that and uncheck it. Those registrations that come in with it checked are blackholed. It's definitely cut down on the SPAM accounts since they enabled it.
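
A minimal server-side sketch of that checkbox trick, assuming the submitted form arrives as a plain mapping of field names to values. The field name i_am_a_robot and both function names are invented for illustration; the ZSNES board's actual implementation is not shown in the comment.

```python
from typing import Mapping

# Hypothetical name of the checkbox that the form renders pre-checked.
ROBOT_CHECKBOX = "i_am_a_robot"


def robot_box_still_checked(form: Mapping[str, str]) -> bool:
    """Return True when the "I am a robot" box was left checked.

    Browsers only include a checkbox in the submission when it is checked,
    so a human who unticked it sends a form without the field at all.
    """
    return ROBOT_CHECKBOX in form


def handle_registration(form: Mapping[str, str]) -> str:
    if robot_box_still_checked(form):
        # Blackhole the registration: give the bot no hint about why it failed.
        return "discarded"
    return "accepted"
```

The obvious weakness is the one raised elsewhere in the thread: this only holds up while bots submit forms with their default values and nobody bothers to special-case the field.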

  • Re:D.I.Y. (Score:4, Informative)

    by Korin43 ( 881732 ) on Sunday December 28, 2008 @06:27PM (#26252533) Homepage
    Yes. The point of FOSS is that one person can do it and no one else needs to do it again unless they want to make it better. This guy is looking for a solution, and the solution already exists. He would be wasting his time if he did it himself.
  • Hidden Input Box (Score:5, Informative)

    by waldoj ( 8229 ) <<waldo> <at> <jaquith.org>> on Sunday December 28, 2008 @06:31PM (#26252557) Homepage Journal

    Third, do they fill out a hidden inputbox? This is sort of the reverse captcha.

    This is really a very good test. As others have mentioned in this thread, it's the sort of thing that spammers will circumvent if it becomes widespread, but for now it's great.

    There's something else I've found to be really quite effective: deliberately misnaming my form fields. For instance, give the input field that's labelled "First Name" an input name of "phone number." Humans don't use input names to determine what text to enter, but spambots do. Then check that input: if the first name field contains a phone number, you know you've got yourself a spammer.

    I've used solely the combination of these two things to run one of my websites for two years now, and I get a vanishingly small amount of spam.
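
The two checks described in this comment (a hidden input that should stay empty, and a deliberately misnamed field) could be combined roughly as follows. This is only a sketch: the field names website and phone_number, and the loose definition of what counts as a phone number, are assumptions made for the example.

```python
import re
from typing import Mapping

# Hypothetical field names:
#   "website"      is hidden with CSS, so a human never types into it
#   "phone_number" is the input shown to humans under the label "First Name"
HIDDEN_FIELD = "website"
MISNAMED_FIRST_NAME_FIELD = "phone_number"


def looks_like_phone_number(value: str) -> bool:
    """Loose test: only digits and common separators, with at least 7 digits."""
    digit_count = sum(ch.isdigit() for ch in value)
    only_phone_chars = re.fullmatch(r"[\d\s().+-]+", value) is not None
    return digit_count >= 7 and only_phone_chars


def looks_like_spambot(form: Mapping[str, str]) -> bool:
    # Check 1: anything typed into the hidden box came from a script.
    if form.get(HIDDEN_FIELD, "").strip():
        return True
    # Check 2: a bot that trusts the input name puts a phone number
    # where a human would have typed a first name.
    first_name = form.get(MISNAMED_FIRST_NAME_FIELD, "").strip()
    return looks_like_phone_number(first_name)
```

One caveat worth weighing: browser autofill and screen readers also look at input names, so a misnamed or hidden field can occasionally trip up a real person.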

  • Message board spam. (Score:5, Informative)

    by JWSmythe ( 446288 ) * <jwsmytheNO@SPAMjwsmythe.com> on Sunday December 28, 2008 @06:32PM (#26252573) Homepage Journal

        I had a similar problem in the comments area of my site. It was all fun and games, until one day I checked, and there were something like 1000 spams for every real message.

        I wrote my own system to deal with it. It's not very hard, assuming you know how your site works (of course you do, right?)

        I ended up making two blacklists. One was for words and phrases. The spammers tend to post (and repost, and repost) the same crap. My blacklist rules were simple wildcard patterns that I could run queries with, like "%http://%spamsite%" and "%v%gra%". You get the idea. The second list was the IPs of known spammers.

        At the time, I allowed both anonymous comments, and comments from logged in users. I eventually did away with the anonymous comments, as they were a headache. This was the best cure.

        So, when my script ran (once a minute), if it matched a message, it would delete the message, and append the IP to the IP blacklist. If it was posted by a user account, the user account got suspended, so they could no longer log in, nor post.

        After its detection and cleanup run, it then ran back over the IP list, and pruned out every post by that IP. Sometimes they'll do practice runs saying silly things like "nice site". I thought they were real user compliments at first, until I saw the same posting verbatim coming from the same IP to multiple news stories, and then that IP would start spamming later.

        Some people will argue that the IP cleanup run was not nice, polite, or even fair. People use proxies. Sure, they do. We got a lot of abuse from anonymous proxies, and no real messages from them. The spammers didn't seem to like to use AOL.

        When I implemented this, I posted a very brief description of what I was starting ("We're starting advanced anti-spam protection"), with an apology for real messages that were deleted. I never received one complaint about real comments disappearing.

        How brutally you do it is really up to you. I built my method by manually doing it for a while, and then letting the script do it on its own. Occasionally, I would have to go in and add new words and/or site names to the words blacklist.

        I noticed the spammers hit the more common software more often. It's worth it for them to make automated systems to abuse a piece of software that's deployed on tens of thousands of sites. When I rewrote my site from scratch, abuse dropped to zero for a long time. Now, they manually submit "news" items which are just ads for their own sites. It appears to be manual, and since we won't run them as news stories (our editorial staff decides what does or doesn't show up as news, and if it needs to be edited first), they give up pretty quickly.
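
The periodic cleanup described in this comment can be sketched in a few lines. This is an in-memory illustration, not the poster's script: the Comment and Board classes are invented here, the quoted patterns are treated as SQL LIKE-style wildcards (only % is handled), and a real site would run the equivalent queries against its own database from a scheduled job.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Pattern, Set


def like_to_regex(pattern: str) -> Pattern[str]:
    """Translate a LIKE-style pattern (only the % wildcard) into a regex."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$", re.IGNORECASE | re.DOTALL)


@dataclass
class Comment:
    id: int
    ip: str
    user: Optional[str]  # None for an anonymous post
    text: str


@dataclass
class Board:
    comments: List[Comment]
    word_blacklist: List[str]
    ip_blacklist: Set[str] = field(default_factory=set)
    suspended_users: Set[str] = field(default_factory=set)


def cleanup_run(board: Board) -> None:
    patterns = [like_to_regex(p) for p in board.word_blacklist]
    # Pass 1: any comment matching the phrase blacklist marks its IP as a
    # spammer and suspends the posting account, if there is one.
    for comment in board.comments:
        if any(p.match(comment.text) for p in patterns):
            board.ip_blacklist.add(comment.ip)
            if comment.user is not None:
                board.suspended_users.add(comment.user)
    # Pass 2: prune every post from a blacklisted IP. This removes the
    # matches from pass 1 and any earlier "nice site" practice posts.
    board.comments = [c for c in board.comments if c.ip not in board.ip_blacklist]
```

Broad patterns such as "%v%gra%" will also match innocent text (any comment with a "v" followed later by "gra", such as "very grateful"), which is one reason to start by reviewing matches by hand, as the comment suggests, before letting the script delete on its own.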

  • Re:Akismet (Score:1, Informative)

    by Anonymous Coward on Sunday December 28, 2008 @06:49PM (#26252683)

    Hit Freshmeat for "bayesian" and PHP; that's the statistical method for calculating the probability that a given post is similar to a body of spam examples.

    There's a couple of PHP-based ones there, both open source.
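
To give a feel for the Bayesian approach mentioned in this comment, here is a small naive Bayes classifier with add-one smoothing. It is a teaching sketch written for this thread, not the code of either Freshmeat project, and the class and method names are made up; a production filter would need a much larger training corpus and better tokenisation.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Dict[str, int] = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one example post; label must be "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior: how common this class is among training posts.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                # Add-one smoothing so an unseen word never gives probability 0.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Convert the two log scores into a normalised probability of spam.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)
```

Usage is train-then-score: feed it known spam and known legitimate posts, then hold or reject new posts whose spam_probability is above a threshold you choose.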

  • Re:Second that! (Score:5, Informative)

    by _merlin ( 160982 ) on Sunday December 28, 2008 @06:55PM (#26252727) Homepage Journal

    I've used it for a few years now. In that time, it has caught tens of thousands of spam comments. It has missed about ten spam comments (i.e. allowed them through). It has misidentified two legitimate comments as spam. Yes, I realise I'm keeping an eye on it, and someone who doesn't may not notice that it's causing problems for them. But the stats are pretty good in my case. I'm aware of the allegations of corruption and using it to gag people, but that hasn't affected me yet.

  • by lalena ( 1221394 ) on Sunday December 28, 2008 @07:05PM (#26252809) Homepage
    As a follow up to myself, I didn't come up with these ideas on my own. I read them on Slashdot a couple of years ago.
  • by Hojima ( 1228978 ) on Sunday December 28, 2008 @07:17PM (#26252891)

    read my sig

  • Re:Akismet (Score:1, Informative)

    by Anonymous Coward on Sunday December 28, 2008 @08:23PM (#26253373)

    according to the wiki article http://en.wikipedia.org/wiki/Akismet [wikipedia.org] if you say something bad about Matt Mullenweg you will be blacklisted.

    I fear being blacklisted hence the anon post :>

  • by ceejayoz ( 567949 ) <cj@ceejayoz.com> on Sunday December 28, 2008 @08:43PM (#26253465) Homepage Journal

    I seem to get Mollom captchas on every site that uses it. My IP, user agent, etc. are almost completely static. My comments are grammatically correct, never spammy, etc.

    If their system hasn't identified me as safe by now, there's something wrong.

    In contrast, to my knowledge Akismet has never flagged me. My comments go straight up on blogs using it. On my personal site, I've had maybe 10 false positives out of several thousand caught.

    Mollom, IMO, has a long way to go.

  • by KermodeBear ( 738243 ) on Sunday December 28, 2008 @11:12PM (#26254327) Homepage

    I have a very simple, small site that I run that allows small comments. It was fine until the spam bots found it. Anyways, I just added a simple question about the background color of the site, which must be correct in order for the comment to be posted. I haven't had a single issue since (except for the occasional troll, but what can you do about that).

    The nice thing about something like this, a handmade thing, is that the spammers won't bother 'breaking' it. As the parent mentions, the spammers are attacking the common solutions - so a little home grown bit will work wonders.
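
A home-grown question check like the one described can be very small. In this sketch the question wording and the accepted answers are placeholders (the comment does not say what colour the site actually is), and the function name is invented.

```python
from typing import Set

# Placeholder question and answers: substitute whatever is true for your site.
QUESTION = "What is the background colour of this site?"
ACCEPTED_ANSWERS: Set[str] = {"white"}


def answer_is_correct(answer: str) -> bool:
    """Accept the answer regardless of case or surrounding whitespace."""
    return answer.strip().lower() in ACCEPTED_ANSWERS
```

The comment would only be stored when answer_is_correct returns True for the submitted answer. Being forgiving about case and whitespace keeps real visitors from failing on trivia.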

  • Re:Second that! (Score:5, Informative)

    by sfbanutt ( 116292 ) on Sunday December 28, 2008 @11:49PM (#26254495) Homepage

    I just noticed a handy Akismet stats link in the latest version. I've been running Akismet since October 2006. In that time there have been 26,575 comments on my blog, of which 26,302 were spam(!). It missed 25 spam comments that had to be manually moderated and passed 273 legit comments. There have been no false positives. Personally, I think that's a pretty darn good record.
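
The figures in this comment can be turned into rates with a little arithmetic. The sketch below assumes the 25 missed comments are counted within the 26,302 spam total, which the comment does not state explicitly.

```python
total_comments = 26_575
spam_comments = 26_302
missed_spam = 25  # assumed to be included in spam_comments

legit_comments = total_comments - spam_comments  # matches the 273 reported
caught_spam = spam_comments - missed_spam

catch_rate = caught_spam / spam_comments     # share of spam that was caught
spam_share = spam_comments / total_comments  # share of all comments that were spam

print(f"legit comments: {legit_comments}")
print(f"spam caught:    {caught_spam} ({catch_rate:.3%})")
print(f"spam share:     {spam_share:.2%}")
```

Under that assumption the filter caught a little over 99.9% of spam, and roughly 99% of everything submitted to the blog was spam.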

  • by zimtmaxl ( 667919 ) on Monday December 29, 2008 @05:05AM (#26255917) Homepage
    I forgot to mention these 2 plugins:
    SABRE: against spam registrations on your blog ( http://wordpress.org/extend/plugins/sabre [wordpress.org])
    and
    Simple Trackback Validation: a trackback validation tool for wordpress ( http://wordpress.org/extend/plugins/simple-trackback-validation/ [wordpress.org] ).
  • Re:gmail (Score:4, Informative)

    by shutdown -p now ( 807394 ) on Monday December 29, 2008 @09:58AM (#26257285) Journal

    You'd be surprised. It is trivial for spammers to get a gmail account.

    It's no less trivial than getting a Hotmail account, a Yahoo! account, or any of the many thousands of free webmail providers out there.

    Even so, I suspect that the majority of casual Internet users today actually have that sort of email account, based on personal experience. If you start blocking them, you're blocking most legit users, too. Unless it's a technical forum - and even in this case it's silly to block GMail, as many techies use that.

    Anyone who genuinely wants to contribute to a forum will have another email address

    Why? I for one don't have one - I use my GMail one everywhere - and I contribute to a lot of forums.

    if not they will be able to explicitly email

    Translate, please. Explicitly email what where, and how is that going to help?
