Censorship

Open Source Filtering?

David Guichard asks: "Maybe I've just missed it, but has there been any talk or action on an open source Internet filter? I'm thinking of something that would allow libraries and schools to comply with the law, but would not hide the list of forbidden sites and would allow complete local control, and certainly would not track user browsing. I realize a lot of people wouldn't want to get anywhere near this on principle, but it seems like a winner to me. For example, would junkbuster satisfy the law already? What is missing that the law requires?" If you have to have some form of filtering in place, better an open solution than a closed one.
This discussion has been archived. No new comments can be posted.


  • by Anonymous Coward
    Open-Source filtering in action.

    Just post it to "Ask Slashdot", where no one will ever see it.

    Bravo.
  • Heh, this comes up quite a bit here. I put up my thoughts nearly a year ago. Since then other things have occupied my time, but it's a method I think could work if there was enough community support (not just the Slashdot audience).

    Look here [slashdot.org], and do a text search for "How about doing it right then??"

    Doing it wouldn't be that difficult. The Squid proxy has a good number of filters for banner ads and such. You would just need to swap out the list of banner ads for a list of filtered sites.
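
    Roughly, that could look like the sketch below: a tiny redirector script that Squid hands each request to, which answers with a block page for listed domains and a blank line for everything else. The blacklist path and block-page URL are made-up examples, not anything standard.

    #!/usr/bin/env python3
    # Minimal sketch of a Squid redirector: Squid hands each request to this
    # script as a line on stdin ("URL client_ip/fqdn user method ..."); an
    # empty answer lets the request through, another URL redirects it.
    # The blacklist path and block-page URL below are hypothetical.
    import sys
    from urllib.parse import urlsplit

    BLACKLIST_FILE = "/etc/squid/blocked_domains"          # one domain per line
    BLOCK_PAGE = "http://proxy.example.lib/blocked.html"   # local explanation page

    def load_blacklist(path):
        try:
            with open(path) as fh:
                return {line.strip().lower() for line in fh if line.strip()}
        except OSError:
            return set()

    def is_blocked(url, blacklist):
        host = (urlsplit(url).hostname or "").lower()
        # Block the listed domain and anything under it.
        return any(host == dom or host.endswith("." + dom) for dom in blacklist)

    def main():
        blacklist = load_blacklist(BLACKLIST_FILE)
        for line in sys.stdin:
            fields = line.split()
            url = fields[0] if fields else ""
            sys.stdout.write((BLOCK_PAGE if is_blocked(url, blacklist) else "") + "\n")
            sys.stdout.flush()  # Squid expects one answer per request, unbuffered

    if __name__ == "__main__":
        main()

    It would be wired in with a redirect_program line in squid.conf (url_rewrite_program on newer Squids). The point is that the blocked-domain list is an ordinary local file that anyone can read, audit, and edit.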
  • First, the whole point of censorware is that you can't get around it.
    Not entirely. Part of the point is to keep people from accidentally going somewhere they didn't want to. There are a considerable number of pornographic sites which portray themselves as something else (through spam, misspellings, etc).
    Apart from the philosophical absurdity, censorware can never work on an open-source operating system without stringent physical controls as well.
    Of course it can -- it's terribly easy, really. All you have to do is put the censorware on the firewall. Sure, the firewall has to be physically secured, but that was already true.
    If you want to make up a blacklist of sites which don't want to be blacklisted, you have a fight on your hands.
    There is potentially someplace in between -- making a blacklist of sites that people didn't want to go to but found themselves at anyway. When I was supervising a very small lab of junior high kids, every so often someone would end up at a porn site or something otherwise inappropriate. The kids didn't want to go there -- at least not in a public lab. If it had been easy to block a site right then, the sites that were actually problematic would have been added over time.

    Not every obscure pornographic site has to be included -- just the ones that matter. I think we could take a lot of the wind out of the sails of the pro-censorship people if just the fraudulent sites could be blocked. Sure, if someone wanted to see porn they still could, because a large portion of it wouldn't be blocked -- but then they get what they asked for, and I don't think that's a big deal.

    Some human being has to categorize them, or you'll be no more accurate than the existing closed-source blacklists
    This is a significant problem. Right now there are fundamentalists who deface library books they object to. It would be easy to abuse a volunteer-based system in the same way.

    Part of what would help, I think, is the openness of the system. It should be just as easy to submit a complaint about a blocked website as it is to suggest blocking one, and it should be easy to find out where your website stands without participating in any blocking yourself. If this were combined with a reputation system for the suggesting moderators, the worst abusers could be isolated.
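
    As a toy illustration of that reputation idea (every name, weight, and threshold below is invented, not a spec): a site only gets blocked once the combined reputation of the people suggesting it clears a threshold, and an upheld complaint costs the suggesters reputation.

    # Toy reputation-weighted blocklist: a site is blocked only once the
    # combined reputation of its suggesters clears a threshold, and upheld
    # complaints cost the suggesters reputation.  All values are arbitrary.
    BLOCK_THRESHOLD = 5.0

    reputation = {}    # suggester -> current reputation score
    suggestions = {}   # site -> set of suggesters

    def suggest_block(site, who):
        suggestions.setdefault(site, set()).add(who)
        reputation.setdefault(who, 1.0)

    def is_blocked(site):
        weight = sum(reputation.get(who, 0.0) for who in suggestions.get(site, ()))
        return weight >= BLOCK_THRESHOLD

    def uphold_complaint(site):
        """A blocked site complained and a reviewer agreed: penalize its suggesters."""
        for who in suggestions.pop(site, ()):
            reputation[who] = max(0.0, reputation.get(who, 1.0) - 1.0)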

  • by jamiemccarthy ( 4847 ) on Sunday February 18, 2001 @11:01AM (#422410) Homepage Journal
    There are two important considerations here.

    First, the whole point of censorware is that you can't get around it. If you have a choice of whether to run it or not, it might be searching, filtering, categorizing, whatever, but it's not censorware.

    The idea of an "open" solution which is forced upon people is a little silly. Apart from the philosophical absurdity, censorware can never work on an open-source operating system without stringent physical controls as well.

    (Recall the first rule of security: anyone who has physical access to your machine has the potential to compromise it. This may be as simple as booting from floppy!)

    Second, making up a blacklist of porn sites is trivial if you just want to list the ones who want to be listed. Use RSACi. It's already built into your browser. Almost all porn sites rate with RSACi, and they want to be blacklisted, because it helps immunize them from prosecution for providing porn to kids (or at least that's the perception).
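
    As a rough sketch of what "use the self-rating" can mean in code: RSACi ratings travel in a PICS-Label meta tag, so a filter can simply check whether a page labels itself via the rsac.org rating service. This is only a tag check, not a full PICS parser; real labels carry more structure than this looks at.

    # Sketch: detect whether a page carries an RSACi/PICS self-rating.  Real
    # PICS labels have more structure; this only looks for the meta tag and
    # the rsac.org rating-service URL inside it.
    from html.parser import HTMLParser

    class PicsLabelFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.labels = []

        def handle_starttag(self, tag, attrs):
            if tag != "meta":
                return
            attrs = dict(attrs)
            if attrs.get("http-equiv", "").lower() == "pics-label":
                self.labels.append(attrs.get("content") or "")

    def self_rated_rsaci(html_text):
        finder = PicsLabelFinder()
        finder.feed(html_text)
        return any("rsac.org" in label.lower() for label in finder.labels)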

    If you want to make up a blacklist of sites which don't want to be blacklisted, you have a fight on your hands. It's a phenomenal amount of work to scan the web. Consider the massive server farms and pipes of unholy size that Google or Alta Vista have to use to spider the web. Who's going to volunteer to set up a similar installation to spider porn sites?

    If you think you're just going to provide a way for volunteers to send in "hey, I found another porn site" URLs, don't be silly. Most of those submissions are going to be RSACi-rated; almost all the rest will be overlap. The web is huge. Porn is about 1% of it. One percent of huge is still huge.

    And then, the big question: who's going to make decisions about these allegedly porn (but not self-rated) sites? Some human being has to categorize them, or you'll be no more accurate than the existing closed-source blacklists (which is to say, laughably inaccurate).

    That takes time, and with millions of new or changed pages on the web every hour, do the math and figure out how much time you can expect to get out of your volunteers. How many dollars of free labor does this hypothetical project depend on? Do porn-hating geeks really hate porn that much, that they'll sit in front of a monitor all day for free and surf porn sites?

    Short version: if it were easy to do, someone already would have done it. In fact there already exist several places that keep an "open" list of porn sites which can be dropped into any Squid proxy. Most of them are years old and will never be maintained again:

    • squidblock.tgz [squid-cache.org] from July 1999
    • sxcontrol [onda.com.br], last change February 2000
    • INfilter [snerpa.is], last revised March 2000
    • Linux Center's squidblock.tgz [hklc.com]
      Click the "Latest" link, which is there "just to show that someone is using it!" Note that the "latest" additions to the blacklist include such obscure sites as playboy.com, and such recent new sites as dailydirt.com (domain registered on Jan 12, 1998).

    Jamie McCarthy

  • How can you do it?

    You can't -- you have to rely on the consumer to make the choice.

    If you don't, you are restricting free speech.

    Simply put, you either make the choice about what to block yourself OR trust someone to make the choice for you.

    Now, personally, I trust no one to do that for me. I would rather see the odd silly thing than have a whole bunch of dictionaries censored because they contain banned words.

    Funny if you think of it like that (-;

    regards

    john jones

  • Actually, there are a number of HTML proxy filters. Start with the list in Freshmeat Old Appindex :: Daemons :: Proxy [freshmeat.net]. In addition to the content filters which are already there, notice that the banner filter technology can also be used to filter other things.

    Linux firewalling also allows user-level filters -- packets can be directed to programs for filtering.
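
    As a small illustration of the point above about reusing banner-filter technology -- the same pattern matching that strips ad images can strip or flag anything else, given a different pattern list. The patterns here are invented placeholders, not a real ruleset.

    # The same regex machinery a banner-ad filter uses, pointed at a different
    # pattern list.  These patterns are placeholders, not a real ruleset.
    import re

    PATTERNS = [re.compile(p, re.IGNORECASE | re.DOTALL) for p in (
        r'<img[^>]*example-banner[^>]*>',             # ad-image style rule
        r'<a[^>]*example-blocked\.com[^>]*>.*?</a>',  # link to a listed site
    )]

    def rewrite(html_text, replacement="<!-- filtered -->"):
        """Replace any matching fragment before the page reaches the browser."""
        for pattern in PATTERNS:
            html_text = pattern.sub(replacement, html_text)
        return html_text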

  • If someone wrote the grammar engine well, then it'd be able to realize this. I don't think porn sites could get away with using 100% images, since I thought they all solicited credit card information at one point or another. I guess you could use a PNG file for every letter of the alphabet.

    Still, a good grammar engine should be able to figure out that there's either no text at all (red alert), or that there is text but it's formatted strangely with a lot of color codes -- for example, if someone used something like the Pixel Transformer from http://www.545studios.com.

    The broader (if vague) promise of a grammar engine is that you can also use it as a general content filter. Filter out your annoying co-worker. Filter out advertising. Filter out anything that you don't like. Sure, it will turn several of us into digital hermits, but I'll take that risk.
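
    A toy version of that "red alert" heuristic: compare the amount of visible text on a page to the number of image tags and flag pages that are almost all pictures. The 200-character cutoff is an arbitrary illustration value.

    # Toy heuristic for "page is nearly all images": compare visible text to
    # the number of <img> tags.  The 200-character cutoff is arbitrary.
    from html.parser import HTMLParser

    class TextImageCounter(HTMLParser):
        SKIP = {"script", "style"}        # text inside these isn't visible

        def __init__(self):
            super().__init__()
            self.text_chars = 0
            self.images = 0
            self._skipping = 0

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                self.images += 1
            elif tag in self.SKIP:
                self._skipping += 1

        def handle_endtag(self, tag):
            if tag in self.SKIP and self._skipping:
                self._skipping -= 1

        def handle_data(self, data):
            if not self._skipping:
                self.text_chars += len(data.strip())

    def looks_image_only(html_text, min_text=200):
        counter = TextImageCounter()
        counter.feed(html_text)
        return counter.images > 0 and counter.text_chars < min_text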
  • by scotpurl ( 28825 ) on Sunday February 18, 2001 @06:33PM (#422414)
    The solution to the entire problem is not, NOT, keyword filtering. It's grammar-based filtering.

    What the @#!! is grammar-based filtering?

    It's where the parsing engine has enough intelligence to figure out what's going on. What the subtleties are. What the nuances are. If there are any double-entendres or hidden meanings.

    Then, and only then, can you use the computer to make value-based decisions using fuzzy rules about whether or not the content should be seen. And once that happens, I'll gladly use filtering. Why? Because I'll be able to filter out advertisements at a minimum. :-)
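
    Nothing close to a real grammar engine fits in a few lines, but as a toy contrast with plain keyword filtering: score terms only when related terms co-occur in the same sentence, so an isolated dictionary entry doesn't trip the filter. The word list and threshold are placeholders.

    # Toy contrast with keyword filtering: a term only scores when another
    # related term appears in the same sentence, so a lone dictionary entry
    # doesn't trip the filter.  Word list and threshold are placeholders.
    import re

    TERMS = {"alpha", "bravo", "charlie"}   # stand-ins for a real vocabulary
    THRESHOLD = 2

    def sentence_score(text):
        score = 0
        for sentence in re.split(r"[.!?]+", text.lower()):
            words = set(re.findall(r"[a-z']+", sentence))
            hits = len(words & TERMS)
            if hits >= 2:    # require co-occurrence, not a single hit
                score += hits
        return score

    def should_flag(text):
        return sentence_score(text) >= THRESHOLD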
  • There is one problem with grammar-based filtering of porn sites: PORN SITES ARE BASED ON IMAGES! And lately people are replacing text with pictures of text, so what would be the point of grammar-based filtering?
  • I agree with Ian. Open source filtering at the firewall would be a good start. And so what if it only blocks *some* or *half* (by percentage of activity) of the websites -- that's better than none, in my opinion. It's too bad that no one is really working hard on this.
  • http://www.squidguard.org/

    While it's a work in progress, it works with Squid.

    Nicolas Petreley recommends it.

  • But I don't think you will get much support here, as many /.'ers have an absolute "no filtering is good filtering" view. In some ways, I agree. I think that government has absolutely no business mandating filtering, as it amounts to censorship. However, I have no problem if individuals or private institutions (i.e. families, businesses, etc.) choose to implement filtering. Additionally, I have no problem with schools implementing filtering, as the purpose of the Internet in schools is purely educational, and any other use (porn or otherwise) costs the school money and resources.

    The argument against filtering in schools and businesses is not so much 'I should be able to look at porn at work or in my high school computer lab' but 'The current filtering technology blocks useful, informational and educational sites while not blocking much of the material it was intended to block'. The solution to the latter argument is a filter that works (as hypothetical as that may be).

    Making a filter that works is not a trivial task. There are many companies out there that have spent lots of time, money, and resources making filters that don't work right. Good luck.

  • Why not just pick up where MAPS [vix.com] and ORBS [orbs.org] left off? They provide a pretty good (arguably, I know) service in marking open mail relays and email addresses used by spammers.

    Why not use similar technologies for web sites? Just maintain a list of IPs, domains and specific URLs which should be filtered? What SHOULD happen, though, is some sort of categorization and rating system. In other words, under category "sex" you might have a rating of "1" for partially nude/suggestive pictures and "10" for explicit stuff. The service would have to provide guidelines as to how to rate the URLs.

    Taking this example further, one would implement a Slashdot-like moderating system to give URLs "negative karma", where the administrators of the networks using the filtering system have the opportunity to place their votes on which stuff they want hidden most.

    On the user's end, the network admins could have the ability to screen based on category and rating (like, filter category Sex with negative karma above 4), and the ability to override the rating of a particular site if they feel that it was marked unfairly (or get user complaints about a bad filter).

    This system would obviously be very dependent on good guidelines and good participation on the part of the network admins. Obviously a free system wouldn't be able to afford full-time staff finding stuff to filter, but the good part is that the list would be dynamic. Perhaps the database could be automagically downloaded weekly from a central repository in a cron job somewhere, giving the network the latest and greatest of the filters. Again, the overrides the admin put in place at the user's end would take precedence, so any updates to an overridden site's rating would be ignored.
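
    A sketch of the decision logic on the admin's end, assuming a downloaded ratings database, per-category thresholds, and local overrides that always win; the data layout, example URL, and numbers are invented for illustration.

    # Sketch of the filtering decision at the network admin's end.  The data
    # layout, example URL, and all numbers are invented for illustration.

    # Central database entry: url -> {category: accumulated "negative karma"}
    ratings = {
        "http://example-site.test/page": {"sex": 6, "violence": 1},
    }

    # Admin policy: block a category once its negative karma passes the threshold.
    thresholds = {"sex": 4, "violence": 8}

    # Local overrides entered after user complaints; these trump any rating.
    overrides = {"http://example-site.test/page": "allow"}

    def should_block(url):
        if url in overrides:
            return overrides[url] == "block"
        for category, karma in ratings.get(url, {}).items():
            if karma >= thresholds.get(category, float("inf")):
                return True
        return False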
