The Internet / Privacy

Who Isn't Paying Attention to ROBOTS.TXT? 85

Kickstart asks: "After wading through the Apache logs, after being hit hard for three hours by a very unfriendly spider, I see that there appear to be real, legitimate search engines that do not follow robots.txt rules. Looking around, I see that some specialized search engines make no mention of their policy on this, or of which servers their spiders come from. Does anyone have information on who follows this standard and who doesn't?"
This discussion has been archived. No new comments can be posted.

  • zerg (Score:4, Interesting)

    by Lord Omlette ( 124579 ) on Thursday June 09, 2005 @06:12PM (#12774468) Homepage
    The next question should be, "How do we make them regret their non-compliance?"
    • Re:zerg (Score:4, Informative)

      by Intron ( 870560 ) on Thursday June 09, 2005 @06:16PM (#12774505)
      zerg? [webpoison.org]
      • Re:zerg (Score:4, Informative)

        by BrynM ( 217883 ) * on Thursday June 09, 2005 @08:44PM (#12775808) Homepage Journal
        From the WebPoison site:
        "WebPoison.org is an open source project... (at the bottom of the page) *Technically speaking, webpoison.org is not "open source" because the source code may never be made public- doing so would undermine the project's central goal.
        Sorry, but it rubs me wrong when a project claims to be OSS on the first line of their about page only to tell me they lied in the fine print at the bottom. They may be doing a good thing, but they should be blunt and honest about it.
        • Got Zerg Source? (Score:3, Informative)

          by Kalak ( 260968 )
          WPoison [monkeys.com] is a Perl script, so it comes as source (naturally).

          WPoison is actually better from a technical standpoint, as it generates a random page each time, rather than a fixed block of pages you download.
          • Re:Got Zerg Source? (Score:2, Interesting)

            by paulatz ( 744216 )
            It is better from a technical standpoint, but it could be worse from a practical one, especially if WPoison-generated pages can be automatically detected.
            • Re:Got Zerg Source? (Score:3, Interesting)

              by Kalak ( 260968 )
              I hadn't considered that until this morning, but you can add to the source to do things like randomize meta tags, include text from other pages at random, etc., to make it less likely that a pattern can be detected.

              If you're *really* serious about non-detection, then you should vary the amount of poison in the pages, so that some are merely annoying or almost innocent, while others carry links that are completely lethal.

              If I was a perl hacker (instead of merely playing a sysadmin at work), I'd write this idea out, so if anyone
            • The Wpoison copyright [monkeys.com] requires you to put their logo on your website, which would be kind of a tipoff, right there. If I wrote a spider that did look at robots.txt I might not crawl a site with that logo. Some people just don't like spiders.
    • Re:zerg (Score:5, Interesting)

      by Eric Giguere ( 42863 ) on Thursday June 09, 2005 @06:21PM (#12774558) Homepage Journal

      Start returning 500 errors... Or 302s that redirect them back to themselves...

      Eric
      PS: Is there some kind of bot storm going on? I'm getting all kinds of weird accesses to my site today; they're all fetching just the home page and leaving, and the referrer is null for every one of them... They may be committing click fraud through my site, which makes me mad...
      • If your site uses PHP, you may be able to adapt Bad Behavior [ioerror.us]. The script was originally developed for WordPress and has already been ported to MediaWiki and Geeklog. It identifies known "bad" robots and robots that imitate real browsers based on the HTTP headers, then sends an access denied response.
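
        For comparison, a minimal sketch of the general idea in Perl (not the actual Bad Behavior code; the user-agent patterns here are made-up placeholders): inspect the headers the client sent and refuse service when they match a known-bad pattern.

          #!/usr/bin/perl
          # Hypothetical sketch, not Bad Behavior itself: deny requests whose
          # User-Agent matches a (made-up) blocklist or is missing entirely.
          use strict;
          use warnings;

          my $ua = $ENV{HTTP_USER_AGENT} || '';

          # Illustrative placeholder patterns, not Bad Behavior's real rules.
          my @bad_patterns = (qr/EmailSiphon/i, qr/WebCopier/i, qr/^$/);

          if (grep { $ua =~ $_ } @bad_patterns) {
              print "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n";
              print "Access denied.\n";
              exit;
          }

          print "Content-Type: text/html\r\n\r\n";
          print "<html><body>Normal page goes here.</body></html>\n";
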
        • Actually, I was more wondering if there was some zombie going around right now. Nobody's really accessing my blog, just the home page. Also, I see a lot of accesses from the CoDeeN project, so I wonder what's up....
      • Re:zerg (Score:3, Informative)

        PS: Is there some kind of bot storm going on? I'm getting all kinds of weird accesses to my site today; they're all fetching just the home page and leaving, and the referrer is null for every one of them... They may be committing click fraud through my site, which makes me mad...

        See this discussion on SecurityFocus

      http://www.securityfocus.com/archive/75/401729/30/0/threaded [securityfocus.com]
    • Making them Pay (Score:4, Interesting)

      by Kelson ( 129150 ) * on Thursday June 09, 2005 @06:30PM (#12774631) Homepage Journal
      How about Stopping Spambots [neilgunton.com]?
    • Re:zerg (Score:3, Informative)

      by dasunt ( 249686 )

      The next question should be, "How do we make them regret their non-compliance?"

      robots.txt:

      User-agent: *
      Disallow: /the-site-that-never-ends/

      It's trivial to write a script that links back to itself to generate millions of bogus pages. If you include URL rewriting, it won't even appear to be a script.

      The only downside is that while you are wasting their CPU and bandwidth, you are also wasting your own resources. If your CPU is mostly idle, then it's mostly just a waste of bandwidth.
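
      As a rough illustration (the path and script name are invented, and this assumes an Apache RewriteRule mapping /the-site-that-never-ends/ onto the script so the URLs don't look like a script):

        #!/usr/bin/perl
        # never-ends.pl -- hypothetical sketch of a page that only links back
        # to itself under different URLs, so a robots.txt-ignoring crawler
        # wanders forever.
        use strict;
        use warnings;

        print "Content-Type: text/html\r\n\r\n";
        print "<html><head><title>Archive index</title></head><body>\n";

        # Twenty random-looking links, all of which the rewrite rule would
        # map straight back to this same script.
        for (1 .. 20) {
            my $token = join '', map { ('a' .. 'z')[int rand 26] } 1 .. 8;
            print qq{<p><a href="/the-site-that-never-ends/$token.html">$token</a></p>\n};
        }

        print "</body></html>\n";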

    • Here's the programmer solution:

      1. Study your firewall logs, and try to determine some baseline criteria that identifies a spider. While you're at it, note the domains the spiders are coming from.

      2. Create a small perl script that, when fed the IP address of a questionable host, automatically adds a new "DROP" rule for that address and tacks the command onto your existing firewall script. This is your manual tool. Of course, you should debug it using your spider list from step 1, above.

      3. On
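
      Step 2 might look roughly like this sketch (the script name, the firewall-script path, and the use of iptables are my assumptions, not the parent's):

        #!/usr/bin/perl
        # blockip.pl -- hypothetical manual tool: given an IP address, add an
        # iptables DROP rule and append the same command to a firewall script
        # so the block survives a reload.
        use strict;
        use warnings;

        my $ip = shift or die "usage: blockip.pl <ip-address>\n";
        die "not an IPv4 address: $ip\n" unless $ip =~ /^\d{1,3}(\.\d{1,3}){3}$/;

        my $rule = "iptables -A INPUT -s $ip -j DROP";
        system($rule) == 0 or die "iptables failed: $?\n";

        # Record the rule so it can be replayed at boot.
        open my $fh, '>>', '/etc/firewall.local' or die "cannot append: $!\n";
        print {$fh} "$rule\n";
        close $fh;
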
    • Re:zerg (Score:2, Interesting)

      by Nagus ( 146351 )
      The next question should be, "How do we make them regret their non-compliance?"

      Tarpit [wikipedia.org] them! Bonus points if you feed them bogus data at the same time.

      Tarpitting unwelcome spiders not only limits the damage (in terms of bandwidth) they can do to you, but also the damage they can do to everyone else.

      Software for this is available, for example Peachpit [devin.com].
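
      For the curious, a crude sketch of what an HTTP-level tarpit can look like, assuming the web server hands identified bad clients to a CGI like this (this is not the linked software, just an illustration):

        #!/usr/bin/perl
        # tarpit.pl -- dribble a response out a little at a time so the bot's
        # connection stays tied up for as long as it is willing to wait.
        use strict;
        use warnings;

        $| = 1;  # unbuffered, so each print goes out immediately
        print "Content-Type: text/html\r\n\r\n";
        print "<html><body>\n";
        for (1 .. 300) {
            print "<p>please hold...</p>\n";
            sleep 10;   # roughly 50 minutes for one page
        }
        print "</body></html>\n";
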
    • Use Apache's SSI to detect browser type. If it is a known bot type, have Apache return the results of a PHP script that creates a valid header, then pipes /dev/urandom through uuencode (trimming off anything that makes it clear that it's UUencoded), so that their database then has to process a bunch of garbage.

      Alternately, use it to your advantage. Have a page of text that is nothing other than porn-related words, and have Apache return that when the bot comes looking. You're guaranteed to get a lot more
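
      Something along these lines could do the /dev/urandom trick without shelling out to uuencode; Perl's pack 'u' does the encoding, and stripping each line's leading length byte hides the most obvious giveaway (a sketch, assuming the server only routes detected bots here):

        #!/usr/bin/perl
        # garbage.pl -- serve a valid header followed by uuencoded noise from
        # /dev/urandom so a rude bot's indexer has to chew on garbage.
        use strict;
        use warnings;

        print "Content-Type: text/html\r\n\r\n";
        print "<html><body><pre>\n";

        open my $rand, '<', '/dev/urandom' or die "cannot open /dev/urandom: $!\n";
        for (1 .. 200) {
            read $rand, my $chunk, 45;    # 45 raw bytes per uuencoded line
            my $line = pack 'u', $chunk;  # uuencode the chunk...
            $line =~ s/^.//;              # ...then trim the telltale length byte
            print $line;
        }
        close $rand;
        print "</pre></body></html>\n";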

  • by grub ( 11606 ) <slashdot@grub.net> on Thursday June 09, 2005 @06:14PM (#12774493) Homepage Journal
    Does anyone have information on who follows this standard and who doesn't?

    Most crawlers will obey. Spambot email harvesters usually will not. Generate a huge page of crap with loads of fake email addresses, list it in your robots.txt as uncrawlable, and watch the spammers grab it.
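
    A quick sketch of such a generator (the placeholder domains and the idea that this URL is the one you Disallow in robots.txt are my assumptions):

      #!/usr/bin/perl
      # fake-emails.pl -- hypothetical honeypot page: hundreds of bogus
      # addresses for harvesters that ignore the robots.txt Disallow line
      # covering this URL.
      use strict;
      use warnings;

      print "Content-Type: text/html\r\n\r\n";
      print "<html><body>\n";

      my @domains = qw(example.com example.net example.org);  # placeholders
      for (1 .. 500) {
          my $user   = join '', map { ('a' .. 'z')[int rand 26] } 1 .. 10;
          my $domain = $domains[int rand @domains];
          print "<p>$user\@$domain</p>\n";
      }

      print "</body></html>\n";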

  • by Anonymous Coward
    Why don't you play a different game? Rather than playing "whine about unenforceable standards," why not play "don't put stuff on the internet you don't want people to see"?

    Seriously. If you don't want it to get crawled, don't make it accessible by the outside. If you can't figure out how to do that, you get what you deserve.

    • by Kelson ( 129150 ) *
      RTFA and realize he's not talking about the loss of "sensitive" data, but rather the DoS effect of extra traffic from rude robots.
    • There are good reasons for robots.txt. I use it to keep crawlers from hitting the "spam" forums on my website, which is where all solicitations go. That way no Google (or other search engine) rank is gained by spamming the site.

      I could just delete it all. But I'm trying to avoid deleting any posts.
    • You don't want a spider crawling over a webapp and creating, deleting, or updating data it knows nothing about. If there's a new wave of robots.txt-ignoring spiders, there are going to be a lot of ugly side effects.
      • by jbplou ( 732414 ) on Thursday June 09, 2005 @08:11PM (#12775550)
        Well, you've got a poor app if a spider can run right through it without authenticating and start inserting/updating/deleting your data.
      • why the hell are you making GET requests modify data? That's what POST is for.

        GET should only do just that, and a user agent should be allowed to reload a page that is the result of a GET request without fear of side effects.

        You'll have trouble from more than bots if you've got an app written like that - you'll have users hitting the back and forward buttons on their browsers and causing duplicate entries.
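
        A tiny sketch of the rule being described, for a hypothetical CGI delete handler (names are made up): GET may render a confirmation page, but only a POST is allowed to change anything.

          #!/usr/bin/perl
          # delete-item.pl -- illustrative only: reloading or crawling the GET
          # form has no side effects; only a POST performs the deletion.
          use strict;
          use warnings;

          my $method = $ENV{REQUEST_METHOD} || 'GET';

          if ($method ne 'POST') {
              # Safe for spiders and for the back button: nothing changes on GET.
              print "Content-Type: text/html\r\n\r\n";
              print qq{<form method="post" action=""><button>Really delete?</button></form>\n};
              exit;
          }

          # Only reached on POST: the actual deletion would happen here.
          print "Content-Type: text/plain\r\n\r\n";
          print "Item deleted.\n";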

    • by CommandoB ( 584587 ) on Friday June 10, 2005 @03:26PM (#12782951) Journal

      The White House seems to take a "pre-emptive" approach. Just in case they ever put stuff on the internet that they might someday not want you to see (or that they might not want archived by Google), they seem to cover all the bases in their 92KB robots.txt file [whitehouse.gov].

      My personal favorites:
      Disallow: /911/iraq
      Disallow: /911/patriotism/iraq
      Disallow: /911/patriotism2/iraq
      Disallow: /911/sept112002/iraq [sic.]

      There's a theme here. Can you spot it? I'd like to think it's intentional, but at 2255 lines, it may just be that all permutations of Republican buzzwords have been covered.

  • by Neil Blender ( 555885 ) <neilblender@gmail.com> on Thursday June 09, 2005 @06:45PM (#12774777)
    All spiders are going to ignore your ROBOTS.TXT file. Instead, they look for a file called robots.txt.
    • All spiders are going to ignore your ROBOTS.TXT file. Instead, they look for a file called robots.txt.

      I was going to say this (but in a better way, of course!) but suspected that somebody else might beat me to it, and indeed they did ...

      However, there is a bit more to it. If he has a web server on his Windows or Mac OSX box, the odds are that the filesystem in use is case insensitive, so either robots.txt or ROBOTS.TXT will work, because either would be served up by the web server when one reques

    • What a lot of sites need is a slashdot.txt file.
  • Here's what would seem to work;

    1. Create robots.txt, including references to the spam spider trap. Make sure that the legitimate references to normal pages are outnumbered by a large margin.

    2. When pages that could only be referenced in the spam spider trap are accessed, note the IP address.

    3. Respond slowly to, or block, connections from the originating IP address.

    Bad guys are punished. Good guys are not. Low impact on system resources.

    There's got to be a dozen filters out there that already do this.
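
    For steps 2 and 3, something as small as this could do the bookkeeping (the trap handler, log path, and firewall command are all assumptions, not part of the parent's setup):

      #!/usr/bin/perl
      # trap.pl -- hypothetical handler for a URL that appears only in
      # robots.txt and in hidden links: any client that reaches it is noted
      # and then blocked.
      use strict;
      use warnings;

      my $ip = $ENV{REMOTE_ADDR} || 'unknown';

      # Record the offender so a cron job (or a human) can review the list.
      open my $log, '>>', '/var/log/spider-trap.log' or die "cannot log: $!\n";
      print {$log} scalar(localtime), " $ip\n";
      close $log;

      # Drop further connections from that address. In practice the web user
      # would need a privileged helper or sudo rule to do this.
      system("iptables -A INPUT -s $ip -j DROP") == 0
          or warn "could not add firewall rule for $ip\n";

      print "Content-Type: text/plain\r\n\r\n";
      print "Goodbye.\n";
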
    • The problem with this approach is this: If the spider doesn't bother to even read the robots.txt file, nothing gets trapped.
      • The problem with this approach is this: If the spider doesn't bother to even read the robots.txt file, nothing gets trapped.

        No, that's the point. When the spider ignores robots.txt, they pick up the poisoned pages. Then, because they are doing something wrong, punish them.

        • I think that you're overlooking the point.

          If the poisoned pages are only findable from robots.txt, then if they ignore robots.txt, they won't be punished.

          If they're findable via links, or whatnot, then you're punishing more than the robots (that means your users). We typically frown on people (**AA) that do that sort of thing, yes?

          -9mm-
          • I think that you're overlooking the point.

            Not at all. The AC has it right;

             You make a robots.txt listing the directories/pages they should not enter. You include links in your pages that only a crawler would normally see. Only a crawler that ignored robots.txt, or the exclusion in it, would go to those trap pages and get banned.

            To make this very clear; the links on the legitimate pages are not normally visible or say things like "." or "," or "This is a trap for sp@mmer$"...whatever. Color the text white

              • All that is fine and dandy until you get a curious user, like me, who either sees the ghost link (hover over it and the cursor and status bar will show there's a link there) or views the HTML for some reason and sees it. But then again, I can just pull out Tor or something if I get banned.
                All that is fine and dandy until you get a curious user, like me, who either sees the ghost link (hover over it and the cursor and status bar will show there's a link there) or views the HTML for some reason and sees it. But then again, I can just pull out Tor or something if I get banned.

                If someone is that inattentive, they get banned.

                If you want implementation details -- and it looks like you indeed do -- I'll be glad to provide them to you for a fee. Are you that curious, or can you figur

              • I always liked the way that arxiv.org dealt with this matter [arxiv.org]. It clearly says that it will initiate a seek-and-destroy against your site if you visit a certain link.

                If you do go there, it initiates a countdown.... I've never stuck around long enough to see what happens when the countdown finishes... I like my internet connection just a little too much for that... :-)
                • I've never stuck around long enough to see what happens when the countdown finishes...

                  Judging by the fact that I can still post this after trying it, I'd say little or nothing.
                • I've never stuck around long enough to see what happens when the countdown finishes...

                  Tried it, nothing happened, could still access the site. Since the page is from 1996 or so, it might be sending a ping of death to your IP. Back then you could crash a Windows computer by sending it a nonstandard ping.

  • Big name != "real" (Score:5, Informative)

    by droleary ( 47999 ) on Thursday June 09, 2005 @07:59PM (#12775440) Homepage

    I see that there appear to be real, legitimate, search engines that do not follow robots.txt rules.

    No, what you see are some well-known search engines that generate illegitimate traffic instead of behaving properly. I note a number of them in this heavily documented robots.txt [subsume.com] file. I'm personally most offended by idiots running this shit [grub.org], since there is no single IP block to blacklist.

    • Heh. I think that your attitude is part of the problem here. If you try to piss people off, they will try to piss you off too.
      • Heh. I think that your attitude is part of the problem here. If you try to piss people off, they will try to piss you off too.

        I don't see your logic. By default, everyone gets to see the web site. In order to be singled out and disallowed, they must make the first effort to piss me off. If they go further and ignore robots.txt, I go further and ban by IP. At no time am I responsible for any escalation. If you think my "colorful" language is disturbing, I would say it is better for them to be able

        • For what it's worth (not much) I think your approach is perfect. And very funny. My favourite example for people who didn't follow the link:

          # Another bot that ignores * disallows, even though they claim they follow the protocol.
          # And what the hell is with Yahoo-VerticalCrawler-FormerWebCrawler in the agent? Pick a name!
          # This may be the same bot that was listed as FAST above, but it gets a special list.
          # Dirty, dirty bot. I kind of hope this is ignored so I get to block by IP.
          # Update: It is! I do!
          User
  • by anticypher ( 48312 ) <[moc.liamg] [ta] [rehpycitna]> on Thursday June 09, 2005 @09:21PM (#12776088) Homepage
    There was a bunch of fsckwits called dir.com who had a real nasty spider crawling all over the place a few months ago. It blatantly ignored robots.txt, tried dictionary attacks to detect unlinked parts of the website, and may have been trying exploits to crack systems to discover secrets normally protected by passwords or logins. Honeypot email addresses fed to the spider would be spammed within days.

    After too many complaints from clients about this nasty behaviour, a number of carriers started blackholing the prefixes of bad spiders at the border routers. Nice simple solution, and then you don't even see the spider traffic. Last I looked, about 20 major ISPs were blackholing prefixes of the worst spider/bot offenders.

    Nobody would dare to blackhole Google, but there are hundreds of Google wannabes, and a few of them are unethical enough to get blocked. And then they wonder why they can't see 75% of the internet.

    the AC
  • Just block the bot from your site, or write some simple PHP to restrict which pages it can query, and how frequently...

    I'd just block the "bad bots", though; if they don't listen to you, don't give them content.

    Or, contact the owner of the domain and get mad at them for spidering without following proper spider rules. They are wasting *your* resources in exchange for *their* profit; get mad, get even!
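
    The frequency part could be sketched like this for a CGI setup (the limits and the counter-file path are arbitrary choices of mine):

      #!/usr/bin/perl
      # ratelimit.pl -- rough per-IP limiter: more than 30 hits in a
      # one-minute window gets a 403 instead of the page.
      use strict;
      use warnings;
      use Fcntl;
      use SDBM_File;

      my $ip  = $ENV{REMOTE_ADDR} || 'unknown';
      my $key = $ip . ':' . int(time() / 60);   # one-minute buckets

      tie my %hits, 'SDBM_File', '/tmp/hit-counts', O_RDWR | O_CREAT, 0644
          or die "cannot open counter db: $!\n";
      $hits{$key} = ($hits{$key} || 0) + 1;

      if ($hits{$key} > 30) {
          print "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n";
          print "Too many requests; slow down.\n";
          exit;
      }

      print "Content-Type: text/html\r\n\r\n";
      print "<html><body>Normal page goes here.</body></html>\n";
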
  • by stoborrobots ( 577882 ) on Friday June 10, 2005 @04:15AM (#12778083)
    http://www.fleiner.com/bots/ [fleiner.com]

    I found this site through some slashdotter's website a while back... I've forgotten where and when, but it lends itself nicely to the topic...

    Also good is the way arxiv.org fights back [arxiv.org].

  • I once had to deliver a solution to that problem for a friend. I made him a PHP script that detects the content directory and generates a JavaScript website which links into the content directory with an encrypted JavaScript link that cannot be followed by spiders. The content directory is renamed to some random name every hour. The 404 error handler leads people back to the entry page, in case they surf the content dir while it is being renamed.
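
    The hourly-rename part of a scheme like that might look roughly like this, run from cron (the webroot and naming convention are invented for the sketch):

      #!/usr/bin/perl
      # rotate-dir.pl -- find the current content directory and rename it to
      # a fresh random name; whatever generates the JavaScript entry link
      # would then be pointed at the new name.
      use strict;
      use warnings;

      my $webroot = '/var/www/html';
      opendir my $dh, $webroot or die "cannot read $webroot: $!\n";
      my ($current) = grep { /^content-[0-9a-f]{8}$/ && -d "$webroot/$_" } readdir $dh;
      closedir $dh;
      die "no content directory found\n" unless $current;

      my $new = sprintf 'content-%08x', int rand 0xffffffff;
      rename "$webroot/$current", "$webroot/$new"
          or die "rename failed: $!\n";
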
  • On a similar note... (Score:3, Interesting)

    by Transcendent ( 204992 ) on Sunday June 12, 2005 @12:32AM (#12792960)
    What is with requests for http://xxx.slashdot.org/ok.txt [slashdot.org] coming through on my webserver as if someone (Slashdot if you trace the IP) is trying to use it as a proxy?

    66.35.250.150 - - [29/Jan/2005:09:50:54 -0500] "GET http://it.slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 650 "-"
    66.35.250.150 - - [31/Jan/2005:23:24:04 -0500] "GET http://slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 647 "-"
    66.35.250.150 - - [04/Feb/2005:23:21:43 -0500] "GET http://slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 647 "-"
    66.35.250.150 - - [08/Feb/2005:21:55:18 -0500] "GET http://it.slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 650 "-"
    66.35.250.150 - - [11/Feb/2005:20:27:09 -0500] "GET http://slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 647 "-"
    66.35.250.150 - - [21/Feb/2005:20:02:05 -0500] "GET http://games.slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 653 "-"
    66.35.250.150 - - [02/Mar/2005:20:56:12 -0500] "GET http://it.slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 651 "-"
    66.35.250.150 - - [08/Mar/2005:20:37:50 -0500] "GET http://slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 648 "-"
    66.35.250.150 - - [12/Mar/2005:09:43:37 -0500] "GET http://yro.slashdot.org/ok.txt [slashdot.org] HTTP/1.0" 404 652 "-"
    ...(continues, of course)

    I know the article is about bad spiders, but why is slashdot doing this?

"If it ain't broke, don't fix it." - Bert Lantz

Working...