Privacy Government The Courts News

Archiving Web Pages - Legal or Illegal? 102

Dyer asks: "I used to run several high-traffic anonymous surfing sites, and if I wasn't getting emailed by a lawyer telling me to block someone's site from being accessed, I was being woken up at 2am by a telephone call from a crazy person yelling, sometimes swearing, at me under the impression that my site had copied theirs and that the copy resided on my server, when in actuality their site was being accessed by my server at that instant and relayed to the user. This is my point: how do services like Archive.org and Google's cache get away with what they're doing? You can call their services whatever you like, but it doesn't change the fact that they are copying people's websites and saving them onto their servers for everyone to access."
This discussion has been archived. No new comments can be posted.

  • It SHOULD be legal (Score:4, Interesting)

    by Anonymous Coward on Monday June 30, 2003 @04:00PM (#6333451)
    Well, it should be legal/allowed. If you don't want it read and archived, don't put it on the Web.

    Everything should go, except for things like malicious alteration and theft (taking stuff and claiming it is yours)
  • My 9/11 Archive (Score:5, Interesting)

    by limekiller4 ( 451497 ) on Monday June 30, 2003 @04:05PM (#6333487) Homepage
    On the day of 9/11, I began to think that maybe a lot of things would be online that would disappear on the next update, forever. We tend to think of 1880 newspaper clippings as being perishable, not online media, but the opposite is true. So all day on 9/11 I archived news sites and about two hundred blogs using "wget -p".

    Over the next week I archived some 4,600 blogs. They've kind of been sitting around waiting for me to weed through and organize. I've also been wgetting the front pages of 30 or so large news sites every 15 minutes or so, on the hunch that I'll grab something emerging even if I'm AFK. Well... what can I do with this data?

    The answer(s) to this question will definitely be of use to me. Thanks for asking it. Slash, thanks for posting it.
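
    The periodic front-page grabs described above could be organized roughly as follows. This is a minimal sketch, not the poster's actual setup: the names `snapshot_path`, `save_snapshot`, and `fetch_and_save`, the `archive/` layout, and the timestamp format are all assumptions made for illustration. Unlike `wget -p`, it saves only the HTML and not the images or stylesheets the page needs.

```python
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def snapshot_path(root: Path, url: str, when: datetime) -> Path:
    """Build a per-site, per-timestamp file path for one saved front page."""
    host = urlparse(url).netloc or "unknown-host"
    stamp = when.strftime("%Y%m%dT%H%M%SZ")  # sortable timestamp, assumed UTC
    return root / host / f"{stamp}.html"


def save_snapshot(root: Path, url: str, body: bytes, when: datetime) -> Path:
    """Write one fetched page to disk without overwriting earlier snapshots."""
    path = snapshot_path(root, url, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path


def fetch_and_save(root: Path, url: str) -> Path:
    """Fetch a URL once and store it under the current UTC time."""
    with urlopen(url, timeout=30) as response:
        body = response.read()
    return save_snapshot(root, url, body, datetime.now(timezone.utc))
```

    Calling `fetch_and_save(Path("archive"), url)` for each site from a scheduler (cron, for instance) every 15 minutes would give one directory per host with one file per grab, which makes it easier to weed through the snapshots later.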
  • by lightspawn ( 155347 ) on Monday June 30, 2003 @04:39PM (#6333757) Homepage
    Well, it should be legal/allowed. If you don't want it read and archived, don't put it on the Web.

    You know, I've been wondering about Java/Shockwave games. Certainly most kids would love a CD full of those games, and many companies have many different games online which mostly disappear a few months later.

    Is anybody archiving these? Do we need to start?

    Would the companies object?

    You can play The Hitchhiker's Guide to the Galaxy [douglasadams.com] on Douglas Adams' web site. As it happens, if you know what you're doing you can also download the .z5 file and play it offline on any zip interpreter. Would the copyright owners object to it? I own that Infocom 33-game collection and all 5 books; the reason the game wasn't included in the collection is copyright hassles. Am I "entitled" to play it offline?

    This ties in to today's "is ROM collecting wrong" story, except in this case you're actually offered the games, under mostly unclear terms.

  • *copy* right (Score:5, Interesting)

    by ccady ( 569355 ) on Monday June 30, 2003 @05:06PM (#6333970) Journal

    (FWIW, IANAL) Web site content is copyrighted. Therefore, you have a right to make your own personal copy, and backup copies, but it is not legal to redistribute those copies without the site owner's permission. I cannot imagine that the Wayback machine or the Google cache is legal. They are blatantly disregarding the site owners' copyright.

    That said, I think the law should be changed or at least clarified, because it is patently (pun intended) obvious that those services are doing a vast social good, and should be encouraged.

  • Re:*copy* right (Score:3, Interesting)

    by stanwirth ( 621074 ) on Monday June 30, 2003 @05:19PM (#6334113)

    Web site content is copyrighted. Therefore, you have a right to make your own personal copy, and backup copies, but it is not legal to redistribute those copies without the site owner's permission. I cannot imagine that the Wayback machine or the Google cache is legal. They are blatantly disregarding the site owners' copyright.

    That would imply that every ISP running a public squid cache is breaking the law, and Akamai's entire business model is based on illegal content-smuggling. I really don't think so!

  • by Anonymous Brave Guy ( 457657 ) on Monday June 30, 2003 @06:40PM (#6334715)
    In other words, by your NOT including a robots.txt file, you are implicitly granting them permission to cache your content.

    Riiiiight. See you in court.

    As I've just posted elsewhere, it is quite feasible that a site owner could be damaged if caches maintain information after the original site has been changed or taken down. For example, if updated information is placed on the original, this leaves the "cached" versions out of date and misleading anyone who reads them thinking they're seeing a perfect copy of the real thing.

    There is also the issue of a site owner's right to know who is visiting them. Many popular web sites can and do collect information about how visitors move around their sites, the browsers and resolutions they use, etc. If the information on the site is being offered according to the normal conventions of the Internet, it is only fair to provide them the feedback normally returned by the conventions of the Internet. This information is valuable to them when they come to revise the site. Ultimately it is also in the site visitors' best interests for the site owner to have accurate information available, so that if they want to make the effort to improve usability, support minority browsers that some of their visitors use or whatever, they can do so.

    On a related note, there are questions of advertising revenue etc. if a site is supported by sponsors who pay per-hit. It's not at all guaranteed that they will get their fair amount of sponsorship if most of those hits are seeing a web cached version.

    This whole issue isn't nearly as black and white as the "information should be free" crowd are inevitably shouting already.
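
    For readers unfamiliar with the robots.txt convention quoted at the top of this comment, here is a minimal sketch of how a well-behaved crawler might consult it before copying a page. It uses Python's standard `urllib.robotparser`; the helper name `may_archive`, the sample rules, and the crawler names are assumptions for illustration only. Whether honoring or ignoring such a file has any legal weight is exactly what this thread disputes, and the code takes no position on it.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text already in hand, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: shut out one archiving crawler entirely,
# and keep every other crawler out of /private/ only.
SAMPLE_RULES = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeBot"):
        for page in ("http://example.com/index.html",
                     "http://example.com/private/notes.html"):
            print(agent, page, may_archive(SAMPLE_RULES, agent, page))
```

    Note that the check is opt-out: a site with no robots.txt at all is treated as allowing everything, which is the "implicit permission" reading the comment above objects to.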

  • by lpq ( 583377 ) on Tuesday July 01, 2003 @02:46AM (#6337368) Homepage Journal
    Some people are arguing robots.txt as the determiner; however, remember
    the court case that a company *lost* because it copied the data of a
    competitor site and set its prices lower.

    This is equivalent to Kroger hiring a few clerks to go down each day and
    take prices of various objects on their wifi-equipped phones/handhelds in
    a store so Safeway can undercut prices.

    What, you didn't read the fine print on the Safeway door that says no price
    comparisons or making up price lists? Or what...were they supposed to look
    for a robots.txt file behind the Safeway door?

    There seems to be a general lack of common sense here (especially on the
    part of the judge that ruled against the company scanning for competing
    prices). If it is allowed in the real world, it shouldn't be different in
    the computer world without a lot of sound reasoning behind why it should be
    different. The idea that Safeway could have a 3-page acceptable use policy
    that I accept when my bodily presence opens the door is ludicrous.

    Now you talk about advertising losses -- what about whatever major network
    it was, deleting a competing major network's logo, bought and paid for on a
    tall building in Times Square for New Year's Eve? The competing network modified
    the image in real time and inserted their own logo for the price of an SGI
    workstation -- a heck of a lot cheaper. Legal? Not legal? Can you say a
    real life image is "copyright", and if two people take a picture of the same
    real life picture, is one the rightful owner? What if one or both alter
    the "real life picture", have they violated someone's rights? Reality's
    rights? (OK, in this case it would have been the network that paid to rent the
    entire side of the building.) But it's really a matter of who owns what you
    see. If a picture is taken of what you see, who owns the picture?

    This is a complete mishmash of conflicting legal decisions with computer
    copying, caching, alteration and adding to the mess. What if I load a page
    but I don't load the images? Have I violated copyright because I either
    chose or cannot load the images? What if I selectively blocked them based
    on their IP or name? If I don't load flash player, am I violating a
    copyright on a site by not viewing the flash content advertising?

    Random judges in random jurisdictions are going to be making random calls on
    right/wrong that will collide with each other and with what makes sense in
    the real world.

    I'm not sure what the collective approach should be -- should I be required to
    watch TV advertising, or am I stealing programming if I go to the loo during
    a panty spot? If I block popups, am I stealing computer time.....

    This is all just one big gigantic growing mass of living worms that promises to be one of the larger headaches of times to come.

    Any unified field theories to solve this mess? :-)

  • Re:RTFF (Score:3, Interesting)

    by ScuzzMonkey ( 208981 ) on Tuesday July 01, 2003 @10:52AM (#6339138) Homepage
    In this case, the first remedy is provided by the potential violator...

    Yes, but it places the burden in the wrong place and so is not likely to be considered an adequate remedy by the courts. More properly, the violator should be seeking permission prior to re-distributing the content, rather than essentially saying to the copyright holder "Stop me before I copy again!"

    I'm not sure that caching sites should be subject to traditional copyright law. It has some nasty implications for anyone who cuts traffic loads using a proxy server (insert humorous image of AOL Time Warner suing themselves for caching their own content) and really strikes me as yet another area where technology outstrips law. But if they are subject to it, their chosen remedy isn't likely to hold much water.
  • by Anonymous Brave Guy ( 457657 ) on Wednesday July 02, 2003 @08:06AM (#6348154)

    Damaged in what way? Aren't there archives of newspapers, journals, and magazines? And if time-sensitive information is present on a website, does the public have a right to see what was previously there?

    If I put up information on a web site, for free, as a volunteer, then the public has no rights whatsoever, either legally or morally. Why the hell should they? They didn't do anything to earn them.

    If you have a specific example related to this problem, I would love to hear it.

    I'll give you a couple of examples where real damage can be done. There are certainly several other instances, but I hope these will suffice for now.

    There have been cases where someone published some material on a subject that interested them on a web site, but later wanted to publish work based on it in something like a journal or a book. (Disclosure: I am currently in a similar position myself.)

    Now, publishers get very nervous about publishing material that has previously been available in another form. If you're arguing that by putting it up on the web an author effectively forfeits all rights to control their work -- i.e., that the usual principles of copyright shouldn't apply for some reason in this medium -- then you're basically saying that anyone who might ever want to publish original material they wrote shouldn't ever make anything available on the web first. Given how much both the public and the author can potentially get out of that, provided that reasonable controls are in place -- there was a Slashdot story about a new programming book citing a preprint temporarily placed on the web just a few days ago -- this seems to be needlessly counterproductive to me.

    Secondly, a bit closer to home, consider a company that has a critical story about it published on Slashdot. That company is likely to get a lot of traffic to its web site if the site is linked, and might well want to put up a rebuttal of any points made against it. It's only fair that visitors who go to check out the Slashdot story also see the company's response.

    Now, we all know that Slashdot articles have seriously criticised businesses in the past, sometimes with justification, sometimes without. We all know that web sites get Slashdotted. We all know that people post links here to Google caches of sites, or just copy whole pages and post them here. In this sort of case, someone could suffer serious harm to their reputation because the audience of Slashdot only get to read things supporting a critical claim, without seeing (or even being aware of) a response from the criticised party in their defence.

    Nicking someone's material and posting it here is blatant copyright infringement, and just because it's done by an AC and Slashdot claims that all posts are the responsibility of their authors doesn't necessarily make it legal. It amazes me, given a few of the things that get posted around here, that no-one has ever really attempted to sue Slashdot over this. Certainly things like circumventing the NYT's "free reg required" are very dicey, and given that everyone (including those running Slashdot) knows that it happens, I don't see how they'd have much of a defence.

    In my personal opinion, and looking at the actual US law that's been quoted here, it seems that web sites caching material are also likely to be in breach of copyright laws for much the same reasons, doing much the same damage in some cases, and potentially subject to much the same penalties.

    Right - just like WalMart has the right to pat down and run a credit check on everyone who walks through their doors.

    No, it doesn't. But it has the right to refuse entry to anyone who doesn't provide the information it requires. Banks do this if you try to enter before removing your crash helmet. Bars do it if you look under-age and can't produce ID.

    While a site admin might like to know everythin
