Privacy Government The Courts News

Archiving Web Pages - Legal or Illegal? 102

Dyer asks: "I used to run several high-traffic anonymous surfing sites, and if I wasn't getting emailed by a lawyer telling me to block access to someone's site, I was being woken up at 2am by a telephone call from a crazy person yelling, sometimes swearing, at me under the impression that my site had copied theirs and that the copy resided on my server, when in actuality their site was being accessed by my server at that instant and relayed to the user. This is my point: how do services like Archive.org and Google's cache get away with what they're doing? You can call their services whatever you like, but it doesn't change the fact that they are copying people's websites and saving them onto their servers for everyone to access."
This discussion has been archived. No new comments can be posted.

  • RTFF (Score:5, Informative)

    by kalidasa ( 577403 ) on Monday June 30, 2003 @04:04PM (#6333481) Journal

    Archive.org FAQ [archive.org]

    How can I remove my site's pages from the Wayback Machine?
    The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine.
    See our exclusion policy.
    You can find exclusion directions at exclude.php. If you cannot place the robots.txt file, opt not to, or have further questions, email wayback2@archive.org.

    In other words, by NOT including a robots.txt file, you are implicitly granting them permission to cache your content. Also, the content is cached as it was published, complete with the appropriate markings, and is only publicly accessible content, so you'd be hard pressed to argue there is any economic harm from the caching, which means there would likely be no damages from a successful copyright suit, which means a copyright suit would be pretty damned unlikely.

    IANAL.
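
    The opt-out described in the FAQ can be checked mechanically. Below is a minimal sketch using Python's standard urllib.robotparser. It assumes the Archive's crawler identifies itself as ia_archiver; that agent name is not stated in the FAQ excerpt above, so verify it before relying on it. The example.com URLs, the sample robots.txt, and the may_fetch helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one archiving crawler entirely,
# and keep every other crawler out of /private/ only.
# "ia_archiver" is an assumed user-agent name, not taken from the FAQ text.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeOtherBot"):
        for path in ("/index.html", "/private/notes.html"):
            url = "http://example.com" + path
            print(agent, path, may_fetch(ROBOTS_TXT, agent, url))
```

    A well-behaved crawler runs a check like this before every fetch. Whether the absence of such a file amounts to permission is the legal question, not something the code settles.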

  • by stienman ( 51024 ) <adavis@@@ubasics...com> on Monday June 30, 2003 @04:10PM (#6333521) Homepage Journal
    It might be useful to note that the archive servers are located outside the US, and that they act on requests to have information and websites removed from their archive. (IIRC). I would state that the Archive serves a compelling public interest, both in the sense of free speech, and in the basic idea of keeping a history or record of the internet. The archive is a museum of sorts.

    Google, on the other hand, is gathering data for its search engine, and, of necessity, must have what essentially amounts to a copy of each web page in its stores in order to provide this service. If one does not want to have their data in Google, they simply use robots.txt, and Google does not spider, cache, or store any data from that site if robots.txt is filled out. However, the site owner also denies themselves the ability to be listed, for 'free', in Google's search pages. This could be thought of as the cost of being listed.

    So I don't think either of those two situations has any problem defending itself. An anonymizer could also be seen as providing a useful, protected service. An anonymizer is nothing more than a proxy service, and many ISPs use proxies now, not to mention caches and many other tools that store website information or meta information without notifying or requesting explicit permission to do so - they request implicit permission by sending a GET command.

    -Adam
  • Honestly... (Score:2, Informative)

    by lptport1 ( 640159 ) on Monday June 30, 2003 @05:04PM (#6333936)
    This sounds sort of cynical to me, but it strikes me that the people who might be concerned about that don't comprehend the word "cache" and therefore never click on that link in the search results...

    Thus, never discovering that their site has been archived somewhere else. That, and Google has a rather chunky disclaimer-type-deal at the top--I'm sure it's in response to just that behaviour.
  • by simoniker ( 40 ) * <.simoniker. .at. .slashdot.org.> on Monday June 30, 2003 @05:42PM (#6334267) Homepage Journal
    Actually, the Internet Archive's main Wayback Machine [archive.org] servers are located in a co-location center in San Francisco, so it's not correct to say they're located outside the US. There is a mirror [bibalex.org] of the Archive's web content at the Library of Alexandria in Egypt, however - maybe that's what you're thinking of?

    In any case, the Archive's work with the Library Of Congress and, increasingly, national libraries who want to archive the Web content of their countries, proves that the establishment also thinks Web archiving is a vital thing to do for posterity. But the rights issues are definitely tricky.
  • Re:*copy* right (Score:3, Informative)

    by limekiller4 ( 451497 ) on Monday June 30, 2003 @05:45PM (#6334296) Homepage
    stanwirth writes:
    "...and Akamai's entire business model is based on illegal content-smuggling. I really don't think so!"

    Akamai caches sites of people who pay them to cache them, so that would be one hell of a lawsuit. I know this because I worked for them for a few years.
  • Re:*copy* right (Score:2, Informative)

    by anthony_dipierro ( 543308 ) on Monday June 30, 2003 @05:50PM (#6334326) Journal

    (FWIW, IANAL)

    Obviously [cornell.edu].

  • Re:*copy* right (Score:5, Informative)

    by SeanAhern ( 25764 ) on Monday June 30, 2003 @06:03PM (#6334426) Journal
    Mod parent up! This link to the US Code is very useful in this context.

    Heck, it's so useful that I'm going to quote some of it here:

    TITLE 17 > CHAPTER 5 > Sec. 512.

    Sec. 512. - Limitations on liability relating to material online

    (a) Transitory Digital Network Communications. -

    A service provider shall not be liable for monetary relief, or, except as provided in subsection (j), for injunctive or other equitable relief, for infringement of copyright by reason of the provider's transmitting, routing, or providing connections for, material through a system or network controlled or operated by or for the service provider, or by reason of the intermediate and transient storage of that material in the course of such transmitting, routing, or providing connections, if -

    (1) the transmission of the material was initiated by or at the direction of a person other than the service provider;

    (2) the transmission, routing, provision of connections, or storage is carried out through an automatic technical process without selection of the material by the service provider;

    (3) the service provider does not select the recipients of the material except as an automatic response to the request of another person;

    (4) no copy of the material made by the service provider in the course of such intermediate or transient storage is maintained on the system or network in a manner ordinarily accessible to anyone other than anticipated recipients, and no such copy is maintained on the system or network in a manner ordinarily accessible to such anticipated recipients for a longer period than is reasonably necessary for the transmission, routing, or provision of connections; and

    (5) the material is transmitted through the system or network without modification of its content.

    (b) System Caching. -

    (1) Limitation on liability. -

    A service provider shall not be liable for monetary relief, or, except as provided in subsection (j), for injunctive or other equitable relief, for infringement of copyright by reason of the intermediate and temporary storage of material on a system or network controlled or operated by or for the service provider in a case in which -

    (A) the material is made available online by a person other than the service provider;

    (B) the material is transmitted from the person described in subparagraph (A) through the system or network to a person other than the person described in subparagraph (A) at the direction of that other person; and

    (C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A),

    if the conditions set forth in paragraph (2) are met.

    (2) Conditions. -

    The conditions referred to in paragraph (1) are that -

    (A) the material described in paragraph (1) is transmitted to the subsequent users described in paragraph (1)(C) without modification to its content from the manner in which the material was transmitted from the person described in paragraph (1)(A);

    (B) the service provider described in paragraph (1) complies with rules concerning the refreshing, reloading, or other updating of the material when specified by the person making the material available online in accordance with a generally accepted industry standard data communications protocol for the system or network through which that person makes the material available, except that this subparagraph applies only if those rules are not used by the person described in paragraph (1)(A) to prevent or unreasonably impair the intermediate storage to which this subsection applies;
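
    One plausible reading of the "rules concerning the refreshing, reloading, or other updating" in (2)(B) is the HTTP Cache-Control response header, though the statute names no protocol and that mapping is an assumption here. Under that assumption, a cache would have to decide whether a stored copy may still be served, roughly as sketched below. The helper names are hypothetical, and only the no-store, no-cache and max-age directives are handled.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header value into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')
    return directives


def may_serve_from_cache(header: str, age_seconds: int) -> bool:
    """Decide whether a stored copy of the given age may be served as-is.

    Conservative sketch: anything without an explicit, unexpired max-age
    is treated as needing a fresh fetch from the origin server.
    """
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age", "")
    if max_age.isdigit():
        return age_seconds < int(max_age)
    return False


if __name__ == "__main__":
    print(may_serve_from_cache("public, max-age=3600", 120))
    print(may_serve_from_cache("max-age=3600", 7200))
    print(may_serve_from_cache("no-store", 0))
```

    A long-term archive keeps copies far past any max-age, which is one reason the caching safe harbor is an awkward fit for it.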

  • legality (Score:3, Informative)

    by sir_cello ( 634395 ) on Monday June 30, 2003 @06:18PM (#6334547)
    There are limited provisions in copyright law (at least in the UK, and I expect similar ones exist elsewhere in the world) for public libraries and archives. But these are indeed limited provisions and do not apply to a random commercial organisation that decides to provide such a service.

    Firstly, in the general case of search engines providing indexing of content, this is legal and there are legal cases to back it up (in the UK: antiquesportfolio) so long as the indexes are not copies.

    Secondly, in the case of USENET groups and mailing lists, in the process of submitting a message to the mailing list or group you have given an implicit license for the message to be reproduced within the nature of the particular technology at hand. This means that if at a later date you object to a message you wrote to a mailing list in the past, you don't really have the ability to retract it. In all cases, anyone deciding to use the material in another way (e.g. creating a commercial CDROM of USENET material for a marked-up price) would be violating your (and others') copyright. However, if they were providing that CDROM as a distribution service for USENET itself (e.g. "get your monthly USENET CDROM") then this is probably within the bounds of legality, as it is still transfer via the USENET system, and the cost is likely to reflect media/distribution costs rather than some specific aim to make a commercial product out of your material.

    Finally, in the specific case of copies of websites, yes this is a violation of copyright - but as far as I know this has not been tested in a court of law. The existence of the Robots Exclusion Protocol and the NOARCHIVE, NOINDEX and NOFOLLOW directives allows a weasel argument suggesting that it is inherent in the WWW itself (as a new form of media / technology) that search engine indexing and archiving / caching is legal unless you specifically disallow it with this mechanism. It may also be the case that if this archiving / caching was carried out for profit, or at a price greater than is fair for distribution/media, then a party is making an economic gain out of your material, and this suggests an inequitable violation of your economic rights.
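
    For reference, the NOARCHIVE / NOINDEX / NOFOLLOW directives mentioned above are usually carried in a page's robots meta tag. Below is a minimal sketch of how a crawler might read them, using Python's standard html.parser. The RobotsMetaParser class, the robots_directives helper, and the sample page are all hypothetical, and real crawlers may also honor crawler-specific meta names that this ignores.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute names arrive lowercased; values keep their original case.
        if (attributes.get("name") or "").lower() != "robots":
            return
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the lowercased robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that allows indexing but asks not to be cached.
PAGE = (
    '<html><head><meta name="ROBOTS" content="NOARCHIVE, NOFOLLOW">'
    "</head><body>hello</body></html>"
)

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("may keep a cached copy:", "noarchive" not in found)
```

    The point for the argument above is that the opt-out is machine-readable and cheap to honor, which is what gives the "legal unless disallowed" position whatever force it has.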

    Another point to remember is that in the WIPO treaties that resulted in the DMCA provisions, as enacted in the UK and EU, there are specific fair use allowances for intermediate copies of a copyright work as necessary for the telecommunications medium itself (this would seem to allow things like store-and-forward systems, and caching).
