News for nerds, stuff that matters

The Wayback Machine, Friend or Foe?

Posted by Cliff on Wed Jun 19, 2002 04:34 PM
from the giving-google's-cache-a-run-for-its-money dept.
ShaunC asks: "As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews." This site last made an appearance on Slashdot earlier this year. Internet archival sites are right smack in the crosshairs of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt out of such services, since some see it as a copyright violation. Is it possible to balance the issues of copyright and history, or will these two Internet resources find themselves in legal trouble in the future?

"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Erm (Score:3, Insightful)

    by adamwright (536224) on Wednesday June 19 2002, @04:37PM (#3732271) Homepage
    Isn't this exactly the point of robots.txt? Google won't cache content it doesn't spider, and it won't spider content forbidden by your robots.txt. Does the WayBack Machine obey the robots rules?
    • Re:Erm by JebusIsLord (Score:2) Wednesday June 19 2002, @04:48PM
      • Re:Erm by JebusIsLord (Score:1) Wednesday June 19 2002, @04:52PM
        • Re:Erm by JebusIsLord (Score:2) Wednesday June 19 2002, @04:56PM
    • Re:Erm by 1g$man (Score:2) Wednesday June 19 2002, @05:05PM
      • Re:Erm (Score:5, Insightful)

        by kevin@ank.com (87560) on Wednesday June 19 2002, @05:34PM (#3732672) Homepage
        The goal of the person who started archive.org was to record the history of the world wide web. The assumption was that whatever anyone thinks about the archive, there will never be another chance to go back and get that data once it is lost.

        The copies that they have archived in their databases are individual copies served from the original web requests, so they have the right to keep them. They became their copy when they were originally downloaded. Whether they have the right to make new copies and redistribute them depends on how you think fair use applies to that content.

        Ultimately if a lot of people start suing them they will probably shut down the archive to public access and only allow researchers to view their original copies on site. And if you'd prefer that, well, you'll end up with the world you deserve.

        [ Parent ]
      • Re:Erm by amRadioHed (Score:1) Wednesday June 19 2002, @05:34PM
        • Re:Erm by dan the person (Score:1) Wednesday June 19 2002, @08:05PM
      • Re:Erm (Score:5, Funny)

        by Ross C. Brackett (5878) on Wednesday June 19 2002, @05:42PM (#3732710) Homepage
        Well, the default is to not plug your server into the Internet in the first place, now isn't it? To quote Doug from Ghost World, [imdb.com] "It's America, dude, learn the rules."

        Seriously, if someone's precious intellectual property - as if anything worthwhile was ever posted on the Internet in the first place - becomes compromised because they don't know a basic principle of how to run a website, well then boo hoo.

        It's worth the tradeoff. That the Wayback Machine exists is seriously cool, and some day will be of definite historical worth. If the occasional Brady Bunch erotic slash fiction author has to take a ride on the waaahmbulance because "A Very Brady Gangbang (M/m/F/f nc b/d)" got copied without their permission for the greater historical good, then that's a price worth paying.
        [ Parent ]
        • Re:Erm by M-G (Score:2) Wednesday June 19 2002, @06:09PM
        • Re:Erm by Wyatt Earp (Score:1) Wednesday June 19 2002, @07:42PM
        • Re:Erm by DaveOMatic (Score:1) Wednesday June 19 2002, @08:22PM
        • Re:Erm by julesh (Score:1) Thursday June 20 2002, @06:51AM
        • Re:Erm by jbarr (Score:1) Thursday June 20 2002, @08:58AM
      • Re:Erm by treat (Score:2) Wednesday June 19 2002, @06:03PM
      • Errr...i disagree by Archfeld (Score:2) Wednesday June 19 2002, @06:34PM
    • Re:Erm by zootread (Score:2) Wednesday June 19 2002, @05:31PM
      • Re:Erm by WeedMonkey (Score:1) Thursday June 20 2002, @07:33AM
      • archive this! by zootread (Score:1) Thursday June 20 2002, @11:16AM
    • Re:Erm (Score:5, Informative)

      by dswensen (252552) on Wednesday June 19 2002, @05:59PM (#3732837) Homepage Journal
      Yes it does, and how. In fact, immediately upon reading this story, I went to the Wayback Machine and checked out my personal website archive. There it was, material dating back to 1996 ("Oh God, no, not the digging man GIF!"). I made a new robots.txt file:

      User-agent: *
      Disallow: /
      # BITE ME WAYBACK MACHINE

      ... uploaded it, went back to the Wayback Machine, and got:

      Robots.txt Query Exclusion.

      We're sorry, access to [site] has been blocked by the site owner via robots.txt.
      Read more about robots.txt
      See the site's robots.txt file.
      Try another request or click here to search for all pages on [site]

      So, yeah, they seem to check the site for the most current robots.txt file before they show the archive. And if the robots.txt disallows archiving the site, ALL the entries are marked unavailable, not just the current ones.

      So, it's pretty easy to solve the problem of the Wayback Machine -- and probably without going balls-out with the "disallow everything everywhere" like I did.
      [ Parent ]
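
For anyone who wants to see how a crawler would read rules like the ones above before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not how the Wayback Machine itself evaluates robots.txt; the `ia_archiver` user-agent string, the example URL, and the narrower rule set are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blanket rule from the comment above: every crawler, every path.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""

# A narrower alternative. Assumption: the archive's crawler identifies
# itself as "ia_archiver"; all other crawlers are left alone.
BLOCK_ARCHIVER_ONLY = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

PAGE = "http://www.example.com/index.html"  # hypothetical URL

if __name__ == "__main__":
    rule_sets = [("block everyone", BLOCK_EVERYONE),
                 ("block archiver only", BLOCK_ARCHIVER_ONLY)]
    for name, rules in rule_sets:
        for agent in ("ia_archiver", "Googlebot"):
            print(f"{name:20} {agent:12} allowed={allowed(rules, agent, PAGE)}")
```

The second rule set is one way to avoid the "disallow everything everywhere" approach: search engines can keep indexing while the archiver is turned away, provided the archiver really does announce itself under that name.
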
      • Re:Erm by guttentag (Score:3) Thursday June 20 2002, @01:56AM
      • Re:Erm by nanojath (Score:1) Thursday June 20 2002, @09:57AM
    • Re:Erm by g_attrill (Score:1) Wednesday June 19 2002, @06:23PM
    • Re:Erm by HD Webdev (Score:1) Wednesday June 19 2002, @06:23PM
    • Re:Erm by uncoveror (Score:2) Wednesday June 19 2002, @09:11PM
    • I hope that one day the net credit card by Rareul (Score:2) Thursday June 20 2002, @12:15AM
    • Re:Erm by HP LoveJet (Score:2) Wednesday June 19 2002, @04:57PM
    • Re:Erm by MushMouth (Score:1) Wednesday June 19 2002, @05:16PM
    • Re:DAVE WINER - Why he's been missing by SnappingFish (Score:1) Wednesday June 19 2002, @10:42PM
    • Re:Erm by imperator_mundi (Score:1) Thursday June 20 2002, @02:54AM
  • Yummy by sheepab (Score:2) Wednesday June 19 2002, @04:38PM
    • Re:Yummy by quintessent (Score:2) Wednesday June 19 2002, @04:51PM
    • Re:Yummy by mongoks (Score:1) Wednesday June 19 2002, @05:01PM
    • Re:Yummy by digitalsushi (Score:2) Wednesday June 19 2002, @05:34PM
    • Re:Yummy by egreB (Score:1) Thursday June 20 2002, @05:38AM
  • "The Wayback Machine" (Score:3, Informative)

    by pb (1020) on Wednesday June 19 2002, @04:40PM (#3732287)
    "The Wayback Machine" has been a pet project for a long time, and we're only now seeing results. I know for a fact that they have pages back at least as far as 1996, and it's a damn shame they don't have anything that much earlier...

    And yes, it obeys the Robots Exclusion Standard.

    "Ask Google" strikes again; I would hope that you could find all of this information by searching, or reading an "About" page, or something. Fortunately, these abortions to journalism don't appear on the Front Page very often.
  • Robots.txt (Score:5, Informative)

    by mshowman (542844) on Wednesday June 19 2002, @04:41PM (#3732291)
    I had recently placed a restricted robots.txt file on my site and when trying to access any of the past revisions, I get a message saying that the owner has restricted access to the site via robots.txt. They seem to have that aspect under control.
    • Re:Robots.txt by Dwedit (Score:1) Wednesday June 19 2002, @05:01PM
    • Re:Robots.txt by spacefight (Score:1) Wednesday June 19 2002, @06:14PM
    • Re:Robots.txt by MulluskO (Score:1) Wednesday June 19 2002, @09:29PM
    • Re:Robots.txt by ShaunC (Score:3) Thursday June 20 2002, @01:16PM
  • by Anonymous Coward on Wednesday June 19 2002, @04:41PM (#3732294)
    It's a scary thought that things kids are saying on message boards when they're teenagers are going to be back to haunt them when they apply for jobs in their mid 40s...

    I mean, if everything I posted on BBSes in the 1980s were still attributable to me... yikes.

    Remember kids. Use a nickname, and change it frequently if you ever want to run for any kind of office.

    • by TheMonkeyDepartment (413269) on Wednesday June 19 2002, @04:44PM (#3732323)
      Well, that's a great point, and it's a good illustration of the double-edged sword of free speech. You are free to say whatever dumbshit, ridiculous things you want. But you are also free to deal with the social consequences.
      [ Parent ]
    • by rhaig (24891) <rhaig@acm.org> on Wednesday June 19 2002, @05:19PM (#3732591) Homepage
      dejanews was my best tool to weed out resumes

      before I scheduled even a phone interview, I'd always search dejanews for the person in question. Sometimes I'd come up with a definite hit (first and last name as well as email and mentioning the local area or some work that was on their resume) and I'd be able to see what kind of person I was really dealing with. That's when I started looking at what I'd posted.
      [ Parent ]
      • Re:There are more than copyright concerns... by DrMaurer (Score:1) Wednesday June 19 2002, @09:45PM
      • by madmancarman (100642) on Wednesday June 19 2002, @10:38PM (#3733974)
        dejanews was my best tool to weed out resumes

        before I scheduled even a phone interview, I'd always search dejanews for the person in question. Sometimes I'd come up with a definite hit (first and last name as well as email and mentioning the local area or some work that was on their resume) and I'd be able to see what kind of person I was really dealing with. That's when I started looking at what I'd posted.

        This kind of freaked me out when I started teaching in 1998 - I'd been running a large fan web site devoted to one of my favorite bands, and being heavily into the band, I posted a lot in their newsgroup and participated in more than one flame war. Of course, I was in college and in my very early 20's and late teens, but it's all archived on DejaNews now, with no way to remove it. I really doubt any public school districts are going to wise up to this (or even care, considering the national teacher shortage), but I wouldn't be surprised if it came back to haunt me in some way some day. As a previous poster mentioned, such is the burden of free speech.

        An interesting thing did happen to me at the beginning of this school year. I teach high school computer classes, and I was talking about managing that fan web site when one of my students (a junior) opened his eyes really big and pointed at me with his jaw dropped, sort of aghast. I paused and asked him what was wrong, and he exclaimed that he downloaded and used the guitar tabs I'd written years earlier when he was in junior high. I found that kind of amusing!

        I think the archiving of the internet is particularly scary when I can still find a lousy guitar tab of Pearl Jam's "Footsteps" [guitaretab.com] that I did back in 1992, when I was a senior in high school piggybacking off an account at the nearby university, on my parents' Apple //e, while I was still learning how to play guitar. Obviously, the internet can have a much longer shelf life than a ProDOS 5.25" floppy (excluding news sites [msnbc.com] that "expire" their articles after limited availability).

        First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi

        [ Parent ]
    • Re:There are more than copyright concerns... by gad_zuki! (Score:2) Wednesday June 19 2002, @05:29PM
    • Re:There are more than copyright concerns... by Suppafly (Score:2) Wednesday June 19 2002, @06:54PM
    • Re:There are more than copyright concerns... by Sloppy (Score:1) Wednesday June 19 2002, @07:03PM
    • Re:There are more than copyright concerns... by Shalda (Score:1) Wednesday June 19 2002, @08:28PM
    • Re:There are more than copyright concerns... by guttentag (Score:2) Thursday June 20 2002, @02:08AM
    • Re:There are more than copyright concerns... by jolshefsky (Score:1) Thursday June 20 2002, @07:21AM
  • by TheMonkeyDepartment (413269) on Wednesday June 19 2002, @04:41PM (#3732296)
    When you publish something on the web, it is publicly available via HTTP. End of story. Responsible netizens can observe the requests of "robots.txt" but they don't have to. If you want something more controlled, create a VPN or intranet or some other kind of non-public data server.

    Your argument is similar to that of newspaper publishers who didn't like "deep linking." What they couldn't (or didn't want to) understand is that the nature of an HTTP web server is quite simple. A client asks for a file, the server gives it back. Using that protocol implies that you are OK with that. If you're not, I suggest you look into different technologies, instead of complaining about lack of control, in a medium that was never intended to provide it.
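
The "client asks, server gives it back" exchange can be sketched with nothing but Python's standard library. This is only an illustration under stated assumptions: the handler class, the page body, and the throwaway local server are all made up for the demo, and nothing here reflects any particular site's setup.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical page content served to anyone who asks.
PAGE_BODY = b"<html><body>Hello, public web.</body></html>"


class OnePageHandler(BaseHTTPRequestHandler):
    """Answers every GET with the same page, whoever the client is."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE_BODY)))
        self.end_headers()
        self.wfile.write(PAGE_BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once() -> bytes:
    """Start a throwaway local server, request one page, return the body."""
    server = HTTPServer(("127.0.0.1", 0), OnePageHandler)  # port 0: OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        opener = build_opener(ProxyHandler({}))  # talk to localhost directly
        with opener.open(f"http://127.0.0.1:{port}/index.html", timeout=5) as resp:
            return resp.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    copy = fetch_once()
    print("client now holds its own copy:", copy == PAGE_BODY)
```

Note that the handler never checks who is asking. Any access control has to be added on top (authentication, IP filtering, a VPN), which is the point being made above.
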

  • Talk about a time machine... (Score:3, Interesting)

    by wompser (165008) on Wednesday June 19 2002, @04:42PM (#3732300)
    Went back and looked at the site for the .com I used to work for, very nostalgic. The wayback machine is a good resource for people who create content on someone's site (a.k.a. me), and then lose access to it because the company goes under. Now I'm able to add my old content to my portfolio, now that the company who once owned it is gone.
  • Simple rule by npsimons (Score:1) Wednesday June 19 2002, @04:42PM
  • Permission... (Score:3, Insightful)

    by gorf (182301) on Wednesday June 19 2002, @04:42PM (#3732307)

    who gave them permission to make those copies?

    The way I see it, you implicitly give people some limited form of permission by putting it up on the internet freely available to download in the first place. You put it up for people to download, print out and so forth (which amounts to copying), and therefore you've implied that people may do so.

    Sure, you own copyright, and blatant plagiarism is something that clearly is wrong. But I see nothing wrong with taking an article that you published on the web and reproducing it, as long as it is taken in context and is clearly attributed (and it made obvious that the copy isn't the original, but proper attribution would do this and therefore suffice).

    Of course, this is republication and so the issue is not so clear and obviously subjective. That's just my opinion.

  • Friend or Foe? Hmmmm... by Navaash Fenwylde (Score:1) Wednesday June 19 2002, @04:43PM
  • Legally you can stop them, but why? (Score:3, Informative)

    by the_womble (580291) on Wednesday June 19 2002, @04:43PM (#3732310) Homepage Journal
    If you own the copyright they can not archive it without your permission. Legally, that is all there is to it.

    Of course in practice you have to pursue this and ask them to remove it.

    If you really object I suggest a list of every site you have or have had and dates with a request to remove everything. Then you only need to notify them when you put up a new site that should also be excluded. That would not be such a nuisance, would it?

    That said I think they are providing a service that is interesting so unless you are harmed by it, why object?

    I am interested in knowing how they had such old versions of your site though. Do search engines keep archives?

  • The story should read 'since 1996' by forged (Score:2) Wednesday June 19 2002, @04:43PM
  • As a creator... by Bonker (Score:2) Wednesday June 19 2002, @04:43PM
  • Ah, Gee! by Dark Paladin (Score:2) Wednesday June 19 2002, @04:43PM
  • Even better.... by sheepab (Score:1) Wednesday June 19 2002, @04:44PM
  • Do I have permission to copy the content of your site to my browser history directory, and if so, how long do I have permission to keep it? Can I show a copy of an html document that is stored in my browser history to my mother? What about my neighbor? Or the dude in another country I happen to be chatting with online?

    IANAL blah blah blah, but once you open your files up to being downloaded and stored by a browser, you've pretty much given up the right to tell people they can't be re-distributed--I would think the best you could hope for is that people would re-distribute them, in whole, the way you originally released them.

  • I like it but... (Score:4, Insightful)

    by rknop (240417) on Wednesday June 19 2002, @04:44PM (#3732330) Homepage

    When I first discovered it, it was a lot of fun. Much nostalgia; it was fun seeing earlier versions of my webpages. Some go back quite a number of years.

    On the other hand, I was horrified when I realized that there was full archiving of www.dramex.org. If you visit that site, you will see that there are a large number of scripts (as in plays), many of which have restrictions on use. Over the years, we've had people request that scripts be removed from the site; of course, we did so. However, they weren't necessarily removed from the archive, and an archive keeps them forever. Specifically with the wayback machine, I was able to submit stuff that removed the specific directories I was worried about (they don't archive the scripts from www.dramex.org, just the "front page" stuff which is all part of the fun), and keep them from doing it again.

    I like the idea of archives; it preserves history. The web is a transient medium, but not entirely. Yes, much of the content is dynamic and should only be dynamic. Some of it, though, is like the front page of a newspaper. Each day, what's on "today's front page" is different-- but there is value and use in seeing what was on the front page in any day in history.

    But sometimes you need to delete something and make sure it really is no longer available. When you don't completely control your site (i.e. somebody else archives it, rather than just mirrors it), that becomes impossible.

    (Incremental backups can have a similar issue. If you only back up files which are "newer than the last backup", your backup doesn't have the information about files which have been *deleted* since the last backup. When you restore, you might find some files there you thought shouldn't exist any more.)
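
The incremental-backup pitfall can be shown with a toy model. This is a minimal sketch, not any real backup tool: the file tree is a plain dictionary, the timestamps are small integers, and the function and file names are hypothetical.

```python
def incremental_backup(live: dict, backup: dict, last_run: int) -> None:
    """Copy files modified after last_run; never removes anything."""
    for path, (mtime, data) in live.items():
        if mtime > last_run:
            backup[path] = (mtime, data)


def demo() -> dict:
    # Hypothetical file tree: path -> (modification time, contents)
    live = {"index.html": (1, "front page"), "script.txt": (1, "a play")}
    backup: dict = {}

    incremental_backup(live, backup, last_run=0)  # first run copies both files

    del live["script.txt"]                        # author asks for removal
    live["index.html"] = (2, "front page v2")     # front page is updated

    incremental_backup(live, backup, last_run=1)  # second run sees only the update
    return backup


if __name__ == "__main__":
    for path, (mtime, data) in sorted(demo().items()):
        print(path, mtime, data)
```

Because the second run only looks at files that still exist and are newer, the deletion is never recorded, so a restore from this backup would bring the removed script back. An archive that snapshots pages and never revisits deletions behaves the same way.
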

    (Dramex.org has changed so that it's not straightforward to get directly to the scripts any more. META tags tell the search engines to leave the actual scripts alone, and you can only get the text itself via CGI. Yes, it's easy to subvert if you put your mind to it, but at least you do have to put your mind to it, and automated search engines or archivers won't. 90% of the security for 1% of the effort.)
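
As a rough illustration of the META-tag approach, here is a sketch of how a well-behaved crawler might look for a robots META tag before indexing a page. The exact tag used on the site is not given above, so the `noindex` value, the class name, and the sample HTML are assumptions; only Python's standard `html.parser` is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def may_index(html: str) -> bool:
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    protected = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    ordinary = "<html><head><title>Front page</title></head></html>"
    print("protected page indexable:", may_index(protected))
    print("ordinary page indexable:", may_index(ordinary))
```

Like robots.txt, this is purely advisory: it keeps out crawlers that choose to honor it, which matches the "you do have to put your mind to it" level of protection described above.
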

    -Rob

  • its a good thing... by negativethirsty (Score:1) Wednesday June 19 2002, @04:45PM
  • I love it. (Score:3, Informative)

    by gripdamage (529664) on Wednesday June 19 2002, @04:45PM (#3732334)

    What's the problem?

    If you do something illegal on your website, you won't be held responsible more than once just because the data persists on the Wayback machine. If you remove the offensive material from your site, that's all you can do. The Wayback machine can deal with their own lawsuit threats. And I'm sure they'll remove material if you are the site owner and ask nicely.

    As far as outdated information, anyone reading pages on the wayback machine and expecting them to be current would have to be crazy. It's an archive after all.

    It's easy to opt out. Google provides instructions in their webmaster FAQ [google.com] which points out "There is a standard for robot exclusion at http://www.robotstxt.org/wc/norobots.html [robotstxt.org]."

  • As a webmaster of various sites... (Score:5, Insightful)

    by schon (31600) on Wednesday June 19 2002, @04:45PM (#3732338) Homepage
    As a webmaster of various sites, I have no problem with archives.. if I didn't want people to see my stuff, I wouldn't have put it on the internet in the first place.

    where did they get such old copies of my websites, and who gave them permission to make those copies?

    They probably got the copies the same way everybody else did - by surfing. You (implicitly) gave them permission to cache your sites by not including an appropriate entry in your robots.txt.

    The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out?

    Archives are nothing like spam. Spam is primarily harassment. These guys aren't harassing you. They did ask your permission (by way of checking your robots.txt). If you've since changed your mind, it's your responsibility to notify them.

    Google caches material too - do you consider them to be spam as well?

    Archive sites provide a valuable resource to the rest of the 'net. If you don't like it, put an appropriate entry in your robots.txt file, and be done with it.
  • Can libraries keep old newspapers? by cperciva (Score:2) Wednesday June 19 2002, @04:45PM
  • Quite simply, without Google ... by Vicegrip (Score:2) Wednesday June 19 2002, @04:45PM
  • I doubt that I'm alone in my belief that it is always tragic when any piece of information--no matter how trivial--is lost forever.

    If a person has offered that information for free at any point, to the extent that an automated script could access it, then I believe that information can be safely considered public domain. I doubt that there's any mechanism by which Richard M. Stallman could lose his mind and "rein in" all copies of GNU, or by which Stephen King could recall all his novels and refund the purchase price; once something is offered to the public, it no longer belongs exclusively to the publisher.

    In my opinion, the value of archives in the future immeasurably outweighs occasional inconveniences of having information stick around longer than the author would have wished.
  • It has its uses. by Helmholtz Coil (Score:1) Wednesday June 19 2002, @04:46PM
  • err okay... by NanoGator (Score:2) Wednesday June 19 2002, @04:46PM
  • How so? by SkyLeach (Score:2) Wednesday June 19 2002, @04:46PM
  • Excellent idea by synthox (Score:1) Wednesday June 19 2002, @04:47PM
  • Fork over your caches (Score:3, Funny)

    by Eponymous, Showered (73818) <jase&dufair,org> on Wednesday June 19 2002, @04:47PM (#3732358) Homepage
    I browsed all of your sites (even the abandoned ones) and since my browser cache is set to 782TB (and I'm still running Netscape 1.0N), your sites are still there. And my cache is publicly accessible via my webserver. Yet another way you're being violated. Ah, the risks and perils of publishing on a public network.
  • Archives need to be made (Score:4, Insightful)

    by Waffle Iron (339739) on Wednesday June 19 2002, @04:48PM (#3732366)
    If the courts determine that it is technically illegal to make archives of electronic content, then the copyright laws should be changed to explicitly allow archiving. Otherwise, we could eventually lose track of history. The only written record of large portions of our civilization would be relegated to a few rusting web server hard drives buried in landfills.

    If you read 1984, you might remember that the government tightly controlled all old copies of documents so that they could manipulate history as they wished. We might get into a similar situation by accident if we don't allow independent archives of electronic information.

    With traditional media, you publish something on paper, but you don't get to control who puts the paper copies in which archives. That has served us well for keeping track of history, and an equivalent system needs to be maintained for electronic content.

  • A Real World Example/Question by GeekLife.com (Score:2) Wednesday June 19 2002, @04:48PM
  • And what to do when info must die? by Nf1nk (Score:2) Wednesday June 19 2002, @04:49PM
  • it's even... by gabvalois (Score:1) Wednesday June 19 2002, @04:50PM
  • Friend to Hosting Companies (Score:5, Funny)

    by Da J Rob (469571) on Wednesday June 19 2002, @04:50PM (#3732387) Homepage
    I was talking to this guy who works for a web hosting company [hostirian.com], and he says a fourth of his sales calls are people calling him up because they're pissed that their last hosting company 'lost' their site. (In reality, most of the time it's later found out that the guy deleted it himself or renamed index.html to index2.html, etc.) He says that for 90% of the sites he can find a copy on the Wayback Machine. He'll then start to quote the website's contents to the guy on the phone and usually will have the amazed (and dumbfounded) customer signing a hosting contract by the end of the day.
  • caching proxy servers by bigpat (Score:1) Wednesday June 19 2002, @04:51PM
  • Uh, robots.txt! by Tom7 (Score:2) Wednesday June 19 2002, @04:54PM
  • it's a good thing by red_five_standing_by (Score:1) Wednesday June 19 2002, @04:55PM
  • Euro friendly :) by Anonymous Coward (Score:1) Wednesday June 19 2002, @04:56PM
  • robots.txt DUUUUUUUUHHHHHHH!!!!!!!! by jsimon12 (Score:2) Wednesday June 19 2002, @04:56PM
  • Copyright and websites. (Score:3, Interesting)

    by www.sorehands.com (142825) on Wednesday June 19 2002, @04:57PM (#3732449) Homepage
    It could be argued that the site is publicly available and thus anyone can copy it. There is also the issue of fair use. That is why many people place terms of use and robots.txt files on their sites. It could even be a DMCA violation where an IP (or range) has been blocked, so people from that IP use the Google cache to bypass the block.


    I don't mind that my site is being added to indexes that the public have use of for free. I have a problem where a company uses my site to make a profit, with no public benefit.


    There is case law where unauthorized access to a website is a copyright violation.


    I am trying to use copyright law against some of the spammers who scrape my site for email addresses. Then, go after the spam software companies for contributory infringement (let the napster rulings serve some good).

  • Get Used to It, please by pyrrho (Score:2) Wednesday June 19 2002, @04:57PM
  • TV Broadcast analogy (Score:4, Interesting)

    by rknop (240417) on Wednesday June 19 2002, @04:58PM (#3732456) Homepage

    Some have already drawn analogies to TV broadcasts, saying hey, it was broadcast, you get to keep a copy. You can't bitch now if people still have that copy, unless you're Jack Valenti.

    You can spin this how you want. Here's one valid way to think about it though: a TV network broadcasts a show. You make a private copy on a VCR tape. Jack Valenti aside, you can watch that copy again as often as you like, and it's no big deal. However, you do not have the right to rebroadcast your copy of that show to the public without the permission of the original copyright holder. (I have my B5 tapes. I'm watching them through again now, showing them to my wife. I'm sure nobody is upset about this. But I'd be in deep doo-doo if I managed to broadcast them on a local access station, or uploaded them to a public website.)

    If you are inclined to be negative about the Wayback Machine, you could view it this way. While the page existed on the original site, it was broadcast to the public. If somebody made a personal copy, they have it and will always have it, even if the site goes down. However, when the site goes down, individuals do not necessarily have the right to then "rebroadcast" (i.e. post) themselves the content they downloaded and kept. This, however, is what the WayBack machine is doing.

    Mind you, except for the issue with www.dramex.org that I noted above (and which I fixed long ago), I like the WayBack machine, and am happy that they archived the content which was implicitly copyrighted to me. I would have opted in if I had wanted to. But, of course, I didn't know about it back in 1996 to opt in.

    I don't have a good answer to the questions. Just thoughts.

    -Rob

  • best thing since sliced bread by John Sokol (Score:2) Wednesday June 19 2002, @05:00PM
  • p2p by mephist01 (Score:1) Wednesday June 19 2002, @05:01PM
  • Wayback machine = free backups! by FamousLongAgo (Score:1) Wednesday June 19 2002, @05:01PM
  • by tiltowait (306189) on Wednesday June 19 2002, @05:02PM (#3732486) Homepage Journal
    .... and wayback is sponsored, amongst others, by the Library of Congress. The archive itself is a 501(c)(3) public nonprofit. See 17 U.S.C. SECTION 108(a)(3) [cornell.edu] for more information.

    Strange that such a complaint would appear within a group espousing that "information wants to be free." :)
  • For what it's worth... by Reality Master 101 (Score:2) Wednesday June 19 2002, @05:03PM
  • Purist? Pure what? (Score:5, Insightful)

    by American AC in Paris (230456) on Wednesday June 19 2002, @05:03PM (#3732492) Homepage
    Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews.

    I'd say it makes you more of a control freak than a purist, personally.

    Seriously, how did you ever get it into your head that a medium that serves documents to the general public on demand would be somehow exempt from archiving?

    Would it bother you if John Q. Savant could recite the contents of your web pages from memory ten years after you'd taken them down?

    Would it bother you to learn that stock prices, perhaps the most "ever-changing" thing out there, are permanently archived by a variety of services?

    Or are you just jittery at the thought that your spouse/boss/Friendly Neighborhood Representative of The Man/kids may be able to someday look at the shite you plastered all over the web in your younger days? ("Ech, that stupid Netscape 2 animated title hack--honey, you actually -did- that?")

  • Definitely foe by brandonsr (Score:1) Wednesday June 19 2002, @05:06PM
  • Denmark solved that problem by law by Jezral (Score:1) Wednesday June 19 2002, @05:06PM
  • Microsoft.com in 1996 by dasheiff (Score:2) Wednesday June 19 2002, @05:07PM
  • aside from robots.txt by archen (Score:1) Wednesday June 19 2002, @05:08PM
  • easy to remove and stop from archiving by arson1 (Score:2) Wednesday June 19 2002, @05:08PM
  • You have given permission (Score:4, Insightful)

    by MrResistor (120588) <petehoff AT pacbell DOT net> on Wednesday June 19 2002, @05:08PM (#3732524) Homepage
    By the very act of posting your site on the web you have given permission to make copies of it. Otherwise, how would anyone view it? And if no one is supposed to view it, why have you published it in a publicly accessible space?

    If I went to your website 2 years ago and never closed or refreshed that browser window, would I now be violating your copyright? What if I saved the page so I could view it later offline? What if I never erased that file? Would that mean that I'm violating your copyright? I have several floppies of web sites I saved at school for viewing at home from the days when I was stuck on a crappy dial-up service. Does that make me a pirate? What about all the copies of sites held in my browser's cache?

    Don't get me wrong, I understand where the sentiment is coming from, even if I disagree with it. I'm just trying to point out how incongruous it is with the basic nature of computers and the internet and how they work.

    These questions aside, though, I have to come down in favor of the historians. People here are always whining about old movies/books/music being lost because their owners refuse to let them go, even if they aren't using them. Why should the web suffer the same fate? The rate of destruction is far faster on the internet, and since it isn't a physical medium, the information has to be actively archived if it is to be preserved.

  • A great tool for future historians /archeologists by msoldo (Score:2) Wednesday June 19 2002, @05:09PM
  • get laid by porkface (Score:1) Wednesday June 19 2002, @05:09PM
  • Historical Records by JonBuck (Score:2) Wednesday June 19 2002, @05:09PM
  • public domain by qubit64 (Score:1) Wednesday June 19 2002, @05:10PM
  • libel? (Score:3, Interesting)

    by sckeener (137243) <sterling@texaskeeners.org> on Wednesday June 19 2002, @05:13PM (#3732557)
    I didn't know that the wayback machine went that far back. I wonder if anyone is going to go to jail from posts they made in the past....
  • opting out (Score:3, Interesting)

    by josepha48 (13953) on Wednesday June 19 2002, @05:14PM (#3732571) Journal
    For Google at least, you can opt out of its service by adding the following tag in the "head" of your web page:
    <meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    This will tell Google not to cache your pages. If you don't want them to index your page and include it in the search engine, use:
    <meta NAME="ROBOTS" CONTENT="NOINDEX">
    Now, I am not sure about this other site that is caching old pages, and right now I cannot get through. But if they are caching any of my pages, I will tell them to take them off, as ALL my pages are just that: MY pages. I think you can sue them; I'd imagine that with all the other internet lawsuits it would be valid. They are stealing your pages.
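For what it's worth, the tags above are the generic robots directives. As far as I know, the directive Google documents specifically for suppressing its cached copy is NOARCHIVE, while NOINDEX keeps the page out of the index altogether. Here is a minimal Python sketch for checking which directives a page actually declares. The class name RobotsMetaParser, the helper robots_directives, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us,
        # but attribute values keep their original case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html_text):
    """Return the set of lowercase robots directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


# Hypothetical page: the tag from the comment plus a NOARCHIVE tag.
SAMPLE_PAGE = """<html><head>
<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta name="robots" content="noarchive">
</head><body>Hello</body></html>"""

found = robots_directives(SAMPLE_PAGE)
print(sorted(found))
```

If "noarchive" is missing from the result, the page is not explicitly asking for the cached copy to be withheld, although a NOINDEX page would not be listed in the first place.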
    • Re:opting out by mbauser2 (Score:1) Wednesday June 19 2002, @06:12PM
      • Re:opting out by josepha48 (Score:2) Wednesday June 19 2002, @06:49PM
    • Re:opting out by Buran (Score:2) Wednesday June 19 2002, @06:29PM
  • His-to-ry by Fapestniegd (Score:1) Wednesday June 19 2002, @05:15PM
  • Serious flaw in the internet's design by l33t-gu3lph1t3 (Score:1) Wednesday June 19 2002, @05:16PM
  • Bad use? by sheepab (Score:1) Wednesday June 19 2002, @05:17PM
  • Mine! Mine! Mine! by bryny (Score:1) Wednesday June 19 2002, @05:18PM
  • Damn you slashdot! by Aanallein (Score:2) Wednesday June 19 2002, @05:20PM
  • I had my sites removed by kstumpf (Score:2) Wednesday June 19 2002, @05:20PM
  • Wayback Machine and Privacy!! by jdriller (Score:1) Wednesday June 19 2002, @05:22PM
  • The backup copy of the archive by Animats (Score:2) Wednesday June 19 2002, @05:26PM
  • function like search engines. by Restil (Score:2) Wednesday June 19 2002, @05:28PM
  • dating back to 1998 (Score:4, Funny)

    by quantaman (517394) on Wednesday June 19 2002, @05:28PM (#3732632)
    Anyone else find it mildly disturbing that 1998 is considered to be distant history?

  • The other issue by corebreech (Score:2) Wednesday June 19 2002, @05:29PM
  • Copyright *is* archiving. by blair1q (Score:2) Wednesday June 19 2002, @05:30PM
  • Saved my butt more than once.... by jafiwam (Score:1) Wednesday June 19 2002, @05:31PM
  • Nothing you can do by litewoheat (Score:2) Wednesday June 19 2002, @05:31PM
  • Are we now advocating for the RIAA? by fermion (Score:1) Wednesday June 19 2002, @05:32PM
  • Someone hasn't done their research (Score:4, Informative)

    by mfos.org (471768) on Wednesday June 19 2002, @05:32PM (#3732659)
    A few things

    1) They've been archiving since 1998, but they've only recently had the horsepower to provide a live connection to it.

    2) It is very easy to not have your stuff indexed. The directions are here. [archive.org]
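As far as I recall, those directions boil down to a robots.txt entry for the archive's crawler, which identifies itself as ia_archiver. Below is a rough sketch using Python's standard urllib.robotparser to confirm what such a file blocks. The user-agent name is my assumption about the crawler, and the example.com URL, the SomeOtherBot name, and the is_allowed helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: shut out only the archive's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


PAGE = "http://www.example.com/index.html"  # placeholder URL

archiver_allowed = is_allowed(ROBOTS_TXT, "ia_archiver", PAGE)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)
print(archiver_allowed, other_allowed)
```

The point of the second check is that an entry naming one crawler leaves every other robot unaffected, so search engines can keep indexing the site.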
  • court evidence? by dubiousmike (Score:2) Wednesday June 19 2002, @05:32PM
  • Excuse me? by innocent_white_lamb (Score:2) Wednesday June 19 2002, @05:33PM
  • Dead-tree publishing parallel by Todd Knarr (Score:2) Wednesday June 19 2002, @05:44PM
  • ShaunC by Ryu2 (Score:1) Wednesday June 19 2002, @05:45PM
  • What damages? by blair1q (Score:2) Wednesday June 19 2002, @05:45PM
  • A serious question? If so, it's OT by Anonymous Coward (Score:1) Wednesday June 19 2002, @05:47PM
  • Who archived it and why by Robotech_Master (Score:2) Wednesday June 19 2002, @05:59PM
  • This is just angering by Mr. Buckaroo (Score:1) Wednesday June 19 2002, @06:00PM
  • The purpose of copyright... (Score:3, Insightful)

    by kcbrown (7426) <slashdot@sysexperts.com> on Wednesday June 19 2002, @06:06PM (#3732887)
    The Congress shall have Power To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries

    -- United States Constitution

    The purpose of copyright is to promote progress, to entice authors and inventors to release their works and discoveries to the public.

    But that is not an end unto itself. The true end is the benefit to society that the release of such works brings.

    Now, remember that the whole incentive here, the entire reason for granting the monopoly privilege of copyright, is to allow the originators of works to make money from their works, which in turn (theoretically) gives them incentives to release their works to the public.

    When you publish something on the web, you're publishing your works for free, unless you go to the extra trouble of implementing some kind of access control. The Wayback Machine won't work on a site that has access control, so all it ends up archiving is stuff that was published for free public consumption.

    So the real question is: if a work has already been released for free to the general public, how would letting authors restrict the republication of that work after the fact bring greater benefit to society than not letting the author impose such restrictions?

    My opinion is that it is much more beneficial to society as a whole if the release of a work for free public consumption automatically implied that members of the public have the right to redistribute that work. So if an author doesn't want people in the general public to be able to redistribute his work, he has to control who receives the work and who doesn't. Certainly requiring payment for the work in question is sufficient to meet the requirement of controlling access. But whatever method the author chooses, it should be one that makes it clear that the work in question is not being released for free to the public.

  • Hey, it's just a really... by neo (Score:2) Wednesday June 19 2002, @06:07PM
  • Removing yourself from the Archive... by spacefight (Score:1) Wednesday June 19 2002, @06:10PM
  • Great for getting around corporate content filters by lscotte (Score:1) Wednesday June 19 2002, @06:11PM
  • Removal Instructions by akiy (Score:2) Wednesday June 19 2002, @06:12PM
  • archive by happy monday (Score:1) Wednesday June 19 2002, @06:13PM
  • Here's the scoop on copyright... by dsrtegl (Score:1) Wednesday June 19 2002, @06:20PM
  • It's time for a robots INCLUSION standard. by SplatFileGoo (Score:1) Wednesday June 19 2002, @06:21PM
  • It's a FRIEND *and* a FOE by newerbob (Score:1) Wednesday June 19 2002, @06:24PM
  • The WayOff Machine by TheJohn (Score:1) Wednesday June 19 2002, @06:31PM
  • Your keyboard is a microphone... by surfcow (Score:2) Wednesday June 19 2002, @06:33PM
  • What a stupendous waste of DASD. by crovira (Score:1) Wednesday June 19 2002, @06:33PM
  • Wayback machine by Anonymous Coward (Score:1) Wednesday June 19 2002, @06:34PM
  • I don't care much about copyrights..... by O.F. Fascist (Score:1) Wednesday June 19 2002, @06:35PM
  • Coincidentally enough... by Kickstart70 (Score:2) Wednesday June 19 2002, @06:41PM
  • WOW! SEX.COM! by wo1verin3 (Score:2) Wednesday June 19 2002, @06:42PM
  • Here is how to opt out by kennethrona (Score:1) Wednesday June 19 2002, @06:44PM
  • by fdiaz5583 (531839) on Wednesday June 19 2002, @06:47PM (#3733104) Homepage
    If anyone has ever heard of the Library of Alexandria, it was supposedly the most impressive knowledge base the world had ever assembled. Some crazy guy came by and burnt it to the ground -- setting the entire industrialized planet back hundreds, perhaps thousands, of years. We are now in the process of surpassing this great library, and are making it even easier for people to have access to knowledge. That knowledge may be porn, may be the morning news or sports scores; it may even be how to construct a nuclear bomb. Nevertheless, it is knowledge, and EVERY person who is alive has the God (and any other higher power) given right to knowledge, despite what any government agency or copyright may say. 21st century libraries such as the WayBack Machine are providing the tools necessary for researchers to go "back to the future." This is a great service to mankind, and its overall importance should not be outweighed by greedy and/or overparanoid privacy rights activists. If you do not wish to be known, please do not post any information on the web, and move to the jungles of Africa and step away from a time and place known as the PRESENT.
  • What if... by rimsky (Score:1) Wednesday June 19 2002, @06:53PM
  • What would really be useful... by Junior J. Junior III (Score:2) Wednesday June 19 2002, @07:02PM
  • Lawsuits in the making? by SnappingFish (Score:1) Wednesday June 19 2002, @07:03PM
  • Like it or not, it's the law by macwhiz (Score:1) Wednesday June 19 2002, @07:04PM
  • Old encryption can be broken! by Jon Howard (Score:1) Wednesday June 19 2002, @07:13PM
  • Copies of copyrighted material by nixterino (Score:1) Wednesday June 19 2002, @07:34PM
  • My thoughts on this.. by zeno_2 (Score:1) Wednesday June 19 2002, @07:41PM
  • Bandwidth Costs by dutchdabomb (Score:1) Wednesday June 19 2002, @07:48PM
  • The Net as a publication medium by erichill (Score:1) Wednesday June 19 2002, @07:52PM
  • opt-in by EdMcMan (Score:1) Wednesday June 19 2002, @08:36PM
  • Just wait... by Eythian (Score:1) Wednesday June 19 2002, @08:44PM
  • 2 little points by Sabalon (Score:2) Wednesday June 19 2002, @08:49PM
  • Slashdot history... by DrkShadow (Score:1) Wednesday June 19 2002, @08:56PM
  • Of course, archives should be legal. by shimmin (Score:2) Wednesday June 19 2002, @09:05PM
  • OT: Local archiving of Wayback machine results? by HEbGb (Score:2) Wednesday June 19 2002, @09:16PM
  • Wayback Machine == Friend by GuNgA-DiN (Score:1) Wednesday June 19 2002, @09:18PM
  • What do you expect? by humblecoder (Score:1) Wednesday June 19 2002, @09:29PM
  • Mind boggles by cicho (Score:1) Wednesday June 19 2002, @09:33PM
  • i can see where this would be helpful by sab0tage (Score:1) Wednesday June 19 2002, @09:44PM
  • Freaky by filtersweep (Score:2) Wednesday June 19 2002, @09:58PM
  • Why opt out? by objekt404 (Score:1) Wednesday June 19 2002, @10:00PM
  • you idiot by blisspix (Score:1) Wednesday June 19 2002, @10:04PM
  • cache = memory by paul_cairney (Score:1) Wednesday June 19 2002, @11:29PM
  • Billionaire Jimmy James owner of WNYX by Conrad_Bombora (Score:1) Thursday June 20 2002, @12:09AM
  • Publishing versus sharing by intermodal (Score:1) Thursday June 20 2002, @12:21AM
  • Copyright and robots by dsoltesz (Score:2) Thursday June 20 2002, @12:23AM
  • I can't resist.... by madcow_ucsb (Score:1) Thursday June 20 2002, @12:36AM
  • Archives, and their use by XO (Score:1) Thursday June 20 2002, @12:40AM
  • This is getting ridiculous. by eatenn (Score:1) Thursday June 20 2002, @12:51AM
  • History is more important than your copyright. by Jamie Zawinski (Score:2) Thursday June 20 2002, @12:58AM
  • So, Basil, by willpost (Score:1) Thursday June 20 2002, @01:01AM
  • Old Slashdot, c. 1998 by piranha(jpl) (Score:1) Thursday June 20 2002, @01:31AM
  • I like it by scottgfx (Score:1) Thursday June 20 2002, @01:32AM
  • Archive is good for keeping companies honest. by ariocksayssquee (Score:1) Thursday June 20 2002, @01:56AM
  • Doesn't affect me.... (that much) by Arricc (Score:1) Thursday June 20 2002, @02:21AM
  • Websites from 1998.. by Chicane-UK (Score:2) Thursday June 20 2002, @02:28AM
  • What do you have to hide? by Vacilando (Score:1) Thursday June 20 2002, @03:06AM
  • Wrong Cache by Wastl (Score:1) Thursday June 20 2002, @03:26AM
  • lol by mnordstr (Score:2) Thursday June 20 2002, @04:03AM
  • consider yourself lucky! by XLR (Score:1) Thursday June 20 2002, @04:38AM
  • You DID opt in by hummassa (Score:1) Thursday June 20 2002, @05:47AM
  • Double standards by Nephrite (Score:1) Thursday June 20 2002, @05:47AM
  • Copyright vs. employer right by heroine (Score:2) Thursday June 20 2002, @07:19AM
  • The Value Of Archiving by chaoticset (Score:1) Thursday June 20 2002, @07:49AM
  • How they got stuff from 1998 by boyko (Score:1) Thursday June 20 2002, @11:18AM
  • Oh, my God that thing is good... by jcpii (Score:1) Thursday June 20 2002, @12:08PM
  • It was a friend for me, today by Kwantus (Score:1) Thursday June 20 2002, @12:41PM
  • Get real by belg4mit (Score:1) Thursday June 20 2002, @02:43PM
  • Re:Copyright must die! There is no such right by Chiasmus_ (Score:2) Wednesday June 19 2002, @04:57PM
  • Re:robots.txt won't work by MrP- (Score:1) Wednesday June 19 2002, @05:14PM