The Internet

The Wayback Machine, Friend or Foe? 508

ShaunC asks: "As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews." This site last made an appearance on Slashdot earlier this year. Internet archival sites are right smack in the crosshairs of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt out of such services, since some see them as a copyright violation. Is it possible to balance the issues of copyright and history, or will these two Internet resources find themselves in legal trouble in the future?

"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"

This discussion has been archived. No new comments can be posted.

  • by TheMonkeyDepartment ( 413269 ) on Wednesday June 19, 2002 @05:41PM (#3732296)
    When you publish something on the web, it is publicly available via HTTP. End of story. Responsible netizens can observe the requests of "robots.txt" but they don't have to. If you want something more controlled, create a VPN or intranet or some other kind of non-public data server.

    Your argument is similar to that of newspaper publishers who didn't like "deep linking." What they couldn't (or didn't want to) understand is that the nature of an HTTP web server is quite simple. A client asks for a file, the server gives it back. Using that protocol implies that you are OK with that. If you're not, I suggest you look into different technologies, instead of complaining about lack of control, in a medium that was never intended to provide it.
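
For what it's worth, the "responsible netizen" side of this can be sketched in a few lines of Python using the standard library's `urllib.robotparser`. This is only an illustration under assumptions: the robots.txt contents, the `example.com` URLs and the `may_fetch` helper are made up, and `ia_archiver` is assumed here to be the name the archive's crawler answers to. Nothing in HTTP enforces any of it; the check only works if the robot chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts one archiving crawler out of the whole site
# and asks every other robot to stay out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeOtherBot"):
        for url in ("http://example.com/index.html",
                    "http://example.com/private/notes.html"):
            print(agent, url, may_fetch(agent, url))
```

A crawler that skips this check still gets the file when it asks for it, which is exactly the point made above: robots.txt is a courtesy, not access control.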

  • by wompser ( 165008 ) on Wednesday June 19, 2002 @05:42PM (#3732300)
    Went back and looked at the site for the .com I used to work for, very nostalgic. The wayback machine is a good resource for people who create content on someone's site (a.k.a. me), and then lose access to it because the company goes under. Now I'm able to add my old content to my portfolio, now that the company who once owned it is gone.
  • Euro friendly :) (Score:1, Interesting)

    by Anonymous Coward on Wednesday June 19, 2002 @05:56PM (#3732438)
    Well, the wayback machine helped me in confronting some companies for raising their prices when we changed to the euro :)

    Especially dominio's pizza. They raised their prices more than 12%. I printed out the page and got a 15% discount :)
  • by www.sorehands.com ( 142825 ) on Wednesday June 19, 2002 @05:57PM (#3732449) Homepage
    It could be argued that the site is publicly available and thus anyone can copy it. There is also the issue of fair use. That is why many people place terms of use and robots.txt files on their sites. It could even be a DMCA violation where an IP (or range) has been blocked, so people from that IP use the Google cache to bypass the block.


    I don't mind that my site is being added to indexes that the public have use of for free. I have a problem where a company uses my site to make a profit, with no public benefit.


    There is case law where unauthorized access to a website is a copyright violation.


    I am trying to use copyright law against some of the spammers who scrape my site for email addresses. Then I'll go after the spam software companies for contributory infringement (let the Napster rulings serve some good).

  • TV Broadcast analogy (Score:4, Interesting)

    by rknop ( 240417 ) on Wednesday June 19, 2002 @05:58PM (#3732456) Homepage

    Some have already drawn analogies to TV broadcasts, saying hey, it was broadcast, you get to keep a copy. You can't bitch now if people still have that copy, unless you're Jack Valenti.

    You can spin this how you want. Here's one valid way to think about it though: a TV network broadcasts a show. You make a private copy on a VCR tape. Jack Valenti aside, you can watch that copy again as often as you like, and it's no big deal. However, you do not have the right to rebroadcast your copy of that show to the public without the permission of the original copyright holder. (I have my B5 tapes. I'm watching them through again now, showing them to my wife. I'm sure nobody is upset about this. But I'd be in deep doo-doo if I managed to broadcast them on a local access station, or uploaded them to a public website.)

    If you are inclined to be negative about the Wayback Machine, you could view it this way. While the page existed on the original site, it was broadcast to the public. If somebody made a personal copy, they have it and will always have it, even if the site goes down. However, when the site goes down, individuals do not necessarily have the right to then "rebroadcast" (i.e. post) themselves the content they downloaded and kept. This, however, is what the WayBack machine is doing.

    Mind you, except for the issue with www.dramex.org that I noted above (and which I fixed long ago), I like the WayBack machine, and am happy that they archived the content which was implicitly copyrighted to me. I would have opted in if I had wanted to. But, of course, I didn't know about it back in 1996 to opt in.

    I don't have a good answer to the questions. Just thoughts.

    -Rob

  • Historical Records (Score:2, Interesting)

    by JonBuck ( 112195 ) on Wednesday June 19, 2002 @06:09PM (#3732536)
    As a historian and future librarian, one thing has always bothered me about the Internet. Because change is a constant, it's very difficult to keep records. It isn't like newspapers, pamphlets, books, or any other form of written record of the past five thousand years. Unless they're printed out, our writings here leave no physical evidence of their existence. Because I feel that the Internet is as significant as the printing press five centuries ago, the prospect of having no records from its early days is frightening.

    We have books from five centuries ago. Will anything here still exist in a readable form five centuries from now? Unless something is done to preserve it, I feel there will be a massive gap in history.

    And this is why I do not object to web archives. They are a half step to printed and more permanent storage mediums, but preferable to nothing at all.
  • by TheCarp ( 96830 ) <sjc@NospAM.carpanet.net> on Wednesday June 19, 2002 @06:11PM (#3732551) Homepage
    The other question is one of historical record.

    What you say does not BELONG to you. It is not property. Once you write it, it exists. You may own the medium it is on, but once it is out in the world it is uncontrollable and no longer owned. You may hold copyright... but a hundred years from now when you are long since dead and copyright is expiring, then what?

    We have the works of Galileo, we have letters that Thomas Jefferson wrote to people. Why? Because they were written. Many years later, long after the fact, these were made public and part of the historic record because they survived.

    On the net, we have a culture of written information appearing and disappearing. This information is part of our culture, it's things that we read and see. When it goes away, for whatever reason, we have lost something.

    I have websites from 96 that exist now only in the Wayback Machine. Yeah, some of the stuff I said back then I don't agree with now, and would rather not have associated with me, but by that same token, I wouldn't want it to be lost forever. If someone read it and what I wrote had enough impact on them that they want to see it again... then I would not even dream of trying to stop them (even if the impact was one of disgust; an impact is an impact). Even if it's just someone wanting to see what the web looked like 5 years ago... I think that's valid... I think that's an important record of our culture.

    The only thing I can see a case for, really, is the removal of personal information that shouldn't have been public in the first place. Beyond that though, I think it's good... I mean... it's not something that is ever going to be mistaken for a live current site. You have to actually go to the Wayback Machine and ask for it.

    All in all this is a good thing and I hope it survives a long time.

    -Steve
  • libel? (Score:3, Interesting)

    by sckeener ( 137243 ) on Wednesday June 19, 2002 @06:13PM (#3732557)
    I didn't know that the wayback machine went that far back. I wonder if anyone is going to go to jail from posts they made in the past....
  • opting out (Score:3, Interesting)

    by josepha48 ( 13953 ) on Wednesday June 19, 2002 @06:14PM (#3732571) Journal
    At least for Google, to opt out of its service, add the following tag in the "head" of your web page:
    <meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    This will tell Google not to cache your pages. If you don't want them to index your page and include it in the search engine, use:
    <meta NAME="ROBOTS" CONTENT="NOINDEX">
    Now, I am not sure about this other site that is caching old pages, and right now I cannot get through, but if they are caching any of my pages I will tell them to take them off, as ALL my pages are just that: MY pages. I think you can sue them; with all the other internet lawsuits, I'd imagine it would be valid. They are stealing your pages.
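
To make the meta-tag mechanism a bit more concrete, here is a rough sketch of how a well-behaved robot might read those directives out of a page before deciding what to keep. It uses only Python's standard `html.parser`; the `RobotsMetaParser` class and `robots_directives` helper are names invented for this illustration, not anything a real crawler is known to use. As far as I know, Google also honors a separate `NOARCHIVE` value that suppresses only the cached copy, and the sketch treats it like any other directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the lowercase robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    found = robots_directives(page)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```

As with robots.txt, the tag is a request. A robot has to fetch the page to see it at all, and whether it then honors the request is up to the robot.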
  • by rhaig ( 24891 ) <rhaig@acm.org> on Wednesday June 19, 2002 @06:19PM (#3732591) Homepage
    dejanews was my best tool to weed out resumes

    Before I scheduled even a phone interview, I'd always search dejanews for the person in question. Sometimes I'd come up with a definite hit (first and last name as well as email and mentioning the local area or some work that was on their resume) and I'd be able to see what kind of person I was really dealing with. That's when I started looking at what I'd posted.
  • by I_redwolf ( 51890 ) on Wednesday June 19, 2002 @06:31PM (#3732653) Homepage Journal
    If the creator wants people to see his/her creation but not give them the right to archive and retransmit the works, then, just like always, they can put a (C) at the bottom of their webpage stating that redistribution of the work without express permission of blah is prohibited. Obviously that would raise the question of how many people the creator would want to see his/her work in the first place. If they want to be selective, then be selective: write a webapp that only allows registered users who have agreed to a non-disclosure or no-redistribution license, etc. There are many ways to go about it, so long as the creator understands "When you publish something on the web, it is publicly available via HTTP".
  • Wayback machine (Score:1, Interesting)

    by Anonymous Coward on Wednesday June 19, 2002 @07:34PM (#3733050)
    In my opinion, when you post publicly on the web, you are essentially saying "This is public information, it may be copyrighted, but it is public". Then it's a question of whether or not the Wayback Machine is considered "fair use", and I believe it is. If it is, then you can't stop them. End of discussion, right?

    Now, if you don't want this stuff to be publicly accessible on the web, there is now a precedent (set by Google) for SSL sites. There is also the robots.txt convention you mentioned.

    The only real issue I see in the archival sites is "How do they know that domain ownership changed hands?". If a porn site comes along and buys the domain after you're done with it, how does the wayback machine protect you from inconsequential damages that might arise?

    I don't know... But I do know that the web and the internet in general was never intended for privacy or copyright, as such, and maybe we just need a new protocol?

    Dave
  • by Kickstart70 ( 531316 ) on Wednesday June 19, 2002 @07:41PM (#3733084) Homepage
    Yesterday I used the Wayback Machine for one of the lawyers at the law firm I work at to prove that a company at one point had an office in a certain location. The company in question was trying to duck out of a contracted agreement by saying they were not the people who signed the contract.

    The Wayback Machine proved that they indeed knew of, approved, and granted authorization to this specific office, and the other people had a valid contract. In this specific case, the Wayback Machine prevented an apparently scumbag company from trying to screw some apparently good people over.

    Kickstart
  • by fdiaz5583 ( 531839 ) on Wednesday June 19, 2002 @07:47PM (#3733104)
    If anyone has ever heard of the Library of Alexandria, it was supposedly the most impressive knowledge base the world had ever assembled. Some crazy guy came by and burnt it to the ground -- setting the entire industrialized planet back hundreds, perhaps thousands, of years. We are now in the process of surpassing this great library, and are making it even easier for people to have access to knowledge. That knowledge may be porn, may be the morning news, or sports scores, it may even be how to construct a nuclear bomb. Nevertheless it is knowledge, and EVERY person who is alive has the God (and any other higher power) given right to knowledge, despite what any government agency or copyright may say. 21st century libraries such as the WayBack Machine are providing the tools necessary for researchers to go "back to the future." This is a great service to mankind, and its overall importance should not be outweighed by greedy and/or overparanoid privacy rights activists. If you do not wish to be known, please do not post any information on the web, and move to the jungles of Africa and step away from a time and place known as the PRESENT.
  • Re:Erm (Score:2, Interesting)

    by Qrlx ( 258924 ) on Wednesday June 19, 2002 @08:41PM (#3733340) Homepage Journal
    I agree with the kevin completely. What is wrong with having old copies of your site archived? Take this quote from the front page of this article:
    1. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews.

    I don't know what kind of a "purist" this person thinks they are. DejaNews (now Google) is one of the *best* places to look for info that's relevant but not this week's headline. We might as well burn all the libraries to the ground, since they contain books with embarrassing misprints or factual errors.

    It might not be easy to get your site out of the Wayback Machine, but it doesn't sound like it's impossible either. Consider the alternatives: would you rather live in a world where the past can be "updated" as needed, like the (purportedly reputable) New York Times did to the web version of a Sep. 9 story warning about Osama bin Laden? Right after September 11 they replaced it with a puff piece-- full details here. [democrats.com] (Warning, contains links to the NYT registration-reqd pages and I think the content may have been re-scrubbed since this appeared on BuzzFlash.)

    If there's no record of content, how am I supposed to provide a bibliography or references for "something I saw on the web somewhere?"
  • by nexthec ( 31732 ) on Wednesday June 19, 2002 @08:57PM (#3733397)
    Actually, in Canada (I am an American, but I'm married to a Canuck) anybody can rebroadcast anything. The deal is, though, that they cannot change it: they can't remove advertisements, shorten or lengthen it, add commentary over it, or put up their own logo. Kind of a neat idea.
  • by madmancarman ( 100642 ) on Wednesday June 19, 2002 @11:38PM (#3733974)
    dejanews was my best tool to weed out resumes

    Before I scheduled even a phone interview, I'd always search dejanews for the person in question. Sometimes I'd come up with a definite hit (first and last name as well as email and mentioning the local area or some work that was on their resume) and I'd be able to see what kind of person I was really dealing with. That's when I started looking at what I'd posted.

    This kind of freaked me out when I started teaching in 1998 - I'd been running a large fan web site devoted to one of my favorite bands, and being heavily into the band, I posted a lot in their newsgroup and participated in more than one flame war. Of course, I was in college and in my very early 20's and late teens, but it's all archived on DejaNews now, with no way to remove it. I really doubt any public school districts are going to wise up to this (or even care, considering the national teacher shortage), but I wouldn't be surprised if it came back to haunt me in some way some day. As a previous poster mentioned, such is the burden of free speech.

    An interesting thing did happen to me at the beginning of this school year. I teach high school computer classes, and I was talking about managing that fan web site when one of my students (a junior) opened his eyes really big and pointed at me with his jaw dropped, sort of aghast. I paused and asked him what was wrong, and he exclaimed that he downloaded and used the guitar tabs I'd written years earlier when he was in junior high. I found that kind of amusing!

    I think the archiving of the internet is particularly scary when I can still find a lousy guitar tab of Pearl Jam's "Footsteps" [guitaretab.com] that I did back in 1992, when I was a senior in high school piggybacking off an account at the nearby university, on my parents' Apple //e, while I was still learning how to play guitar. Obviously, the internet can have a much longer shelf life than a ProDOS 5.25" floppy (excluding news sites [msnbc.com] that "expire" their articles after limited availability).

    First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi

  • Re:Robots.txt (Score:3, Interesting)

    by ShaunC ( 203807 ) on Thursday June 20, 2002 @02:16PM (#3737691)
    Sigh. This, I suppose, is what happens when Slashdot keeps stories in the queue too long:
    2002-03-30 10:12:57 The Wayback Machine, friend or foe? (askslashdot,news) (accepted)
    At the time, I was having severe problems getting in touch with anyone at The Wayback Machine. Yes, their site makes it quite clear how to have your site removed. Yes, I placed the appropriate entry in my robots.txt files. Yes, I submitted my sites for exclusion. Then nothing happened. After emailing them several times with a list of domains I'd prefer to have removed from the archive, I got a reply back saying they should disappear by the end of the following day. No go.

    That's all changed. They've got the kinks worked out, as best I can tell, and have begun obeying robots.txt files. They weren't so diligent about it three months ago, or I wouldn't have gotten ticked at 'em.

    BTW, my submission was edited in at least one place: I don't capitalize the word "SPAM," as the capitalized version is Hormel's trademark. (Maybe my submission was combined with someone else's; hard to remember what I wrote 3 months ago.)

    Everything else I'd say has already been said. I wish I'd noticed the story sooner.

    Shaun
