ShaunC asks:
"As the webmaster of numerous sites, I'm curious how others feel about the Wayback Machine. What particularly interests me is the fact that the Machine is a relatively new animal, yet it contains snapshots from my sites dating back to 1998. I can't help but wonder: where did they get such old copies of my websites, and who gave them permission to make those copies? I certainly didn't provide either. Perhaps I'm too much of a purist, but I've always seen the internet as an ever-changing medium, not a permanent one. Archives have bothered me ever since the fledgling days of DejaNews." This site last made an appearance on Slashdot,
earlier this year. Internet archival sites are right smack in the crosshairs of copyright, but they are useful. Anyone who has ever used Google's cache (and there are plenty of those links on Slashdot) can attest to this. Of course, the issue that may bug many content providers is how to opt out of such services, since some see it as a copyright violation. Is it possible to balance the issues of copyright and history, or will these two Internet resources find themselves in legal trouble in the future?
"The way I see it, archives are much like SPAM; I never opted in, why should it be my responsibility to opt out? I manage a number of domains and the process of refining robots.txt files and submitting myself to the Wayback Machine for removal seems to be intrusive. Worse, domains I've abandoned (which have lapsed or been re-registered by someone else) are forever archived in the Machine and I have no way to exclude them. Why should I have to deliberately remove my copyrighted material from an archive which was never granted permission to replicate that material in the first place?"
Opting out -- of publicly available HTTP??? (Score:4, Interesting)
Your argument is similar to that of newspaper publishers who didn't like "deep linking." What they couldn't (or didn't want to) understand is that the nature of an HTTP web server is quite simple. A client asks for a file, the server gives it back. Using that protocol implies that you are OK with that. If you're not, I suggest you look into different technologies, instead of complaining about lack of control, in a medium that was never intended to provide it.
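To make the protocol point concrete, here is a minimal sketch in Python using only the standard library. The page content, the handler name, and the user-agent strings are all made up for illustration; the only claim is that a plain HTTP server, unless configured otherwise, hands the same bytes to whoever asks.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello, public web</body></html>"  # made-up page content


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the same page."""

    def do_GET(self):
        # No check of who is asking or why: every GET gets the same bytes.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_as(user_agent: str) -> bytes:
    """Start a throwaway local server, GET "/" once, and return the body."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/", headers={"User-Agent": user_agent})
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


# A browser and a crawler (both names invented) receive identical content.
print(fetch_as("Browser") == fetch_as("archive-crawler"))
```

Anything stricter than this (logins, user-agent filtering, robots.txt) is something layered on top of the basic request/response exchange.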
Talk about a time machine... (Score:3, Interesting)
Euro friendly :) (Score:1, Interesting)
Especially dominio's pizza. They raised their prices more than 12%. I printed out the page and got a 15% discount.
Copyright and websites. (Score:3, Interesting)
I don't mind that my site is being added to indexes that the public have use of for free. I have a problem where a company uses my site to make a profit, with no public benefit.
There is case law where unauthorized access to a website is a copyright violation.
I am trying to use copyright law against some of the spammers who scrape my site for email addresses, and then to go after the spam software companies for contributory infringement (let the Napster rulings serve some good).
TV Broadcast analogy (Score:4, Interesting)
Some have already drawn analogies to TV broadcasts, saying hey, it was broadcast, you get to keep a copy. You can't bitch now if people still have that copy, unless you're Jack Valenti.
You can spin this how you want. Here's one valid way to think about it though: a TV network broadcasts a show. You make a private copy on a VCR tape. Jack Valenti aside, you can watch that copy again as often as you like, and it's no big deal. However, you do not have the right to rebroadcast your copy of that show to the public without the permission of the original copyright holder. (I have my B5 tapes. I'm watching them through again now, showing them to my wife. I'm sure nobody is upset about this. But I'd be in deep doo-doo if I managed to broadcast them on a local access station, or uploaded them to a public website.)
If you are inclined to be negative about the Wayback Machine, you could view it this way. While the page existed on the original site, it was broadcast to the public. If somebody made a personal copy, they have it and will always have it, even if the site goes down. However, when the site goes down, individuals do not necessarily have the right to then "rebroadcast" (i.e. post) themselves the content they downloaded and kept. This, however, is what the WayBack machine is doing.
Mind you, except for the issue with www.dramex.org that I noted above (and which I fixed long ago), I like the WayBack machine, and am happy that they archived the content which was implicitly copyrighted to me. I would have opted in if I had wanted to. But, of course, I didn't know about it back in 1996 to opt in.
I don't have a good answer to the questions. Just thoughts.
-Rob
Historical Records (Score:2, Interesting)
We have books from five centuries ago. Will anything here still exist in a readable form five centuries from now? Unless something is done to preserve it, I feel there will be a massive gap in history.
And this is why I do not object to web archives. They are a half step to printed and more permanent storage mediums, but preferable to nothing at all.
Re:Opting out -- of publicly available HTTP??? (Score:5, Interesting)
What you say does not BELONG to you. It is not property. Once you write it, it exists. You may own the medium it is on, but once it is out in the world it is uncontrollable and no longer owned. You may hold copyright... but a hundred years from now when you are long since dead and copyright is expiring, then what?
We have the works of Galileo, we have letters that Thomas Jefferson wrote to people. Why? Because they were written. Many years later, long after the fact, these were made public and part of the historic record because they survived.
On the net, we have a culture of written information appearing and disappearing. This information is part of our culture, it's things that we read and see, and when it goes away - for whatever reason - we have lost something.
I have websites from '96 that exist now only in the Wayback Machine. Yeah, some of the stuff I said back then I don't agree with now, and would rather not have associated with me, but by that same token, I wouldn't want it to be lost forever. If someone read it and what I wrote had enough impact on them that they want to see it again... then I would not even dream of trying to stop them (even if the impact was one of disgust - an impact is an impact) - even if it's just someone wanting to see what the web looked like 5 years ago... I think that's valid... I think that's an important record of our culture.
The only thing I can really see a case for is the removal of personal information that shouldn't have been public in the first place. Beyond that though, I think it's good... I mean... it's not something that is ever going to be mistaken for a live current site - you have to actually go to the Wayback Machine and ask for it.
All in all this is a good thing and I hope it survives a long time.
-Steve
libel? (Score:3, Interesting)
opting out (Score:3, Interesting)
Re:Opting out -- of publicly available HTTP??? (Score:3, Interesting)
Wayback machine (Score:1, Interesting)
Now, if you don't want this stuff to be publicly accessible on the web, there is now a precedent (set by Google) for SSL sites. There is also the robots.txt convention you mentioned.
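For anyone who hasn't looked at the robots.txt convention, here is a rough sketch of how a well-behaved crawler would read one, using Python's standard urllib.robotparser. The file contents and URLs are invented for the example, and "ia_archiver" is assumed here to be the user-agent token the archive's crawler answers to; check the archive's own documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one archive crawler, allow everyone else.
# "ia_archiver" is an assumed user-agent token, used only for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("ia_archiver", "http://example.com/index.html"))  # False
print(may_fetch("SomeBrowser", "http://example.com/index.html"))  # True
```

Note that this only describes what a cooperative crawler does. robots.txt is a request, not an access control, so nothing in the protocol stops a crawler that chooses to ignore it.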
The only real issue I see with the archival sites is "How do they know that domain ownership changed hands?" If a porn site comes along and buys the domain after you're done with it, how does the Wayback Machine protect you from consequential damages that might arise?
I don't know... But I do know that the web and the internet in general were never intended for privacy or copyright, as such, and maybe we just need a new protocol?
Dave
Coincidentally enough... (Score:2, Interesting)
The Wayback Machine proved that they indeed knew of, approved, and granted authorization to this specific office, and the other people had a valid contract. In this specific case, the Wayback Machine prevented an apparently scumbag company from trying to screw some apparently good people over.
Kickstart
Anybody heard of the Library of Alexandria???? (Score:3, Interesting)
Re:Erm (Score:2, Interesting)
I don't know what kind of a "purist" this person thinks they are. DejaNews (now Google) is one of the *best* places to look for info that's relevant but not this week's headline. We might as well burn all the libraries to the ground, since they contain books with embarrassing misprints or factual errors.
It might not be easy to get your site out of the Wayback Machine, but it doesn't sound like it's impossible either. Consider the alternatives; would you rather live in a world where the past can be "updated" as needed, as the (purportedly reputable) New York Times did with the web version of a Sep. 9 story warning about Osama bin Laden? Right after September 11 they replaced it with a puff piece; full details here. [democrats.com] (Warning, contains links to the NYT registration-reqd pages and I think the content may have been re-scrubbed since this appeared on BuzzFlash.)
If there's no record of content, how am I supposed to provide a bibliography or references for "something I saw on the web somewhere?"
Re:There are more than copyright concerns... (Score:4, Interesting)
Before I scheduled even a phone interview, I'd always search DejaNews for the person in question. Sometimes I'd come up with a definite hit (first and last name as well as email and mentioning the local area or some work that was on their resume) and I'd be able to see what kind of person I was really dealing with. That's when I started looking at what I'd posted.
This kind of freaked me out when I started teaching in 1998 - I'd been running a large fan web site devoted to one of my favorite bands, and being heavily into the band, I posted a lot in their newsgroup and participated in more than one flame war. Of course, I was in college and in my very early 20's and late teens, but it's all archived on DejaNews now, with no way to remove it. I really doubt any public school districts are going to wise up to this (or even care, considering the national teacher shortage), but I wouldn't be surprised if it came back to haunt me in some way some day. As a previous poster mentioned, such is the burden of free speech.
An interesting thing did happen to me at the beginning of this school year. I teach high school computer classes, and I was talking about managing that fan web site when one of my students (a junior) opened his eyes really big and pointed at me with his jaw dropped, sort of aghast. I paused and asked him what was wrong, and he exclaimed that he downloaded and used the guitar tabs I'd written years earlier when he was in junior high. I found that kind of amusing!
I think the archiving of the internet is particularly scary when I can still find a lousy guitar tab of Pearl Jam's "Footsteps" [guitaretab.com] that I did back in 1992, when I was a senior in high school piggybacking off an account at the nearby university, on my parents' Apple //e, while I was still learning how to play guitar. Obviously, the internet can have a much longer shelf life than a ProDOS 5.25" floppy (excluding news sites [msnbc.com] that "expire" their articles after limited availability).
First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi
Re:Robots.txt (Score:3, Interesting)
That's all changed. They've got the kinks worked out, as best I can tell, and have begun obeying robots.txt files. They weren't so diligent about it three months ago, or I wouldn't have gotten ticked at 'em.
BTW, my submission was edited in at least one place: I don't capitalize the word "SPAM," as the capitalized version is Hormel's trademark. (Maybe my submission was combined with someone else's; hard to remember what I wrote 3 months ago.)
Everything else I'd say has already been said; I wish I'd noticed the story sooner.
Shaun