Caching Content and the Shrinking Web?
"I run a small discussion-oriented site patterned after Slashdot; small story blurbs and discussion center around links to external content. From time to time we post our own content, but the vast majority involves links to articles on other sites. This structure obviously relies heavily on the external pages
being available for our visitors so they can understand the issue or viewpoint being highlighted.
Just before the new year, I took a look back at story entries that had been posted throughout 2002 and found it interesting to note that a large portion of the linked content was no longer available/had moved/etc. In the short term, this is not an issue; most outside material tends to remain available for the length of an active discussion. The problem I see is visitors coming to the site by way of search engines to stories whose linked content no longer exists. Without the background provided by the referenced story link, the discussion or quick blurb may not make sense or may not fulfill the request that brought the visitor to us.
I know I am not alone in this quandary and that others must have run into this before. While I respect the copyright of the external content providers and do not wish to get into the whole issue of lost advertising revenue for them if I were to cache a local copy, I'm curious what other users are doing to mitigate this problem."
the caaaaaache (Score:1)
Re:the caaaaaache (Score:1)
Unfortunately, a lot of sites even go so far as to BAN the IPs of the Way Back Machine [archive.org].
Re:the caaaaaache (Score:2)
Mirror in case it's slashdotted and removed (Score:2, Interesting)
Caching Content and the Shrinking Web?
Posted by Cliff on 02:55 AM -- Friday March 14 2003
from the keeping-the-context-intact dept.
kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remo
Mirror of your mirror, just in case... (Score:2)
Mirror in case it's slashdotted and removed (Score:1)
by jsse (254124) on Friday March 14, @12:09AM (#5509874)
(http://slashdot.org/)
Here is the content I shamelessly mirrored without permission from the original author. Now all those meta-karma-whore flamers can jump up to my ass and sue me for plagiarism.
Caching Content and the Shrinking Web?
Posted by Cliff on 02:55 AM -- Friday March 14 2003
from the keeping-the-context-intact dept.
kill-hup asks: "I
Have you realized that (Score:1)
If you can see this, you should realize that we are among the bloody murderers who killed that content.
Re:Have you realized that (Score:2)
What I believe is that the content providers either went out of business (as is common these days), were swallowed up by another provider who may not archive old content, or just lost some pages as a result of a redesign.
Knowledge is power (Score:5, Insightful)
Legally, I fear that litigation like Scientology vs. the Wayback Machine will begin to erode this protection. Having a monopoly on knowledge gives an entity the power to bring the masses into submission. We must let truth prevail.
Look at it this way... (Score:5, Interesting)
To make the case even more clear-cut, imagine if the magazine you are cutting from was completely free to the readers and got all their revenue from ads sold.
Would you even care if you cut the ads out alongside the article? No, you would probably even go out of your way to cut them OUT of the real-world example.
Why is it different when it is on the internet?
Re:Look at it this way... (Score:2)
If I could rely on the original content provider to keep the article available, this would be a non issue. It's somew
Re:Look at it this way... (Score:1)
A source of embarrassment (Score:4, Interesting)
As far as the commercial sites go, I think, inasmuch as bits and pieces are used as "fair use," and people aren't selling things that belong to someone else, I don't see a problem.
One of the more interesting things I've seen is what Art Bell and his webmaster did when Bell "retired" from broadcasting (let's see how long this one lasts...hmmph). They put out a CD that had some neat extra features, and authorization methods which allow you to access the website through the webmaster's site. Pretty cool, IMHO.
Just cache it. (Score:2)
In other words, they're morons. Just cache the data privately and ignore what you think the rest of the world thinks about it.
Loss of intellectual property (Score:4, Interesting)
I think there is a deeper problem being alluded to here, that of loss of intellectual property. Copyright, as is often pointed out, has two sides: the copyright owner gets to exercise control over their asset, but in the end that asset becomes public property.
It has long been law and/or practice in most countries that in order to publish a book (or any copyrightable material) a copy must be lodged with the state archive (in the US, the Library of Congress). Making a commercial gain from a work usually requires publication, which means that most works are available in such libraries.
But the web changes that. Publication becomes a lot more informal, and there is no requirement or even encouragement to archive. How, in such a scenario, can we protect against publicly accessible information disappearing forever? This material has been published and, at some point, the copyright will expire; it should fall into the public domain. But it most likely won't: over time it will be taken away, and never seen again.
Consider the loss we would face if a valuable repository like Slashdot vanished. Deride it all you like - this is nevertheless a meeting place of (amongst others) some very experienced people with insightful comments, leading to a wealth of information gathered on topics that are discussed. It is not at all uncommon to find a Slashdot discussion when searching for technical information.
archive.org is a start in the process of archiving to prevent this sort of loss -- but how can we move to tackle the problem in a proactive manner?
Information wants to be free... (Score:1)
Something, in this case a webpage, once made public, is likely to be copied to some sort of personal space that is not under the control of the publisher, no matter how much they protest their copyright.
Once in this personal space however, there is no obligation to share. And so information pools in the corners of the 'net unable to benefit any but a select/fortunate few for fear of persecution.
Bottom line, if you don't want it preserved for posterity a
IMHO (Score:2)
I'm no lawyer, but I think it's ok to copy content from a news or opinion site as long as you cite your source. In other words, I *think* you're on solid ground if you copy the entire text of a news article and append the date and the place you copied it from.
There are a couple of reasons why an author might have a problem with you doing this: Firstly, if you draw customers (and therefore ad revenue) away from their site, they won't like it. So, what I suggest is that, at the time you open a discussion t
Re:IMHO (Score:2)
I think the point about not linking to the mirror until after the original article/content becomes unavailable is key to refute any arguments over lost ad revenue. Essentially, I'm saying that I will direct people to your site as long as there's something there for them to read. When you (as the site owner) cease to make the content available, you really aren't losing any revenue by me linking to a cached co
On a related note (Score:2)
What do you want from slashdot? (Score:1)
The answer seems pretty clear cut to me. Google does caching well, so I'd just copy them. Or you could even just link to the google cache, but that could still change.
A Hypertextual Caching Doctrine for Slashdot (Score:1)
However, because this is hypertext, another procedure needs to be followed: Content needs to be maintained. Because of the fluid nature of the web, which makes the link possible in the first place, some special actions (i.e. actions not taken with archival of books, magazines, newspapers, etc.) need to be taken.
Here, assuming I had u
Spam and the /. effect (Score:2)
Both claim to be beneficial to the, shall we say, victim. "Information about special offers is useful", "They get more people looking at their banners".
Both use up bandwidth and cause charges to the victim.
Spam is often redundant - "I've seen this bloody spam a hundred times" - and we all know how redundant slashdot can be...
Both can be defended by saying that if you publish your address (site or email) then people can use it.
Nobody opts into