The Internet

Caching Content and the Shrinking Web?

kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remote content servers off-line. How does the ethical/legal issue change, if at all, when we're talking about information that once was available but now has moved or disappeared from the provider's site?"

"I run a small discussion-oriented site patterned after Slashdot; small story blurbs and discussion center around links to external content. From time to time we post our own content, but the vast majority involves links to articles on other sites. This structure obviously relies heavily on the external pages being available for our visitors so they can understand the issue or viewpoint being highlighted.

Just before the new year, I took a look back at story entries that had been posted throughout 2002 and found it interesting to note that a large portion of the linked content was no longer available/had moved/etc. In the short term, this is not an issue; most outside material tends to remain available for the length of an active discussion. The problem I see is visitors coming to the site by way of search engines to stories whose linked content no longer exists. Without the background provided by the referenced story link, the discussion or quick blurb may not make sense or may not fulfill the request that brought the visitor to us.

I know I am not alone in this quandary and that others must have run into this before. While I respect the copyright of the external content providers and do not wish to get into the whole issue of lost advertising revenue for them if I were to cache a local copy, I'm curious what other users are doing to mitigate this problem."

This discussion has been archived. No new comments can be posted.

  • it chooses who stays and who will go... google [google.com]
    • Agreed...I often link to a cached article if I feel that it'll go away in a few months.

      Unfortunately, a lot of sites even go so far as to BAN the IPs of the Way Back Machine [archive.org].

      • Unfortunately? Not at all. If someone wants to remove their stuff from the net, that's their right under copyright law. Period. Don't like it? Move to another planet, 'cause it's an international thing.
  • Here is the content I shamelessly mirrored without permission from the original author. Now all those meta-karma-whore flamers can jump up to my ass and sue me for plagiarism.

    Caching Content and the Shrinking Web?
    Posted by Cliff on 02:55 AM -- Friday March 14 2003
    from the keeping-the-context-intact dept.

    kill-hup asks: "I know the issue of caching linked pages has been discussed many times here on Slashdot, but the majority of those discussions centered around the 'Slashdot Effect' knocking remo
    • Mirror in case it's slashdotted and removed
      Mirror in case it's slashdotted and removed (Score:1)
      by jsse (254124) on Friday March 14, @12:09AM (#5509874)
      (http://slashdot.org/)

      Here is the content I shamelessly mirrored without permission from the original author. Now all those meta-karma-whore flamers can jump up to my ass and sue me for plagiarism.

      Caching Content and the Shrinking Web?
      Posted by Cliff on 02:55 AM -- Friday March 14 2003
      from the keeping-the-context-intact dept.


      kill-hup asks: "I
  • Most content is removed as a result of being slashdotted: the company providing the web hosting decides it's better to remove it and cancel the associated accounts to avoid an excessive bandwidth bill next month.

    If you can see this, you can realize that we are among those bloody murderers who killed that content. :)
    • That's not highly likely in our case. I would liken our version of a "slashdotting" to be along the lines of a fly hitting a brick wall ;) Again, the purpose of my question was not to debate mirroring and the "Slashdot effect", but in the case of articles that just cease to be available.

      What I believe is that the content providers either went out of business (as is common these days), were swallowed up by another provider who may not archive old content, or just lost some pages as a result of a re-design.

  • Knowledge is power (Score:5, Insightful)

    by quintessent ( 197518 ) <my usr name on toofgiB [tod] moc> on Friday March 14, 2003 @04:19AM (#5509895) Journal
    Ethically, we need to keep the channels of knowledge open. If it was public knowledge at one time, it must remain so. Otherwise, we begin to foster an Orwellian world where any number of Ministries of Truth can hide history and rewrite it as needed. A web page is a record of the world at a given time. Just as libraries keep old journals for reference, we need to be able to reference the web of the past.

    Legally, I fear that litigation like Scientology vs. the Wayback Machine will begin to erode this protection. Having a monopoly on knowledge gives an entity the power to bring the masses into submission. We must let truth prevail.
  • by gnovos ( 447128 ) <gnovos@NoSpAM.chipped.net> on Friday March 14, 2003 @04:50AM (#5509963) Homepage Journal
    If your discussion were around the coffee table about a magazine article, and you were writing down your notes on paper, then paper-clipping them to the article (cut out from the magazine, of course) and storing them away in a binder, would you have any qualms about this at all? At ALL?

    To make the case even more clear-cut, imagine if the magazine you are cutting from was completely free to the readers and got all its revenue from ads sold.

    Would you even care if you cut the ads out alongside the article? No, you would probably even go out of your way to cut them OUT in the real-world example.

    Why is it different when it is on the internet?
    • I agree with your example; it shouldn't matter. The only potential problem I see is that, on the 'net, we have a much larger table. Granted, my site does not have the readership of Slashdot but would I not be re-distributing the original content? Like photocopying the original magazine article and handing out copies to everyone I know, then them handing it out to everyone they know, etc.

      If I could rely on the original content provider to keep the article available, this would be a non issue. It's somew

      • What about this: keep a cached copy of the original article stored on your server, but only put it up instead of the original once the original is no longer available? It would require a lot more checking/work, but I wouldn't think there should be legal problems with this.
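A minimal sketch of the "fall back to the cache only when the original is gone" idea from this sub-thread, in Python. Nothing here comes from the posters: the function names `is_available` and `link_for_story`, the use of a HEAD request as the availability test, and the example URLs are all assumptions made for illustration. A real site would probably run the check periodically and store the result rather than probing on every page view.

```python
import http.client
import urllib.request
from typing import Callable


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url ends in a 2xx/3xx response.

    Hypothetical helper: treats any network error, HTTP error status,
    or malformed URL as "not available".
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers urllib.error.URLError/HTTPError and socket timeouts;
        # ValueError is raised for strings that are not valid URLs.
        return False


def link_for_story(
    original_url: str,
    cached_url: str,
    check: Callable[[str], bool] = is_available,
) -> str:
    """Prefer the original article; use the local copy only if it is gone."""
    return original_url if check(original_url) else cached_url
```

For example, `link_for_story("http://example.com/article", "/cache/article.html")` would keep sending readers (and ad views) to the original site for as long as it answers, and only switch to the local copy afterwards. Some servers reject HEAD requests, so a GET fallback may be needed in practice.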
  • by idiotnot ( 302133 ) <sean@757.org> on Friday March 14, 2003 @04:58AM (#5509978) Homepage Journal
    I've looked up my past personal sites, and realize how much they suck. Including the brief period where I was enamoured with IE 4.0 (MS had me on their free CD circuit).

    As far as the commercial sites go, inasmuch as bits and pieces are used as "fair use" and people aren't selling things that belong to someone else, I don't see a problem.

    One of the more interesting things I've seen is what Art Bell and his webmaster did when Bell "retired" from broadcasting (let's see how long this one lasts...hmmph). They put out a CD that had some neat extra features, and authorization methods which allow you to access the website through the webmaster's site. Pretty cool, IMHO.
  • Once it's posted, it's public information. Sites that try to prevent others from caching their pages are living in an unrealistic dreamworld that doesn't include ISP proxies, browser caches, and multiple hops through routers.

    In other words, they're morons. Just cache the data privately and ignore what you think the rest of the world thinks about it.
  • by Twylite ( 234238 ) <twylite AT crypt DOT co DOT za> on Friday March 14, 2003 @07:03AM (#5510289) Homepage

    I think there is a deeper problem being alluded to here, that of loss of intellectual property. Copyright, as is often pointed out, has two sides: the copyright owner gets to exercise control over their asset, but in the end that asset becomes public property.

    It has long been law and/or practice in most countries that in order to publish a book (or any copyrightable material) a copy must be lodged with the state archive (in the US, the Library of Congress). In order to make a commercial gain off a work it usually requires publication, which means that most works are available in such libraries.

    But the web changes that. Publication becomes a lot more informal, and there is no requirement or even encouragement to archive. How, in such a scenario, can we protect against publicly accessible information disappearing forever? This material has been published and, at some point, the copyright will expire; it should fall into the public domain. But it most likely won't: over time it will be taken away, and never seen again.

    Consider the loss we would face if a valuable repository like Slashdot vanished. Deride it all you like - this is nevertheless a meeting place of (amongst others) some very experienced people with insightful comments, leading to a wealth of information gathered on topics that are discussed. It is not at all uncommon to find a Slashdot discussion when searching for technical information.

    archive.org is a start in the process of archiving to prevent this sort of loss -- but how can we move to tackle the problem in a proactive manner?

  • Or at least that is the oft quoted er, umm... quote.
    Something, in this case a webpage, once made public, is likely to be copied to some sort of personal space that is not under the control of the publisher, no matter how much they protest their copyright.
    Once in this personal space however, there is no obligation to share. And so information pools in the corners of the 'net unable to benefit any but a select/fortunate few for fear of persecution.
    Bottom line, if you don't want it preserved for posterity a
  • by oni ( 41625 )

    I'm no lawyer, but I think it's ok to copy content from a news or opinion site as long as you cite your source. In other words, I *think* you're on solid ground if you copy the entire text of a news article and append the date and the place you copied it from.

    There are a couple of reasons why an author might have a problem with you doing this: Firstly, if you draw customers (and therefore ad revenue) away from their site, they won't like it. So, what I suggest is that, at the time you open a discussion t
    • I think those are all good suggestions. In a way, they mirror (somewhat) some of the ideas I had.

      I think the point about not linking to the mirror until after the original article/content becomes unavailable is key to refute any arguments over lost ad revenue. Essentially, I'm saying that I will direct people to your site as long as there's something there for them to read. When you (as the site owner) cease to make the content available, you really aren't losing any revenue by me linking to a cached co

  • ...for all web developers: Cool URIs don't change [w3.org].

  • Either you accept the missing articles (bad choice) or you cache them.

    The answer seems pretty clear cut to me. Google does caching well, so I'd just copy them. Or you could even just link to the google cache, but that could still change.
  • Hypertextual information, posted publicly once, can and should always be preserved, especially if it is relevant to another story, as links are used as jump-points here at Slashdot.

    However, because this is hypertext, another procedure needs to be followed: Content needs to be maintained. Because of the fluid nature of the web, which makes the link possible in the first place, some special actions (i.e., actions not taken with archival of books, magazines, newspapers, etc.) need to be taken.

    Here, assuming I had u
  • I've always wondered how the /. effect is different to spam.

    Both claim to be beneficial to the, shall we say, victim. "Information about special offers is useful", "They get more people looking at their banners".

    Both use up bandwidth and cause charges to the victim.

    Spam is often redundant - "I've seen this bloody spam a hundred times" - and we all know how redundant slashdot can be...

    Both can be defended by saying that if you publish your address (site or email) then people can use it.

    Nobody opts into
