The Media

Media Providers And Short Online Retention?

delfstrom asks: "Retention time for online reference material is decreasing. First it was Deja moving archives offline. Now try to find the AP story you saw on Yahoo from earlier this year about a judge's order against a CyberPatrol decryption tool. You can't, because anything older than 30 days is canned from news.yahoo.com. Likewise, certain online newspapers (not to mention any names) are removing content after a mere 7 days, though for $25 per retrieved article you can go back to 1977. This certainly goes against the philosophy of not breaking links. What responsibility do information providers have in maintaining articles that they post? In this era of electronic publishing, academic papers are beginning to contain URLs in the references. To what extent can we keep copies of such information and provide it to others?"
This discussion has been archived. No new comments can be posted.

  • I think all media contributions should be free. Anyone who is willing to give news should give away their copyrights.
  • by tzanger ( 1575 ) on Monday September 18, 2000 @05:24PM (#770521) Homepage

    This is why I've been hoarding data since about 1992 or so. Anything that I deem worth keeping I keep a local copy of, whether it be my old Bluemail .qwk archives, newsgroup postings, HTML pages, Adobe Acrobat files from wherever and whenever, old .mod and .stm/s3ms, you name it. I've got .zip files I'll probably never use again, but I've kept them specifically because I got sick and tired of so-called "permanent" sites taking them off.

    Whenever my hard drive gets full, I do a couple categorization passes (I try to keep them categorized as I go but it's never quite perfect; there's always too many files in my /data/dump directory) and then make an .iso. Two copies are burned, one for my bookshelf and one for work or safe storage.

    As Signal11 once had in his .sig (and ripped from somewhere I'm not sure, but I've seen it in the old taglines of yore): I don't have a solution but I admire your problem.

  • It doesn't make sense. Storage keeps getting cheaper, and they go and break the bookmarks and links which would bring people back without effort.
  • I don't know myself, but I'm curious how this compares with things like old newspapers and such.

    I know newsstands tend not to keep even yesterday's papers; it's up to organisations like libraries to do that.

    Do we have any comparable organisations who specifically archive things like online news?
    How do they deal with copyright issues?

    - Muggins the Mad
  • Howdy!

    I was just thinking!
    This is where Freenet and other p2p and distributed sharing programs will and can help!

    Thnx,
    Fuller

    ps. http://freenet.sourceforge.net
    http://www.mirc.com
    http://www.forteinc.com
    http://www.deja.com
    http://www.google.com (their cached pages are wonderful!)
  • by InitZero ( 14837 ) on Tuesday September 19, 2000 @06:31AM (#770525) Homepage

    Storage keeps getting cheaper,

    There are three issues here. The first is that storage isn't as cheap as you think. The second is that indexes are hard to maintain. Finally, you forget that old text is a good revenue stream.

    Storage

    You are correct that space is cheap for small amounts of storage. If you go to your local computer store, you can buy a 60-gig drive for less than I paid for my first five-meg drive. I have no contention there.

    However, people who archive data for a living don't buy bare 60-gig IDE drives and string them together. It ain't that simple.

    I work for a newspaper. We have every text we have published since 1985 and every picture since 1996 (don't quote me on that last date). They are both inside IBM RS/6000s. The text archive is under 15 gig. The photo archive clocks in at 230 gig (and growing by nearly 600 meg a day).

    Initially, the data lived in a $100,000 HP optical jukebox. When that got too small, we scrapped it and bought IBM 7133 disk arrays. Bare, before you put the first drive in the box, they cost $36,000. Each nine-gig drive is $2,000. (Yes, I know you can get them cheaper. But not hot-swap, not with an IBM warranty, etc.) When you hit 144 gig (9 gig by 16 drives), you've got to buy another 7133. In order to get good performance, you can't just RAID-5 everything in one big SSA loop. You have got to have multiple paths. Each enhanced SSA card is a few thousand dollars.

    Indexing

    Keeping the raw images isn't that difficult in the grand scheme of things. Indexing and searching for content, however, is far from trivial. Keeping the database well-groomed is hard work. You do want all the stuff these web sites keep online to be searchable, right?

    Storing photographs is especially difficult. For a quick discussion on archiving images, see this post [slashdot.org] from a week or so ago.

    Revenue

    Newspapers sell you a hundred stories with pictures and comics a day for, generally, 50 cents. However, if you want a story that was in last year's newspaper, they can charge you five dollars for that story and you will pay it.

    Why on earth would newspapers give you content for free that they spent money to create and archive? Yeah, yeah, information wants to be free and all that, but they still have to make a profit; otherwise there will be no information to be made free.

    Solution?

    The obvious solution is for these media outlets to charge for old stories. That way the links don't break and they have a way to support the archive and indexing costs. Folks here won't like that idea.

    Summary

    It's easy to say that the media should keep everything online all the time. In the real world, however, there are problems with doing just that. The problems are both technical and financial. Information may want to be free but 'wanting' doesn't pay the bills.

    InitZero

  • Initially, the data lived in a $100,000 HP optical jukebox. When that got too small, we scrapped it and bought IBM 7133 disk arrays. Bare, before you put the first drive in the box, they cost $36,000. Each nine-gig drive is $2,000. (Yes, I know you can get them cheaper. But not hot-swap, not with an IBM warranty, etc.) When you hit 144 gig (9 gig by 16 drives), you've got to buy another 7133. In order to get good performance, you can't just RAID-5 everything in one big SSA loop. You have got to have multiple paths. Each enhanced SSA card is a few thousand dollars.

    I disagree. Online secure storage is as cheap as we think. We just installed a 500 Gig RAID for US$20,000 for storing huge (and critical) medical images. Are you saying that if you were to provide all of the text of articles of a single daily newspaper back to the late '70s, it would require anything more than 500 Gig?

    Sure, I'd go for charging for old stories. Possibly micropayments. As long as the links stay the same! You work for a newspaper. How can we convince newspapers and other media to do this?

  • Freenet won't help alleviate this problem as much as you might think.

    From the FreeNet FAQ [sourceforge.net]: Documents that are never requested are eventually removed through disuse.

    On the other hand, as the price of storage media drops, we'll probably see somebody (Google?) attempt to cache the entire Internet.
  • ... does anyone else smell shades of 1984 here?

    "Your Honor, here is a copy of a news article from May 2001 proving that the MPAA willfully and illegally spanked a room full of children."

    "But how can we be certain that you did not fabricate or alter that article? Where is the original?"

    "Well, your Honor, as is the custom nowadays, all news is removed from a site just 7 days after it is posted...."

    "I'm sorry, but I cannot allow that in as evidence."

    D'oh!

    -----
  • When I was working on my undergraduate degree I took courses outside of computing (for shame, you say!) and one highly encouraged using the 'net to find resources for our papers. The class was a humanities class called "Impact of Technology on Society." OK, great, I can use 'net sources, but when the source took down the article a week after I found it, my teacher wouldn't accept it as a source. Later she did say she would accept printouts of the 'net sources. In fact, we now had to turn in printouts of every source. This was lovely in that my university charged per page of printouts/copies and my one source was on /., from a user post on an article... care to guess how many pages that was for an article with 200+ posts? (And yes, I did set the threading and the like, but she still wanted the whole thing!) I tried to talk her into accepting the .html file on a floppy disk or via email attachment, but nay.

    Well, a lot of that problem was the teacher, but if the sources would realize that sometimes what they say can be used for scholarly work and keep it around for a bit, life would be nicer. I'm all for getting rid of things that no one has accessed in over a year, but when you operate a news site you should at least think about putting old articles somewhere without pictures and such, just the text. How much space would a year's worth of CNN news stories in plain text take up?
  • We just installed a 500 Gig RAID for US$20,000 for storing huge (and critical) medical images.

    Does that storage have a single point of failure? Is it mirrored? Is it SSA? Will it work on an RS/6000? Can it be backed up to ADSM/TSM?

    All of these are critical questions for us. There are many solutions that will hold a lot of data for little cost. Take the 1U Maxtor box [slashdot.org] for example. At under $5,000 for 320 gig, it sounds good. However, it only has one NIC and doesn't support an SSA connection so we can't use it. It doesn't scale well within our application environment.

    InitZero

  • actually, Freenet is able to permanently store items. there are people working on various ways of doing it.

    Also, I read in an AJC technology article that there is a group that is archiving the Internet. (something like 3TB and counting last I saw it almost 6 months ago) sorry don't have link.

    also, check out www.archivists.org and also
    http://www.loc.gov/ead/ead.html
    for Encoded Archival Description format.

    Thnx,
    Fuller
  • there are people working on various ways of doing it

    Not that I doubt you, but could you give some details?
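
The storage prices argued over in the thread can be put side by side with a little arithmetic. The sketch below uses only the dollar and capacity figures quoted in the comments above; the function and constant names are made up for illustration, the capacities are raw (no RAID or mirroring overhead), and the SSA adapter cards ("a few thousand dollars" each) are left out because no exact price is given.

```python
# Figures quoted in the thread (US dollars, gigabytes). Names are illustrative.
IBM_7133_CHASSIS = 36_000      # bare enclosure, no drives
IBM_DRIVE_PRICE = 2_000        # per nine-gig hot-swap drive
IBM_DRIVE_GB = 9
IBM_DRIVES_PER_CHASSIS = 16


def full_7133():
    """Return (cost, raw capacity in GB) of one fully populated 7133."""
    capacity = IBM_DRIVE_GB * IBM_DRIVES_PER_CHASSIS
    cost = IBM_7133_CHASSIS + IBM_DRIVE_PRICE * IBM_DRIVES_PER_CHASSIS
    return cost, capacity


def cost_per_gb(total_cost, capacity_gb):
    """Dollars per raw gigabyte."""
    return total_cost / capacity_gb


def days_to_fill(capacity_gb, growth_mb_per_day, mb_per_gb=1000):
    """Days until an empty array of the given size is full.

    mb_per_gb=1000 is an assumption; the thread does not say which
    convention its "gig" and "meg" follow.
    """
    return capacity_gb * mb_per_gb / growth_mb_per_day


if __name__ == "__main__":
    cost, capacity = full_7133()
    print("IBM 7133, full:   $%.2f/GB" % cost_per_gb(cost, capacity))
    print("500 Gig RAID:     $%.2f/GB" % cost_per_gb(20_000, 500))
    # "under $5,000", so this is an upper bound
    print("1U Maxtor box:    $%.2f/GB" % cost_per_gb(5_000, 320))
    print("Days to fill a 7133 at 600 meg/day: %.0f" % days_to_fill(capacity, 600))
```

On these numbers a full 7133 comes to $68,000 for 144 gig, roughly $472 per gig, against $40 per gig for the 500 Gig RAID and under $16 per gig for the Maxtor box. At 600 meg of new photos a day, one 7133 fills in about 240 days. None of this settles the single-point-of-failure, SSA, or backup questions raised above; it only shows the size of the price gap being argued about.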
