On Data Obsolescence and Media Decay
mouthbeef asks: "What's the future of storage media? With CDs and tapes prone to relatively speedy decay, and hard drives an entropic nightmare of moving parts, how
will we keep our data safe over the long haul? I just got some e-mail from a writer pal who isn't really technologically sophisticated, alarmed because someone told him that his backup CDs would decay and rot in 20 years. He's an sf writer, and he was thinking "big picture": a coming infopocalypse in which sysadmins devote their every waking moment to re-archiving their old backup data." Is such a scenario likely? Why or why not?
"I wrote back that I didn't think that would happen, because:
- Every time I buy a computer, it's got more storage on-board than all the computers I've owned until then, and I just migrate all the data files I've ever created or saved to the new box, like a hermit-crab changing shells
- With broadband becoming more real and more cheap, it makes sense that in the long run we'll store most (if not all) of our data on remote servers -- encrypted, of course -- that are managed by trained pros with access to mirror drives, climate-controlled vaults, etc. etc.
- Even if this doesn't happen, most of your data files will be in stupid, proprietary formats like Word 3.0 that won't be openable, anyway
How reasonable does this seem to you folks? What do you do with data that you need to preserve for the ages? "
networks (Score:1)
Wasn't CD-R life supposed to be 100 years? (Score:1)
BTW, I hang old CDs from strings outside the house as decorations and have noticed something. The sun destroys the green ones and the gold ones, but the blue ones do not fade at all. I took a blue one down after a year of hard sun's rays (I live in the deserts of AZ), and read back a full and complete ISO image of Red Hat 4.2! The other colored CDs wouldn't even be recognized as valid discs. The blue ones were generic, blank-label Verbatims. Don't forget to think about the ability to survive abuse when choosing backup media.
Not really one problem, but two or more (Score:1)
Data within an institution typically has a short half-life, say years to decades (banks, tax info, etc.) The problem here is moving all still useful data into a format that is still readable by the rest of the firm, an in-house job in most cases. The hermit crab analogy is particularly apt, going from tape to (say) CD to solid state.
This emerging problem will demand innovation, and specialists. Specialists to resurrect or maintain the old formats and reading machines, specialists to oversee the transfer, and specialists to find the latest and greatest encoding scheme. The real fulcrum here is the manager (information management, I suspect, will become a new and major field), who must schedule the maintenance, oversee the buying and employment of equipment (and stay in budget and on time!!!), and most importantly, get the biggest bang for the buck by keeping only the most necessary data.
The second problem of the infopocalypse has a slower time constant, decades to centuries. It is largely irrelevant to institutions (all but the very few that will survive that long, and who can predict that?). Works of art, philosophy, and science are the major players at this time scale.
Whereas the first problem (of institutions) is particularly Sisyphean, pushing the data up one hill only to have it roll back down in another 5-10 years, the second problem is not really a problem. Unless you're so anal that you consider all works of art and science to be worth saving.
Think about it: how many physics students read Newton's Principia Mathematica? I know of none. They get the summary and biography in the textbook. From the library. How many art students need to see the original? None. They get a print. From the library. The enduring themes and ideas of our culture last because they are enduring, not because someone chooses to furiously keep copying them down. Sure, there are more new scientific ideas per day now than in Newton's time, but the distilled product is kept in the text, while the old theories and bad ideas are not.
As fields, science and art and history advance on their own, students getting the necessary detail from their teachers (not, say, Bacon or Descartes or Michelangelo). It would be arrogant indeed to assume that we know ahead of time that this or that is worth saving. If it is, someone will save it.
We need to dispense with the silly notion that Visa's database needs to be saved for 20,000 years, or that string theory needs to be repeatedly transferred from CD to solid state to quantum computers. Only the most boneheaded of archaeologists would hope to save all of our present culture for future generations to laugh at.
Only the most idiotic fool would want them to.
650MB is over 5 million punch cards (Score:1)
A binary-punched card has 80 columns of 12 bits (possible holes) each, or 120 bytes. 650e6/120 is about 5.4 million cards. Was it 500/box? I remember carrying more than 4 boxes was a chore, but over 10,000 boxes?
Also consider what one CD is in terms of plain ASCII, equivalent to typing novels on a one-font typewriter. If you type 120 wpm and words average 5 characters and a space, that's 650e6/(120*6) minutes of typing, or approximately 1.7 years of continuous 24/7 typing; at a realistic daily pace, decades. Few people will ever fill up a CD that way. So you can assume all your novels and all your source code and all your tax data will fit on a CD.
Of course, your digital movies are another matter. It's just interesting to see relative scales.
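For the curious, these scale comparisons are easy to reproduce; a quick Python sketch (it assumes 650e6 bytes per CD, a fully binary-punched card holding 120 bytes, and 500 cards per box):

```python
# Back-of-the-envelope comparisons for one 650 MB CD.
CD_BYTES = 650_000_000

# A binary-punched card: 80 columns x 12 rows = 960 bits = 120 bytes.
CARD_BYTES = 80 * 12 // 8
cards = CD_BYTES // CARD_BYTES
boxes = cards // 500  # assuming ~500 cards per box
print(f"{cards:,} cards, roughly {boxes:,} boxes")

# Typing it out as plain ASCII at 120 wpm, 6 characters per word:
chars_per_minute = 120 * 6
minutes = CD_BYTES / chars_per_minute
years = minutes / (60 * 24 * 365.25)
print(f"about {years:.1f} years of continuous 24/7 typing")
```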
Re:Messages to the Future (Score:1)
The same general principles could probably be applied to transmitting less important messages.
A more permanent solution (Score:1)
Modern CD decay is an URBAN MYTH (Score:1)
Please reality check your perceptions at:
http://www.cd-info.com/CDIC/Industry/news/media
If you are still worried about decay, there is a permanent CD format that will last millions of years. HD-ROM (200 gigabyte) and HD-ROSETTA is immune to technology obsolescence, electromagnetic failures, and withstands the effects of time.
http://www.norsam.com/
Re:The Solution (Score:1)
Say, ever heard of the three-body problem?
Stars move in literally unpredictable ways. After a while, your mapping will produce gibberish. Even more ironic, you'll find that your key will be as large as, if not larger than, the information you are trying to record.
Claude Shannon will not be denied.
Different take on old data collection... (Score:1)
Historian: "What's with all these pieces of paper? Do you think those holes could be encoding something?" (researcher then trips, falls, and unsettles the whole 6-foot deck)
- a stack of Apple ][ disks with "all Apple ][ software ever written". Unreadable by current hardware. Value: near zero.
Apple IIs are like cockroaches
- A couple of thousand 400K diskettes containing Mac System 1.0, Microsoft Word 1.0, Adobe Photoshop 1.0 and similar stuff. Unreadable by current hardware. Value: who knows?
Still good for competitive upgrades in some situations
Re:Snake Oil (Score:1)
No longer readable, or just no longer readable by your equipment? One thing that doesn't seem to have been mentioned is that our CD readers are generally designed for speed, rather than making absolutely sure every last pit is read. Given that mass-produced CDs have physical pits in the media, I think it's likely the pits themselves will remain, and be readable to some device which reads them rather more slowly than a 40x (or even a 1x) CD-ROM.
Hierarchical Storage (Score:1)
data goes to disk, disk migrates to tape.
Transfer Rates/Storage Life (Score:1)
There was another Slashdot article [slashdot.org] about this a while back.
Re:ummm... online storage banks?!?! (Score:1)
"The lie, Mr. Mulder, is most convincingly hidden between two truths."
Re:The obvious answer is... (Score:1)
They weren't so forward thinking in all respects though. This was the same company that had a Y1988 bug caused by using 1-digit years in their databases.
Data Storage (Score:1)
Anything that's actually important enough to keep forever will survive by any means necessary (barring Murphy's Law taking hold). The rest can peacefully degrade.
Bigger and better media (Score:1)
Mankind has always dreamed of destroying the sun.
Embed it in porn gif's (Score:1)
Case in point: if you go to the Museo del Oro (sp?) in Lima, Peru, you can see some of the few Incan gold artifacts that the Spanish didn't melt down into gold bars. There aren't that many religious items there, but, well, if you wonder what they did for fun...
Deep Archiving (Score:1)
It seems to me the best storage material would be stainless steel platters from 5 1/4 to 12 inches in diameter. Information could be recorded by literally blasting pits into the material on both sides. The first disk in the archive set would be marked so it would be obvious that it is the first. I would make each disk a different color, and that color would follow the sequence of colors in the spectrum. That way, if the disks got out of order, there would be an easy, natural way to tell what order to put them in. A society advanced enough to get to the archive would know what the spectrum is.
I thought about using gold, since it has really good properties for storing lasting information, but decided against it for several reasons. One, gold is soft and can easily deform, destroying the information. Stainless steel will last forever where I want to put it. Not years, decades or even millennia, but billions of years. Second, gold is valuable. Stainless steel itself is practically worthless. Only the information on the disks has any real value, other than the fact that the disks themselves would tell a small story about how technologically advanced in materials research our society was.
The first disk I would make side one low density. So low that you could tell there was information on it just by rubbing your fingers across it. The first few tracks would contain a Rosetta stone on how to read the rest of the disks. The next few tracks would contain a summary of what is in the archive and why it is here. The next tracks would contain all the information on how to build a reader for the much higher density disks to follow. Hell, I would include a "simple" mechanical reader in the archive for the first disk and base the higher density readers on the simple one.
Now where would I put this archive? I would put it in the most stable, secure environment I could think of: space. The first archive I would put on the Moon, in Tycho crater. The second archive I would put in a secure orbit around Jupiter, and the third I would put in deep space beyond the orbit of Pluto. Now here is where I start ripping off 2001. I would put each archive set inside a black, obsidian monolith of perfect dimensions. You want it so that any intelligent creature that came across it would know from looking at it that this was placed here by another intelligence. So you want it to look as unnatural as possible.
I would have the archive set inside a special case in each monolith that would resonate when a high intensity radio beam was focused inside the monolith, so that any probing intelligence would know there is something inside it. The only way to open it would be to break it.
Now why put it on the Moon and in space? Well, I couldn't think of a better place to put them. Certainly couldn't keep them here on Earth. By placing it on the Moon, if humans were blasted back to the stone age we would have to climb to a level of technology equal to the '60s before we could retrieve the first archive. Since by that time we would have forgotten about the second archive, the first one would point to it on the last disk. The second archive would contain all the contents of the first archive plus much more. The third archive would contain the first and second but no new information. See, the third archive is not meant for humans, it is meant for others.
In 4 billion or so years the Sun is going to expand and consume the first and maybe the second archive if they are not found. The third archive will last forever in deep orbit beyond Pluto. It would be for any future aliens that came along in a few billion years. It would be our way of saying "we were here, this is us and what we were." It would point to the second and first archives, but the first and second would not reference it. We want humans to forget about it, since it is not for us.
What would you put in such an archive? I would put us. Our cultures, our history, copies of our music, stories, religious beliefs, and genetic makeup. I doubt I would put much technology in it, because that would be redundant. Any society advanced enough to retrieve the archives past the first would be far in advance of our society when we placed them there.
Somebody is going to see the word space and balk because of some imaginary cost they pull out of their ass. While I don't know how much such a project would cost, I would rather spend a couple billion on this than on some multibillion dollar Pentagon wartoy. Besides, I don't think it would cost a couple billion.
Re:non-perishable CDs? (Score:1)
Of course, you should be careful how you store those (vertical or horizontal), considering glass is really an extremely viscous fluid.
Re:Shelf life of recorded CD-R longer than 20 year (Score:2)
Personally I was ripping a lot of my CD collection over the weekend (just so much more convenient to click on a playlist entry rather than having to find a CD every time I want to play it), and several of the early disks that I haven't touched for years have oxidation tracks curling in from the edges. Luckily not far enough yet to destroy data, but had they been 70-minute CDs rather than 40-minute CDs many of the tracks would probably be unreadable.
Incidentally, this isn't just a problem in the digital domain; finding good prints of old movies is becoming harder and harder, and apparently when the Babylon 5 folks got their negatives back from Warner Bros to re-edit the pilot episode they found that many of the rolls had been soaked in a flood in the Warner film vaults, and others had been eaten by rats! In fact, it's quite possible that current DVDs will be the best version of many older movies that will be available in a century from now.
Re:Reading your punch cards (Score:2)
Not true, you can still buy readers. We bought one a couple of years ago to dump a bunch of Harvard survey cards (these use IBM-style cards with slightly different punch codes) for a customer. Since the automated testing outfits are still using cards the same size as the IBM cards, it's an easy mod for the manufacturer to make the optical scanner look for holes rather than splotches of #2 pencil graphite.
For what it's worth, these cards had been "folded, spindled and mutilated" and we still managed to read them, although some extra effort was required. They had been stored in a sub-tropical climate since the sixties, and several of them were bundled with rubber bands--most of the rubber bands had dissolved and crystallized, so that we had to scrape the residue off the outside cards to get them past the key slot on the reader. Overall, I'd say the punched cards survived pretty well, despite having been so poorly treated. They did much better than any oxide-coated media would have.
Magnets, zip drives and floppies (Score:2)
This is already a problem! (Score:2)
I work for STK (the company that owns the tape storage market for big companies with lots of data). Our customers already have this problem.
NASA (which has all that satellite and other automatically collected data that needs to be stored; not all of it has been processed yet despite being 20 years old or more) is in the habit of migrating to the latest tape technology every couple of years (3? no more than 6), because the latest and greatest allows them to get double the storage in the same space. They do this not only for the space savings, but also to keep that data from becoming unreadable.
They are not alone, but I can't remember the specifics. (I'm also not sure I'm allowed to mention more)
STK equipment has a reputation for reliability. Then again, you pay a minimum of $20,000 for a tape drive and it goes up to $150,000. (Or buy the OEMed DLT drives for $6,000.)
As a Linux user, right now the best you can do is copy to a new medium every couple of years. Make sure you do a verified write, and keep a copy offsite (in case of fire, if not for protection from overzealous law enforcement). Better yet is a vaulting company, which do in fact exist, but they are immature at this point. (Meaning that you shouldn't trust your data to them without researching them first; there are good ones and there are those that will lose your data. Pricing may also be more than you want to spend.) I would not trust any one medium to be my backup.
Remember that most data isn't worth backing up (the Linux source -- except for local mods that are not yet upstream -- /usr, most JPEGs...). Think carefully: what is worth saving to backup? Probably "My dog, by Jessica, age 6" (mementos of your kids), pictures of the family, the project you are working on today, tax records (for three years in most cases). There is more, but the majority of your 50 gig hard drive isn't worth the bother.
Don't forget what others have said about reliability of the medium. They appear to have more data than me, so I won't cover that ground. They had other insightful things to say too.
Long-term archival media exist! (Score:2)
Colin Smith notes,
Have a look at http://www.norsam.com/rom.html [norsam.com] for digital archiving and http://www.norsam.com/rosetta.html [norsam.com] for analog archival storage. The basic technology is to use particle beams to write very high resolution to silicon wafers ("high-performance rock" :-), which are extremely durable as long as you don't go after them with a sledgehammer or something.
The digital version stores 200 GB on a side of a 5 1/4 inch platter (with 10-disk and 300-disk jukeboxes, making possible a "petabyte machine room"), with a very high (30 MB/s) write rate and a reasonable (3 MB/s) read rate. The analog version you can think of as "super-microfiche", writing analog page-images to the wafer (at something like the entire Encyclopedia Britannica on one wafer); it is readable by even such lo-tech methods as a good microscope (so it shouldn't suffer from reader obsolescence). Norsam is partially funded by IBM venture capital, by the way.
Re:Snake Oil (Score:2)
magnetic tape and punched card formats which can no longer be read, because there are no surviving readers
Actually, punched cards are the easiest legacy format to read. A reader is an easily constructed electromechanical device. If a great many have to be read, optical is the way to go. If speed isn't an issue, precision alignment isn't required either. A sheetfeed scanner and simple software can also read the cards. For that matter, if it's important enough, they can be manually transcribed. The above all applies to paper tape as well. All other storage techniques (magtape etc.) require more sophisticated readers. If the punched cards have not been read, the data on them must not be all that important.
On the other hand, what would one use (other than a CD-ROM drive) to read a CD? Any option I can think of requires far more effort than for magtape or punched tape/cards. The best bet (since it's not practical to transfer everything to punched tape) is to keep updating storage media. When denser or more durable media come into wide usage, that's the time to make a transfer. If the data is to survive civilization itself, the reader should be documented in an easy-to-read form (such as diagrams and text on hard plastic plates).
The biggest problem I see is stupid copyright protection. When such measures are employed, the media and data are ACTIVELY HOSTILE to archiving/preservation efforts. For example, over 20 years ago, the BBC lost several early episodes of Dr. Who. A number of them were recovered because fans had recorded them on professional video tape machines (home VCRs were not available at that time). Had copy prevention mechanisms been in place (like the MPAA and others want to have now), those episodes would simply be gone because nobody could have recorded them. Consider: how long is forever for a DIVX silver disc?
A few TV shows may not seem all that important (and with TV, they probably aren't), but consider encyclopedias, textbooks, novels, etc. All of which are moving to electronic form, and all of which will probably be in some stupid proprietary copy protected form. That's a problem even now. Just try reading an old Excel spreadsheet today, and then consider trying it in 50 years (Good luck).
On the other hand, nice simple comma delimited ASCII is pretty easy to read no matter how old it is. Even if the fields are not documented, it's not too hard to guess.
In summary, unprotected open standard formats are the way to go if preservation is important.
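To illustrate the point, here's a minimal Python sketch: the fragment below is a made-up, undocumented comma-delimited file, and a few lines of standard-library code recover its structure anyway.

```python
import csv
import io

# A made-up, undocumented comma-delimited ASCII fragment, as it might
# turn up on decades-old media. Even with no field documentation, the
# structure is trivially recoverable.
raw = "1997-03-14,42.50,widgets\n1997-03-15,17.25,sprockets\n"

rows = list(csv.reader(io.StringIO(raw)))
for date, amount, item in rows:
    # A human (or a simple heuristic) can guess the fields from the values.
    print(date, float(amount), item)
```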
Norsam HDROM: The Robust Alternative (Score:2)
They had a test done at Los Alamos National Labs [norsam.com] where they tested the media for corruption after exposure to extreme heat and corrosive conditions.
It's not quite ready for people to have an HDROM burner in their home PCs, but I suspect that when the patents run out in a dozen years, many will take interest in the technology...
Re:About floppies (Score:2)
Then they found the disk drive for them.
Re:Data availailability (Score:2)
Many other TV stations did the same thing. When ABC TV (the UK station by that name) was bought by Thames TV, all the old ABC tapes were left in a pile outside. Anything not picked up by collectors was trashed. That would have included the early episodes of The Avengers (many of which ARE now lost forever).
Some programs were lucky and were missed by the raving hordes of vandals and Huns that inhabited TV at the time. Sapphire & Steel escaped by turning into a door-stop.
ummm... online storage banks?!?! (Score:2)
"The lie, Mr. Mulder, is most convincingly hidden between two truths."
Re:Snake Oil (Score:2)
Evidence? I have several (audio) CDs from the early 80s which are no longer readable. OTOH, I have 9-tracks made at the same time which are still OK (presumably; I don't have access to a 9-track drive currently, but they were fine five years ago).
Personally, I think that microfiche is the way to go. Plastic lasts quite a while, and OCR software is already good enough to read in straight text in a standard typeface. And even if civilization collapses, all you'll need is a decent lens and a mirror to review your pre-cataclysm tax records...
A problem if the storage speed doesn't increase (Score:2)
Missing the question (Score:2)
I think the answer is a self-contained recorder and playback device which is sealed and can accept a wide variety of power sources. Call it "BackAnywhere"! A data time capsule.
The premise is simple: encase the actual storage device (likely solid-state and non-magnetic for obvious reasons) in a case, write the data out, and seal it. The catch is in the interface - since 100 years from now we can't be certain ASCII will still be in use, we shouldn't necessarily write the data in that format. However, it's been shown by history that languages with a sufficiently large text-base can be deciphered even if they are thousands of years old (or a rosetta stone can be found to translate)... I suggest we put a well-known book into the encoding stream. When you start it up and press one of the buttons, out comes Shakespeare or something. After the archaeologists have figured out what it says, press another button and there's the stored data - whatever it may be. Hell, you could bury an entire library in a 6"x6"x4" space.
The thing about the power supply is the only problem: electronics require power. How this power is put into such a system, in a way that ensures you won't blow the thing to kingdom come if you plug it in wrong, will be the problem. After all, after WWIII, in 3200 when we're rediscovering the lightbulb, somebody might have the "bright" idea of plugging it into a 30kV generator.
Something to think about....
Use Kodak disks. (Score:2)
From Kodak's CD-R Overview. [kodak.com]
The InfoGuard protection system includes a special coating that resists damage due to scratches, dirt, rough handling or other common mishaps. As a result, it's reasonable to expect a life of 100 years or more when discs are stored in average
home or office conditions.
--
Why pay for drugs when you can get Linux for free ?
It's already happening... (Score:2)
Re:acid free paper (Score:2)
>preserved. Then they suddenly get awful as paper mills switched to new methods.
s/9/8/ for most of printed materials.
I have several books I inherited that were printed in the 1800s. The two oldest -- bound periodicals from the 1830s -- handle like they were just printed a few years ago. The books from the 1870s are very brittle, & when I have children, I'll have to hide them away from the rugrats until they're old enough to understand just how fragile the darned things are.
And we're not talking quality literature here: the bound periodicals are examples of popular magazines, full of sentimental stories & poetry. At some point the covers were torn off, & my grandmother rescued them just before they were tossed into a fire pit. The one book from the 1870s, a translation of Schiller, was far more carefully produced & has an inscription from my great-uncle to my great-aunt.
In a few hundred years, a lot of stuff from the 19th & 20th centuries will be lost. And it'll puzzle people how it happened.
Geoff
Re:ummm... online storage banks?!?! (Score:2)
I use a small program to signal the parallel port and turn on a computer which NFS mounts the harddisk of my PC to make backups... sorta like the coffee mini howto but different
There's prolly better ways to do it, I just built that stuff for fun once and it doesn't take any effort to keep it running
I still trust my own backups better, I wouldn't want somebody else to be responsible for that.
Interesting older technologies (Score:2)
Here's an example:
Wire recorders and wire recordings. These date back to the 1940s and 1950s. Instead of using a magnetic tape as we know it today, you record on a stainless steel wire.
The disadvantages are:
o Mono. It's a single wire. You can't put multiple tracks on it.
o Frequency response. Not so good, but acceptable for voice recording and radio recording.
The advantages are:
o Little to no hiss! Tape hiss is mostly due to the fact that magnetic tape is covered with small, irregularly shaped magnets. A stainless steel wire is continuous, with no individual magnetic particles. Wire recordings sound surprisingly clean.
o In theory, they can last forever! Tape formulations tend to break down over time. The plastic backing dries out and the oxide flakes. A wire recording is just a spool of stainless steel wire. It doesn't deteriorate. I have recordings from the late 40s that still sound pristine, and may well last forever.
A couple more examples of "obsolete" technologies that are incredibly archival:
o Black and white photography: Daguerreotypes. These were made by plating silver on copper, sensitizing with iodine, developing with mercury fumes, fixing with salt, and toning with gold chloride. The image is basically gold on silver, and the images do not fade.
o Color photography: Technicolor. Technicolor pictures were originally made on a special camera that performed color separation in the camera, and produced three black and white negatives, each representing one of the primary colors red, green, or blue. These negatives were then used to create "matrices", which are essentially printing plates. Finally, the three matrices were used to print the release films -- using highly stable, acid-based cyan, magenta, and yellow dyes.
The Technicolor process was replaced in the 70s by monopack film, which has three color layers in the film base. Monopack film is much cheaper to produce and easier to use, but the dyes used are dictated by the chemistry requirements of the process, and the dyes are not stable. This is why original prints of such films as "The Wizard of Oz" retain their color unfaded, while most films from the late 70s and early 80s have faded to shades of pink and red.
Another example is punched cards. As someone pointed out, they can rot, but in a hundred years, if you found a stack of punched cards in the bottom of a desk drawer, next to a magnetic tape, I'll lay odds that you can recover the data off the cards, but not the tape.
We need a long term storage standard (Score:2)
Hardware and software 'readers' would then be certified as LTDS 1.0 compatible, meaning that it can read all the physical media in the standard and all the file formats in the standard.
As time progresses LTDS 2.0 will of course be developed say on DVD-RAM with newer file formats, but LTDS 1.0 would be a subset of the 2.0 standard. Hardware and Software readers would have to be LTDS 1.0 compatible as well as LTDS 2.0 compatible to be certified LTDS 2.0 compatible. You would always be able to read your stuff, no matter how old the format you saved it in.
There is still the problem of physical media decay, but I am sure that the media manufacturers can address this and make some especially long-lived CD-R packaging (or DVD-RAM in the future, or what have you).
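As a sketch of how such certification could work (the format names below are invented for illustration, not part of any real standard): a reader qualifies for a version only if it handles every format in that version and in all earlier ones, so old archives always stay readable.

```python
# Hypothetical LTDS certification check. Each version's format set is a
# superset of the previous one's, and a reader is certified for a version
# only if it can handle that version's formats AND every earlier version's.
LTDS = {
    "1.0": {"CD-R", "ASCII", "HTML"},
    "2.0": {"CD-R", "ASCII", "HTML", "DVD-RAM", "XML"},  # superset of 1.0
}

def certified(reader_formats: set, version: str) -> bool:
    required = set()
    for v, formats in LTDS.items():
        if v <= version:  # string compare works for "1.0" < "2.0"
            required |= formats
    return required <= reader_formats

print(certified({"CD-R", "ASCII", "HTML"}, "1.0"))  # True
print(certified({"DVD-RAM", "XML"}, "2.0"))         # False: must read 1.0 media too
```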
-josh
just copy (Score:2)
It's also important not to store content in formats that become undecodable. So, Word 97 is out for archival storage. If the content is in ASCII or UNICODE format, you can probably hack up a parser that gets most of the information back. It is also useful to store source code in a common and reasonably simple language (C, Fortran, core Java, Scheme; not C++) that can decode the content along with the content. For example, for encrypted data, I usually store a source copy of the crypto program along with it. I consider good formats for long term storage formats like HTML, PBM, JPEG (with decoder), MPEG, and Sun audio format.
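One way to act on the "store the decoder with the content" advice is to make the archive literally self-describing. A hypothetical Python sketch (file names invented) that bundles a payload and the source of its reader into one tar archive:

```python
import io
import tarfile

DECODER_SRC = b"""\
# decode.py -- minimal reader for payload.txt (plain ASCII, one record per line)
for line in open('payload.txt'):
    print(line.rstrip())
"""
PAYLOAD = b"hello, future reader\n"

# Bundle the data and its decoder into one tar archive, so the format
# documentation travels with the content it describes.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in [("decode.py", DECODER_SRC), ("payload.txt", PAYLOAD)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Read it back: the archive lists its own decoder first.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()
print(names)
```

Even if a future reader can't run the decoder, its source doubles as human-readable documentation of the payload format.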
Re:Most of the data becomes useless (Score:2)
Tubes are better than transistors in certain applications precisely because they DON'T work right. They color the sound in a way that is appealing to audiophiles. It has nothing to do with clarity; it has to do with a listening experience. Tubes apply a sort of dynamic EQ to playback as their electrical properties muddle with the sound. A lot of it is foolishness on the part of gearheads, but a fair amount of it is actual fact, in that tubes are perceived to sound better. Remember, this is all perception; it's not black and white.
About digital versus analog: find someone with a good quality audio card and record something at 22 kHz, 44 kHz, and 48 kHz. Now do the same across 16, 20 and 24 bits. If you're using good listening and recording equipment, you WILL hear the difference. That doesn't mean it won't sound better than certain analog gear, but it does mean that in theory analog has the ability to do better. Until I sit in a recording studio and hear state-of-the-art digital vs state-of-the-art analog of the same event recorded at the same time, I'll have to side with analog.
The other thing at work here is that people take tapes, make them digital, and then whine about digital's quality. That doesn't make much sense, as the medium is PART of the recording. The limitations and strengths of analog are part of an analog recording; running that through A/D converters to change the format is going to be lossy. Similarly, taking a digital recording and transcribing it to analog is probably lossy as well.
As for MP3, yeah, the quality is bad. Any self-respecting audiophile would never archive to MP3 given analog as an option. Of course, most people don't have recording-studio-quality analog gear, and the leap in clarity of a digital reproduction of the master tapes is tremendous compared with consumer analog devices.
-Rich
Re:non-perishable CDs? (Score:2)
Bit difficult to retrieve, though.
Re:Snake Oil (Score:2)
It is unlikely that the library at Alexandria contained scientific knowledge we haven't rediscovered (I don't believe in lost Atlantis, etc.), but it certainly contained facts which are now forever lost, like more historical records of the time than exist in biased accounts (the Bible, etc.). To have lost that library, and others like it, is tragic from a historian's POV.
So, we should have a way to record all the data that we want, such that none is ever lost accidentally.
For this, data havens aren't great. If the owner of the data is lost, it's all too easy for the data to become meaningless to everyone, strongly encrypted until it appears to be white noise. Physical data is handy this way: if your backups are in a safety deposit box, you might decide on less encryption, enabling heirs to read your files if you didn't pass on encryption keys in your will.
Speaking of which, we need a strong encryption system whereby you can unlock data with a certain number of secondary keys, or a master key, but where the data doesn't get easier to unlock with fewer than the required number of secondary keys. For instance, the boss can unlock the data, as can any five of the seven employees; but if only four conspire, the cracking is no easier. This would let keys be passed on after death in wills and by delayed mail, such that records can be unlocked, but in such a way that a dishonest person can't look at your will and gain premature access to sensitive data.
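What's being asked for here exists: Shamir's secret sharing gives exactly this k-of-n property, and fewer than k shares reveal literally nothing about the secret (the "boss master key" can be approximated by simply handing the boss a full threshold's worth of shares). A toy sketch in Python; real use would want a cryptographic RNG and an audited library:

```python
import random

PRIME = 2**127 - 1  # field modulus; must exceed any secret

def split(secret, k, n):
    # Random polynomial of degree k-1 whose constant term is the secret.
    # (Use a cryptographic RNG, e.g. the secrets module, for real data.)
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

Any k shares reconstruct the secret; k-1 shares are consistent with every possible secret, so conspirators below the threshold gain nothing.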
On the subject of easily recovered digital information with a fairly high density, have you considered printing digital data to paper as a series of light/dark areas? This way it can easily be scanned into a bitmap (something we'll always have the ability to do), and a programmer could whip up a translator in an hour or two. Then print an intro page describing the text format (65 -> 'A'), etc., and the encoding (if you need to use anything special), as well as the dimensions, etc. These pages could be printed on high quality paper and laminated, or in the most paranoid case, photographically etched onto non-reactive metal film (which allows a better resolution, btw.)
Testing of this method allowed 2048x2688 (or so) resolution, which translates into 672KB / page, or just over two pages/floppy disk.
It's the longest-term data storage we could think of, because if you went the paranoid route and used metal film, it would theoretically last thousands of years, and all it requires to access is a scanner (which we have to assume will still exist in the future) and a semi-talented programmer.
It does have file-format problems, but you can completely document the file format in text at the beginning (it could even be very small, requiring a magnifying glass or scan-and-enlarge to read), or at least provide bootstrap info: describe how to read a text file, then include as the first data a text file describing what to do with the rest of the digital data, and so on.
With metal film, or even very good paper and photographic printing, you could get 4-8MB / page. At 500+ pages per volume, it's pretty compact storage, and it would be used for rosetta stone type info, or the most important records. Everything else can just be translated from one format to the next every 10-20 years, as storage should allow ten times as much data on the same size medium in that time.
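The bitmap scheme above is easy to prototype; here's a minimal Python sketch (fixed row width assumed; a real page would add the self-describing header and ideally checksums). At the quoted 2048x2688 cells, 2048 * 2688 / 8 is indeed about 672 KB per page.

```python
def to_cells(data, width=64):
    # Unpack bytes into a flat bit list (MSB first), pad out the final
    # row with zeros, and fold into rows of light/dark cells.
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    bits += [0] * (-len(bits) % width)
    return [bits[r:r + width] for r in range(0, len(bits), width)]

def from_cells(rows, nbytes):
    # Flatten scanned rows back into bits and repack the bytes.
    bits = [b for row in rows for b in row]
    out = []
    for i in range(nbytes):
        byte = 0
        for b in bits[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)
```

The translator really is an hour's work; the hard part, as the post says, is documenting the conventions (bit order, row width, payload length) on the page itself.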
I've already thought about this (Score:2)
Some data is already encrypted, like DVDs (OK, bad encryption, but it is encryption), and soon the USA will make it even easier for everyone to encrypt data by relaxing its crypto policy even further.
Also, the media keep evolving, but usually to accommodate more data. The durability of the data is usually less important: if the medium can survive 5 years, that is more than enough (after all, in 5 years the medium will be obsolete).
So what is the memory that our civilization will leave for future archaeology? Tiny little disks with thousands of terabytes of encrypted and compressed data that will probably be half damaged. Even if they could read the data (remember, it is probably stored at almost the atomic level), they would have to find the decompression and decryption algorithms, and a key.
--
"take the red pill and you stay in wonderland and I'll show you how deep the rabitt hole goes"
Re:Snake Oil (Score:2)
The point is that stone tablets are damn durable, while digital media (take your pick) aren't. When you see a stone tablet, you see the inscription and you can say, "Golly gee! There's something written on this! It looks like a horse," or whatnot. When you see a CD, you look at it and say, "Hmm, kind of shiny. Mirror?" or, if you're smart/lucky, "CD!" Then of course you have to figure out whether it's full or empty, and whether it's an audio CD or a data CD. Okay, now there are files on it. Is it using Rock Ridge, Joliet, or plain ISO 9660? Is this file data, or is it an executable, or some support file like a library? Is it for Mac, Windows, Solaris, Linux, Be...
We have this problem today. I can give you an 8-inch floppy disk and say, "Behold! The answer to all the world's problems lies within. All you must do is read it and begin." Do you know where to even get an 8-inch drive? I sure don't. The only one I've ever seen was in "Wargames".
You talk about the fact that it's more important for source code to survive: that way you can reconstruct the system that produced the data, and then you can read it. Sounds reasonable enough. One problem: what is source code typically stored on? Big stuff sure as hell isn't stored on paper. (However, I did once see PGP source code printed and bound as an appendix to a book; I don't remember which one, though it had something to do with PGP. Surprise, surprise.) Source code is typically stored on a digital medium, because that makes it easier to use. It's a catch-22.
Now don't tell me "well, everyone will know," because "everyone" knew back in the past how to read Mayan, and we all know how well that turned out.
Re:CD lifespan (Score:2)
A good point. But honestly I don't think this will be a problem in 20 years. More recent standards like DVD look like they're going to maintain backward compatibility with CD-ROMs. A hundred years might be more of a problem, but a hundred years also leaves ample time to transfer the data to a newer format.
The factor which counts (Score:2)
Is the product of capacity times lifespan divided by price. For a CD, you get something like 5*10^10 B.yr/$ (that's byte-years per dollar). For printed paper, it's maybe 3*10^5 B.yr/$, so we've definitely made some progress. Floppies are utterly worthless, by the way: nowadays, they survive about one week before going bad. Anyone care to calculate how a tape scores by this standard? (I don't know how much they cost.)
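The arithmetic is easy to check; here's a quick sketch (the capacities, lifespans, and prices are assumed ballpark figures, not measurements):

```python
def byte_years_per_dollar(capacity_bytes, lifespan_years, price_dollars):
    # The post's figure of merit: capacity x lifespan / price
    return capacity_bytes * lifespan_years / price_dollars

# Assumed ballpark inputs: a 650 MB CD-R lasting ~75 years at ~$1,
# vs. a single ~3 KB printed page lasting ~200 years at ~$2 all-in.
cd = byte_years_per_dollar(650e6, 75, 1.0)    # ~5e10 byte-years/$
paper = byte_years_per_dollar(3e3, 200, 2.0)  # ~3e5 byte-years/$
```

The exact inputs are debatable, but the five-orders-of-magnitude gap between CDs and paper is robust to any reasonable choice of them.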
Sheepskin (Score:2)
So go for holes punched in sheepskin: the storage medium of the last millennium.
The problem (Score:2)
The real problem is actually migrating the data. Ideally, the data should be kept in a live state: transferred from old storage media and converted to more modern formats (and classified and indexed!) during the available migration window, while such migration is still supported. For even a medium-sized organization, that will be a full-time job for a few people.
In the worst case, you're only transferring from old media. Then recovering any data becomes a full-time job of locating it, researching storage formats, finding something able to read those formats, and eventually converting the documents to something readable.
Of course, it mostly becomes a problem if your organization uses proprietary data formats. Using the simplest, most standard format, such as ASCII or SGML-formatted documents, makes it far easier.
Re:What value archived data ? (Score:2)
One problem is that the next generation may not care about a particular piece of data, but the one after that would find that data invaluable.
My parents have recently gotten into genealogy, and I find it very interesting as well. The thing is that we continually find people who didn't write down their parents' or grandparents' names, because they "knew" that information. But two or three generations on, you don't know who your great-grandparents were, where they lived, or anything else about them and their lives.
People who keep regular journals, even of the most mundane day-to-day activities and events, are invaluable to those who are trying to find out about their lives. Trivial information provides the glue that puts historical events into perspective and shows how things related to the individual, not just what is represented by the history books.
2 Solutions (Score:2)
1) "The Brain" by Natrificial. You can check it out at thebrain.com [thebrain.com]
A relational File-manager for Windows.
2) BeOS [be.com]
Clearly the solution is here. The question is: will enough people adopt it to make it work?
Acid free paper and OCR software (Score:2)
Print the data on acid-free paper. Use an impact printer, not a laser printer. (Are you sure that the toner binding agents will last hundreds of years?)
You could print textual material in an OCR-friendly manner (e.g., the source listings in the "Cracking DES" book). This will obviously take a lot of paper and space, but it could be read by a human.
Or you could print the material in a "barcode" type style with plenty of embedded checksums. If you use 1 mm square cells (which should be large enough to allow scanners to adjust for paper warp, water damage, etc.), the amount of data which fits onto a single sheet of paper isn't much more than you get with raw text... until you consider that this is true 8-bit data with error recovery, not 7-bit text.
I think it goes without saying that this format is not intended for frequent use. But if you had information that you *had* to archive for centuries and you had unlimited access to vast underground storage vaults, this is probably the most stable media known today.
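The embedded checksums might look like this minimal sketch (Python; the block size and CRC-32 are my assumptions, not the poster's): each printed row carries its own checksum, so water damage is detected and confined to the rows it actually hit.

```python
import binascii

BLOCK = 16  # payload bytes per printed row (assumed)

def frame(data):
    # Append a CRC-32 to each fixed-size chunk so damage is localized.
    rows = []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        rows.append(chunk + binascii.crc32(chunk).to_bytes(4, "big"))
    return rows

def unframe(rows):
    # Salvage every row whose checksum still matches; flag the rest.
    good, bad = bytearray(), []
    for n, row in enumerate(rows):
        chunk, crc = row[:-4], int.from_bytes(row[-4:], "big")
        if binascii.crc32(chunk) == crc:
            good += chunk
        else:
            bad.append(n)
    return bytes(good), bad
```

A real version would add error *correction* (e.g. Reed-Solomon) rather than just detection, but this shows why per-row framing beats one checksum over the whole page.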
A closely related problem (Score:2)
This question has come up twice in the past decade. In the first case, a tape backup drive quietly failed and there was no indication of a problem until they attempted to retrieve a file.
In the second case, the person responsible for performing backups carefully ticked off the paperwork... but as far as anyone could tell he never actually swapped any tapes. The company discovered this after he (intentionally) corrupted the Netware database and then walked out the door.
Both problems can be solved by simple procedural changes (e.g., always "verify" tapes after writing, have someone else run "verify" or rotate the duty).
Yet... twice in the past decade I have had direct knowledge of data loss, and in one case it happened despite a competent and dedicated IT staff. Assuming this wasn't just a statistical fluke, it follows that there must be a significant risk that archival data is bad at the moment it's produced: perhaps a 5-20% chance of one or more bad media per year per backup group.
It doesn't make much sense to invest in premium media if you're saving garbage.
Re:Snake Oil (Score:2)
Just a thought.
Transmit it into space :) (Score:2)
Re:Snake Oil (Score:2)
But, I downloaded a ton of games that I never had but always wanted, and was not able to find manuals for them all online. Now I am a bit confused about some of the games: what the blinking blob means, or why X happens when my little guy picks up item Y. So I have sadly given up on some games that looked cool but that I'll be durned if I know how to play. A lot of the early games lacked "begin" or "end" screens, too.
I've also read about Finnish scientists who are trying to come up with signs that will last at least 500 years, in a language/medium that people in the future will understand, or at least perceive as a danger sign. Our yellow-and-black shield will probably have as much meaning to people in the future as Venus figurines do to people now.
Store it all in a single iron bar! (Score:2)
Iron Bar Storage System
First, take all your data and stream it out as one long multi-digit number. Now place a decimal point in front of the number and treat it as a very precise fraction. With the length of your iron bar treated as 1, measure off the distance along the bar equal to your data's fractional value and file a little notch in the bar.
That's it. The notch now represents the fraction that is your data. Anytime you want to recover your data, just measure the distance to the notch, divide it by the length of the bar, remove the decimal place, and convert the number back into your data bytes.
Voila! Instant low tech storage!
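The joke is mathematically sound: with exact rational arithmetic the notch really is reversible, as the sketch below shows (Python; the sentinel byte is my addition, to keep trailing zero bytes from vanishing when the fraction reduces). The catch is physical: reading back N bytes means measuring the notch's position to one part in 256^N, which runs out of atoms after a few dozen bytes.

```python
from fractions import Fraction

def notch(data):
    # Append a 0x01 sentinel so trailing zero bytes survive the round
    # trip, then interpret the byte string as a fraction of bar length.
    n = int.from_bytes(data + b"\x01", "big")
    return Fraction(n, 256 ** (len(data) + 1))

def read_notch(frac):
    # The sentinel keeps the numerator odd, so the reduced denominator
    # is exactly 256^k; recover k from its bit length, then the bytes.
    k = (frac.denominator.bit_length() - 1) // 8
    raw = frac.numerator.to_bytes(k, "big")
    return raw[:-1]  # strip the sentinel
```

So the scheme is lossless on paper; the iron bar is just a very bad analog-to-digital converter.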
Information Overload (Score:2)
We are getting to a point at which traditional "file systems" are going to become archaic. When "file systems" were first created, drives had very low capacity and very few files. The name-space-to-"file" ratio was very high. It was possible for everything you could fit on a disk to have a uniquely identifiable name/location which gave you instant insight into what that file contained or was for. However, we now have the opposite: enormous drives, countless files, and names that tell us almost nothing.
A new paradigm needs to be introduced. I think traditional file systems will need to acquire characteristics of relational databases. What good is a 17 GB drive if it takes you half an hour to find something you want? Today we have much richer and more diverse content in our data, and our storage systems need to accommodate that. We need to be able to make intelligent, high-level queries, like "all email files which contain spreadsheets on last week's product demonstration". That is what we are looking for, not "prddemosprsheet012500.text". Files aren't just of one type, or one attribute, anymore.
Our data contains many planes of metadata. We need a storage system that understands that and allows us to make intelligent and intuitive high-level use of it: an associative/relational storage mechanism whereby files are stored not according to an absolute location, but according to their attributes and relations to other things.
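A minimal relational sketch of the idea (Python's sqlite3; the schema and attribute names are invented for illustration): files carry rows of key/value metadata, and the example query above becomes a join instead of a filename convention.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
    CREATE TABLE attrs (file_id INTEGER, key TEXT, value TEXT);
""")
db.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    (1, "prddemosprsheet012500.text", "email"),
    (2, "notes.txt", "text"),
])
db.executemany("INSERT INTO attrs VALUES (?, ?, ?)", [
    (1, "has-attachment", "spreadsheet"),
    (1, "topic", "product demonstration"),
    (2, "topic", "groceries"),
])

# "All email files which contain spreadsheets on the product demo"
hits = db.execute("""
    SELECT f.name FROM files f
    JOIN attrs a1 ON a1.file_id = f.id
    JOIN attrs a2 ON a2.file_id = f.id
    WHERE f.kind = 'email'
      AND a1.key = 'has-attachment' AND a1.value = 'spreadsheet'
      AND a2.key = 'topic' AND a2.value = 'product demonstration'
""").fetchall()
```

The self-join per attribute is the standard way to query an entity-attribute-value store; a real metadata filesystem would index the attrs table and populate it automatically.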
Jazilla.org - the Java Mozilla [sourceforge.net]
Re:Information Overload (Score:2)
The answer is all of the above - through a relational database-type system, associate the file with
Jazilla.org - the Java Mozilla [sourceforge.net]
Re:non-perishable CDs? (Score:2)
Well, partially (the process is close), but the idea is to get a single, playable CD with as durable a construction as possible: metal and glass rather than foil and plastic, and so forth.
--
non-perishable CDs? (Score:2)
Make a standard backup of your vital data.
Take it to a special "data preservation agent", who will probably do it as a sideline to normal disaster-recovery work.
The agent makes an optical mask of the CD-R image onto a blank metal disk.
The optical mask is acid-etched to give a metallic CD (using two metals with noticeably different optical properties, or burning right through if the disk is thin enough).
In a suitable atmosphere, glass is molded around the metal disk to give you a metal-and-glass CD.
Place it in a padded, light-opaque metal case, and state that the data is only guaranteed readable if it is kept in that case full-time.
Obviously, the DRA would need to keep some hardware capable of reading these, but as he will probably be offering vaulting services for these disks anyhow, he will be wanting to access them on demand in any case. Any comments?
--
Re:Most of the data becomes useless (Score:2)
But it could happen at any time and without warning.
In my experience the typical non-geek computer user buys a computer and uses it hard for eight or ten years. I know people who still use AppleWorks on an Apple II.
All someone at Microsoft has to do is say in a meeting, "You know, I wonder if it's time we dropped support for ancient Word version X.YZ?" and it could easily happen.
Re:Formats really aren't that big a problem (Score:2)
The question of how long you want your data to live is important. Who do you want to be able to read it--yourself ten years from now, your grandchildren, or an archeologist? Compare these three time scales to the rates of evolution for various data storage technologies and protocols.
Choose ASCII. Absolutely anything can read it. It has been around for nearly forty years and serves as the foundation for Unicode and every other significant modern encoding scheme. If someone can recover the bits you wrote, they can read what you wrote with nothing more complex than cat. If you want to go beyond ASCII, choose HTML. If you want to go beyond HTML, choose TeX. HTML is wonderful for formatted or semi-formatted documents because it is utterly platform independent and almost intuitively obvious to the reader. If formatting is critical to you, TeX is slightly less readable, but clean enough and well-documented enough that your document can be recovered with only slightly more effort than the HTML.
As for storage media, gosh, I can't really help you there. I've seen various reports in here of what does and does not survive for how long. Congratulations if your CD-ROMs last 300 years; doubtless you'll be able to fire up your creaky old computer and read those files back in ten or twenty years from now. But in three hundred years, or even fifty, who's going to have (a) a CD-ROM drive that will read your 300-year-old disk, (b) a computer that will interface to the CD-ROM, or (c) enough documentation of how those technologies work to reproduce a working example?
If a writer wants her stuff to last, her best bet is to print it as text on acid-free paper. The disadvantage of paper is its editability. With slow-decaying acid-free paper and reliable storage/handling protocols, her worst-case scenario is that she has to scan it back in and hack it up. Scanners are wonderfully architecture-independent: they translate what is universal into the currently fashionable file formats of the day. If she wants better editability and format preservation, let her print out the HTML or TeX source; then she can scan it back in and continue messing with its layout, fonts, styles, whatever. So what's the best font to use if you want to scan text back in later and you want humans to be able to read it?
--
Archival strategies will have to change (Score:2)
The problem is to organize data in a way that highlights how long it is needed, and it is difficult to give a date in advance after which something will be obsolete. If I write a book and no one buys it, the file is useful only as long as I want it. If it becomes a best-seller, my biographer will probably want the rough drafts in 20 years. But I don't know that when I save them.
The solution to this is to learn a better strategy for identifying data. Some file formats already make provisions for this. LaTeX and DocBook already provide tags for quite a bit of identifying information about the source. Meta information can be placed into HTML. CVS stores records of who made changes and why in addition to retaining a record of each revision.
In fact, now that I think about it, CVS provides a good model for data storage. You get a way to retrieve each version of a file. You get a way to link together corresponding revisions of several files. And you have a record of when, why and by whom all of the changes were made. But at its heart, it is a system for data that is still alive. It is not a system for organizing the historical records of a person, company or government. And it doesn't address the question of media decay because it is independent of the specific media.
The solution is obvious: store data on cockroaches (Score:2)
http://www.nytimes.com/library/magazine/millenn
That data will survive everything.
acid free paper now in common use (Score:2)
Some long-term solutions (Score:2)
Several groups are looking into this technology as a possible way to stably maintain their archives over a very long period of time. Take a look at the Long Now Foundation [longnow.org] library for an example.
Re:Long term storage... (Score:2)
Magneto-optical is probably one of the most stable storage mediums available.
Theory and practice diverge in an unhelpful manner. Three years ago I worked on a project to convert a five-year-old MO system to another MO system, simply because the old drives were no longer available and ongoing maintenance was a hassle. Owing to stupid cost-cutting on my project, the "new" drives we used were already becoming obsolete. Today no one makes drives that can read either set of disks, and ongoing maintenance of the #2 system is dubious.
It's worse than it sounds! (Score:2)
Currently the system is hopelessly obsolete, and the remaining units are being carefully nursed along as we begin the migration effort. Furthermore, many of the older tapes have become "read once" media, so you can't afford to miss anything.
Many of the suggestions about formats and media life ignore some of the realities and complexities of the real world. Our archival needs break a large number of these assumptions (as I'm sure do many others'):
1) ASCII and higher-order representations are not adequate for scientific data.
2) Selecting media with a longer lifespan only defers the problem to a later date and makes the migration process longer. It also makes it even less likely that adequate readers will be available when the migration begins.
3) Raw data formats can change arbitrarily often during the lifetime of the archive.
4) There is unlikely to be adequately stable online storage media available to hold the entire archive as well as the "live" data set (data volumes will increase to match existing storage capacity).
So, what can be done? Many of the suggestions already posted are good and should be incorporated into any archive strategy. So here are some suggestions based on things we're looking at:
1) Identify what needs to be archived. As many have noted, most things don't need to be archived.
2) Build a migration strategy into the plan right away.
3) Keep source code and any auxiliary data needed to access the data available with the data itself.
4) Keep at least two copies of the archive. It's amazing how many archives exist in only one place (depressing, really).
Of course the biggest challenge is to keep all that data in a meaningful form. That's really the biggest part of the problem, and it's likely to get worse as data volumes grow. Things are coming down the road that will make our current demands look pretty small. That's good and bad. On the good side, our existing problem will fit easily into any solution we come up with at that point. On the down side, it's not clear what those solutions will be.
Preserving for the ages (Score:2)
The first is laserdiscs. They were advertised as permanent. After all, what could go wrong? The media was sealed in plastic and couldn't get any air, so deterioration was impossible, right? Wrong! "Laser rot" became obvious within a few years of the introduction of the technology. There are now gazillions of first-generation (and later) discs that are simply unreadable. I have dozens of them and can personally attest to the sinking feeling that comes with seeing data degrade and become unavailable. (In a similar vein, music CDs will degrade while the vinyl they "replaced" will, given proper care, soldier on for another century or two. And records sound better/hold more data, too. CDs were supposed to be an improvement?!?) The lesson? Don't trust industry shills who tell you a technology is good for 100 years. They simply don't know.
The second example is more personal. As a former photographer, I have some works that I want to preserve forever. Maybe I'm conceited enough to think that in a thousand years my works will be found and I'll be proclaimed a great artist. Maybe I'm just anal. Either way, I want my photos to be around for a long, long time. Now, properly processed silver halide-based film is pretty stable. My negatives will last a long time. But for the ultimate in longevity, I've begun making platinum-based prints on a variety of media, including plastic squares and enamel tiles. If I can find a source of enameled titanium squares (about 10 inches or so), I'll have a combination of media and chemicals that can reasonably be expected to last for a couple of thousand years. The lesson? True long-term data integrity sometimes requires an open-minded approach. If I rejected platinum processes because they were 100 years old, I'd have never discovered their permanence.
Until someone can come up with a novel way to store data that is truly permanent, I'll rely on the "bigger hard drives, cheaper, every year" theory to keep my data safe. But I don't really feel good about it.
Re:Most of the data becomes useless (Score:2)
This may be true of households and many businesses; however, there are government requirements for keeping data for things like clinical trials and funded research, which are subject to this problem.
There is also a public-interest issue here. Imagine if the tobacco industry, which has in effect lied to the public and hence murdered for the past three or four decades, had been required to have its research data archived in a retrievable format. Another example is the archived PROFS email correspondence of the National Security Council members during the Reagan era, which led to the smoking guns of the Iran-Contra scandal. A final example: a small city near where I live recently had difficulty determining what its City Charter actually said, because of poor recordkeeping of its officially adopted amendments over the years.
While most data is of little use to us after a year or a few years, there are longer-term projects and public-interest requirements that make it a public issue. True, I doubt if anyone would want to save most of those Letterman Top-10 lists, blonde-jokes and similar net-chaff for very long. I do think the Stephen Wright stuff will endure, however....
-Dave
I don't think this will happen. (Score:2)
Perhaps the amount of data will increase faster than the amount (price) of storage, but I doubt it. 640k should be enough for anyone! In any case, all the data generated thus far is likely to remain safely stored somewhere until extinction, if it is ever digitized and made publicly available (and anybody cares to store it).
I can imagine them in 3000 CE looking back on the logs of the (by then) most popular web site ever to have existed, and reading this very thread.
Preserving the history of AI (Score:2)
Two years ago, I was involved in an effort to preserve the archives of the Stanford AI lab from the 1960s and 1970s. Several alumni spent weeks taking turns reading 9-track backup tapes through the last two working 9-track tape drives at Stanford. The raw data was sent over the Internet to a big file server at IBM Almaden. There, somebody who remembered how the old SAIL tapes were formatted had written a program that extracted the files from the backup tapes.
Once the files had been extracted, they were processed into standard formats. (The SAIL machine had its own weird character set and its own image formats.) Text was converted to Unicode, and (monochrome) images were converted to GIF. The MD5 hash of each file was computed, and duplicate files (these were backup tapes) were removed based on the hash.
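The de-duplication step described above is simple to sketch (Python; contents are inlined here, whereas the real job read them off the restored tape images):

```python
import hashlib

def dedupe(files):
    # files: mapping of path -> contents. Keep the first path seen
    # (in sorted order) for each distinct content hash; the rest are
    # redundant copies from successive backup tapes.
    keep = {}
    for path in sorted(files):
        digest = hashlib.md5(files[path]).hexdigest()
        keep.setdefault(digest, path)
    return sorted(keep.values())
```

Hashing by content is what makes this feasible at tape-archive scale: no pairwise file comparisons, just one pass and a lookup table.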
The material was then indexed with a web-spider type program, so it could be searched readily. CD-ROMs were made of the content belonging to individuals, and sent to those who could be identified. Permission is being obtained from each individual to have their data published. (The files include private E-mail, for example.) Data approved for public release will be visible on the Web in a year or so.
If you ever had a SAIL account at Stanford, Bruce Baumgard at IBM Almaden has your stuff, and you can contact him for a copy.
It's a lot of work. And this was only about 10GB of data. This gives you a sense of how hard the problem is.
Why not space? (Score:2)
Not everything will be hard to read (Score:2)
Newspapers are frequently archived using photographic reduction. These are expected to last a very long time. Reading these just requires a very good magnifying glass.
Even CDs would probably be quite easy to decode and decipher with technology equivalent to ours.
Also, we keep a lot of data around purely for future historians. I think destroying everything, or even just that part of the world's data stored specifically to prevent decay, would be virtually impossible.
What might be harder to decipher than the technology is the language these are written in. We need to start making some Rosetta Stones and burying them all over the world.
What value archived data ? (Score:2)
The answer is simple. If it is of relevance, and people want to keep it (remember Oliver North?), then people will keep it. People will judge the value and take appropriate action. Otherwise, it goes to the bit bucket in the sky and remains there.
In years to come, will we be complaining of 'data pollution' caused by spurious archives that no one dares ditch, or noxious chemicals from discarded CDs?
Data Decay vs Data Expansion (Score:2)
Standards (Score:2)
One thing that has always disturbed me is the chaotic way in which information technology evolves. It is natural that it is so, considering that evolution comes through individual steps taken here and there. But I believe that if we opted for a one-year freeze on new developments and set up new standards, everyone would benefit in the long run.
It sounds absurd, OK, but I would like to see a standard defining a general information storage format, which would encapsulate whatever format was chosen to actually organize the data (FAT, NFS, journaling, orthogonal, etc.), much in the way that IP encapsulates other specific protocols. If well designed, such a standard could allow for really substantial growth in storage capacity while still providing backwards compatibility, like reading a 720 KB floppy disk in a 1.44 MB drive. It could be designed to work on disks, tapes, chips, and so on.
But that, of course, is just wishful thinking...
Long term storage... (Score:2)
Magneto-optical is probably one of the most stable storage media available. It can be rewritten up to 10 million times, has a shelf life of 50+ years, and is only affected by magnetic fields if you heat the surface to 300 degrees.
Another good contender, with a higher data density, is AME tape technology, such as AIT, Mammoth and VXA-1. AME tapes are good for around 20,000 passes, with an archival life of 30+ years.
SLR is another good contender for high-capacity data storage, with a shelf life of 20+ years.
For large-scale storage, using a form of hierarchical data management would be the best approach, with MO drives (which have a "mere" capacity of 5+ GB) serving out files that are still accessed on a regular basis, and large-capacity tapes on the back end (such as SLR100 or AIT-2, each boasting 100 GB compressed).
As data warehousing becomes a more important industry, HSM systems will likely integrate automatic migration away from media that are reaching the end of their archival lifecycle.
The obvious answer is... (Score:2)
Re:What value archived data ? (Score:2)
Historians are also interested in the stuff that people did not explicitly choose to keep because it may reveal facts that seemed unimportant at the time, but have been forgotten since.
However, I don't think that means that all our receipts should be converted to more lasting media. We'd drown in our own history.
Re:ummm... online storage banks?!?! (Score:2)
Man - stone tablets are the way to go! (Score:3)
Only guaranteed storage mechanism! Good for thousands of years.
Capacity: 2Kb/tablet
I/O: 1byte/hr
Media cost: £50/tablet
Error rate*: 1 per 100bytes
Note: Error rate assumes a fully qualified and certified stone mason.
Re:Just curious, but.... (Score:3)
Don't buy cheap CD-R's (Score:3)
By that I'm referring to how I bought my first CD-R drive about three years ago: of the 20 or so Maxell disks that I've archived data onto, only one is still readable by any CD-ROM/CD-R drive I insert it into. By contrast, every one of the Verbatim disks that I've burned, which were stored in exactly the same environment as the Maxells, is fully readable, and I haven't had any problems with them.
I've also used a few Sony and Memorex disks with which I haven't had any problems (that I'm aware of), but I have found my Verbatim disks to be incredibly durable. I burned 20 or so audio CDs onto Verbatim disks two years ago before leaving on a cross-country road trip, and despite vast swings of heat and cold, as well as being literally tossed around my car, every one of those CDs is also still working.
Again, this is just my personal experience, but whenever I see someone picking up a spindle of 50 or so no-name brand disks at a local computer store, I have to wonder how important the data they're putting on there must be...
--Cycon
constant migration and documented formats (Score:3)
As others have pointed out, the exponential increase in storage capacity makes it relatively easy to "keep buying more disk" and migrating your data all the time. Certainly the convenience of having everything online is nice, too. And everything online should have periodic backups happening. I've managed to do this for the past decade with my data, but I've lost the eight or so years before that, and I miss some of it.
But there's logical as well as physical bitrot. The media itself deteriorates, making it hard to get the information back, but understanding what that bitstream represents after a few years can be a real problem too. If you've got binary word processor files from an Apple II or C64, you'll probably not be able to read them unless you also have the original binary and can get it running in an emulator. Given the amazing progress that's been made in the last 150 years deciphering the records of dead civilizations, I wouldn't say that reading your MS Word 5 documents will be impossible in twenty years, but it might not be worth the effort. Open standards and open source really help a lot with this issue. If you can find a document describing the file format, you're saved. And the same applies to hardware formats. Also, it's much easier to keep open source software alive--essentially carrying the 'make a copy on the new system' over to executables.
I'd say the solution is pretty much that simple: keep track of your data, plan to make a complete copy every 5-10 years, and choose formats that are publicly documented and that (you hope) will be easy for future software to support.
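The "keep track of your data" part can be as low-tech as a manifest recording when each archive was last re-copied, so anything overdue gets flagged. A sketch; the 5-year interval and the field layout are assumptions for illustration.

```python
from datetime import date

# Assumption: plan a complete fresh copy of each archive every 5 years.
RECOPY_YEARS = 5

def overdue(manifest, today):
    """Given {archive_name: date_of_last_copy}, return the names whose
    last copy is more than RECOPY_YEARS old and should be re-migrated."""
    cutoff = today.replace(year=today.year - RECOPY_YEARS)
    return [name for name, copied in manifest.items() if copied < cutoff]
```

Run it once a year against your archive log and you'll never silently drift past the re-copy window.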
Re:The ultimate backup (Score:3)
Many of these texts are not yet broadly available in digital form and are not important or interesting enough to enough people to be kept handy. Try looking for some older book by a not-so-famous author. Even encyclopaedic works are reworked for each new edition, and older bits of information have to make way for newer ones.
With historical facts it's even worse: in most cases there are at least two versions of one event, and who was in the right is mostly determined by who survived. Just have a look at how warfare now concentrates on media control, or try to imagine the twisted version of history if the Nazis had won WWII; even now there are some denying the existence of the Holocaust.
I think all this information is well worth keeping, and since it's difficult to see today what later generations might find worthy, the 'evolutionary' approach (if I/we don't want to keep it, later generations won't want it either) doesn't work. And it doesn't suffice to just keep this information somewhere; it has to be kept in an accessible form, on media readable with modern equipment (who will go to the trouble of reading an old magnetic tape?) and indexed (if you have 1GB of unsorted texts/text fragments on a hard disk, are you ever going to wade through that to get the piece of information presently of interest?)
Re:Most of the data becomes useless (Score:3)
I disagree almost entirely.
Very little of the data volume becomes useless, because we don't know what "useless" will mean to readers in the future. Contemporary archaeologists spend much useful time sifting the contents of rubbish pits and latrines; if that turns out to be interesting, how can we ever say that our data won't be? Maybe your schoolwork is dull and uninteresting to you, but how about to an educational historian in a century or so? Wouldn't you like to know how teaching was carried out in the past?
Also, the majority (by volume) of data will always be automatically generated sensor data (humans can't type fast enough to keep up), and that tends not to become useless with time. NASA has already lost interesting telemetry data.
Authors have definitely lost early book drafts because modern WPs don't open old WP formats. Word 1.0 isn't old! That's not even a decade ago. What about stuff from the '70s on hardware formats that no longer have players? CP/M WP formats used by some of the first great novelists to work digitally? (Mind you, losing the whole of Pournelle is fine by me.) Personally I'd find it very hard to read my own degree work, and I'd probably have to do it by scanning in the paper copies.
Solutions? I'm not a hardware guy, so I can only talk about the soft data side of it. I think XML (and similar) has a big part to play here. Let's stop thinking of data formats subjectively as "the data format that belongs to SprongWriter 4.2a" and instead work with formats that have objective definitions extending beyond the client app of the day. Why should I need a copy of that particular WP to open the data, if the data is already in a format that's inherently accessible? We already have the technical skills and tools for this; I call on all developers to make use of them and to stop writing these proprietary data oubliettes.
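The self-describing-format point can be made concrete with any stock XML parser: a document written as plain XML can be read back by any conforming parser, with no knowledge of the application that produced it. The element names below are made up purely for the example.

```python
import xml.etree.ElementTree as ET

# A hypothetical document saved in a plain, self-describing XML format
# instead of a proprietary word-processor blob.
doc = """<letter>
  <author>A. Writer</author>
  <body>Dear reader, the format outlives the app.</body>
</letter>"""

# Any XML parser, in any decade, can recover the structure without
# ever having heard of the program that wrote the file.
root = ET.fromstring(doc)
author = root.findtext("author")
body = root.findtext("body")
```

That's the whole argument: the schema travels with the data, so the reader never depends on one vendor's binary layout.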
Book Recommendation: "The Clock of the Long Now" by Stewart Brand. Why this sort of thing matters, and what a few people are trying to do about it. Best book I've read this year.
PS - SciAm also had a piece on digital data loss, a year or so back.
Some data points (Score:3)
This is not a new problem. People have been dealing with the question of recovering data from old media for years. As a first data point, a number of years ago, about 5 IIRC, some people finally decided that some old music tapes had to be rescued.
The method used was to find this old RCA gentleman who had retired more than a few years before. They then went to the Smithsonian and got the last remaining example of the tape recording/playback device that had been used to make the original master tapes. The RCA guy used the specs and his knowledge to tune the tape deck to perfection. They then put a high quality amp and splitter downstream of the tape deck to feed two digital tape decks (the professional kind, not DAT: more bits and a slightly faster sampling rate) and a couple of analog tape decks as well.
After testing, they carefully placed one of the master tapes on the deck, started all the recorders, and pressed "play". As the master tape played, it simply came apart. They had to keep the heads clean, but this was a one-time, one-chance thing. They succeeded.
From the recordings they made some wonderful CDs. Amazingly enough, the Master tape had almost no "hiss" in it.
Data point two. MIT, I believe it was, decided to move some of their older theses to CD-ROM for easier online access. The first thing they noticed was that many of the data tapes they had stored things on were 7-track tapes, and of course they had no 7-track tape drives any more. Again, people went to the museums, got out a 7-track drive, spent the time to fix it and make it work, then built an interface box to connect it all up, and away they went.
Third data point. Somebody posted to a mailing list that they were looking for some old code to run on an emulator for a PDP-11(?). We ended up going into our machine room and found some old release tapes. These included a copy of BRLUNIX (based on a BSD release) and, I think, an AT&T Sixth Edition. These were 9-track reel-to-reel tapes. We went into the machine room, powered up the tape drive, and copied the tapes verbatim to disk, set up to do the least amount of reading. The tapes were around 15 or 20 years old.
Because of this rescue, which happened late last year, we saved the tape drive when the machine was tossed due to "inability to prove Y2K compliance". So the tape drive still sits on the machine room floor. The operators turn it on and clean it once a week. It isn't currently hooked up to anything, but we expect it to be hooked up to something again in the next year or two, just to be able to read all those old tapes we still have.
At home I use EXABYTE 8200s for my backups. I have 3 drives, and you can still get them refurbished. While each tape only holds 2GB (compared to a max of 150MB for a 9-track tape), the media is small and low cost. The Exabyte encoding also has a great deal of redundancy in it, making it an excellent choice for long-term storage.
At work they do much of their backups on EXABYTE 8500s. For the Crays, they used to use IBM 3480 tape cartridges; when they changed tape formats, they spent a few weeks moving all the data from the older format to the new one.
Of course, our most reliable storage medium to date has been our paper tape and punch cards. They may be low density, and sometimes we've had to build readers for them (auto-feed to a flatbed scanner which scanned each card, then process the image for holes and voila).
CD-ROMs have the problem of decaying due to light exposure. If you want to keep them for years and years and years, they have to be kept out of sunlight. And because our long-term, low-cost storage media keep dropping in cost and increasing in size, I suspect that in 3 years we will find everybody carefully copying all their data from CD-ROM to DVDs, which will have a twenty-year life span.
The basic rules on saving your data for the long term are:
Chris
Data availability (Score:3)
Apart from MAME, nobody is making any effort to archive old computer games. The BBC managed to destroy a lot of valuable original video tapes (apparently they taped over their copy of the moon landings). These cases show that data is kept around much longer if copying is encouraged rather than discouraged.
Re:Snake Oil (Score:4)
The filter of decay has served mankind well? How illogical; when you have no idea what has been destroyed, how do you know mankind has been served well? Was mankind well served by the destruction of the Library of Alexandria, the Aztec libraries destroyed by the Spanish, or the historical libraries destroyed by the Serbs in the Balkans?
Sure CDs may last 100 years (we really don't know) but it is unlikely they will be able to be read by anything. Paper is still the most stable format available (although it is impractical for many reasons to transfer digital data to paper as some of my colleagues are prone to doing) and there are many vast libraries of data open to the public. We had well over 40,000 researchers use our library last year and less than 1 percent were scholars.
My profession is wrestling with two technology related questions.
1. How to make paper collections accessible electronically. For example, the papers of ONE congressman (approx. 400k documents) took 5 years and nearly 3 million dollars to digitize. We have one collection which has 32M documents. Sure, digital copies are cheap, IF the original was electronic and in a form easily translated.
2. How to preserve much of the information which currently only exists in electronic form, be it governmental databases, personal computer files or web pages. We did an interesting experiment a couple of years ago when we captured about six dozen web sites which documented the devastating Red River flood in Minnesota, North Dakota and Canada. Most of these sites existed on the internet for only 2-3 months and were disappearing even as we captured them. I think it will be possible to study how the internet was used as a tool in response to catastrophe, from the governmental level down to local churches and organizations. Of course, current copyright law makes it illegal for us to post this database of websites on the internet, but that's another issue.
Aging Newbie is correct in the assertion that only a small percentage of data need be preserved, yet I feel that conscious, reasoned choices about what should be saved serve mankind far better than the filter of decay. I also believe the solution ultimately will involve a combination of strategies, including electronic ones.
Skavvy(whose firewall apparently won't allow him to register)
THIS IS A PROBLEM NOW! (Score:4)
Snake Oil (Score:4)
However, I think he was mistaken. Ancient societies left stone tablets, cave paintings and the like behind, and there's no-one who fully understands the languages or the contexts (when an archaeologist says an object is of "ritual significance" he actually means he doesn't know what it's for). We do have the technology now, as the poster says, to migrate our data ever forwards into new storage, assuming no cataclysm occurs. And even if it does, it is far more important, in terms of recovering data, that the language (source code) survives, rather than CD ROM drives, Minidisc players etc (the binaries), because then data recovery is an essentially straightforward task.
I expect acid-free paper to survive long enough after an ecological catastrophe or, say, a meteor strike, to be useful to the survivors (better start moving the engineering textbooks down into the bunkers). And of course, Ship-It awards will outlast the end of time, not to mention non-biodegradable shopping bags.
As a civilisation, if we wish to preserve a legacy, we currently possess the skills and technologies to do so, if we choose to.
Re:Snake Oil (Score:4)
Stored properly, writable CDs last 100 years or more, while each holds well in excess of an encyclopedia. The problem of preservation is considerably simplified as compared to paper. After 100 years, paper documents are of limited utility and only scholars can access them. With digital media, copies are simple and cheap, so anyone could have a copy if they wanted.
I think the challenge of the future will be one of sorting the trash; i.e. selecting moon landing data from the mountain of memos, reports, and minutiae surrounding it. But that would seem to have been the problem since history began.
For all of our ego, I think we might have only a few times more real value to save for posterity than did our counterparts at the turn of the century or in the '50s. People seem comfortable with what we saved in the past - why not admit that we are really not that much more advanced and that the real value of our lives and era can be summarized on a few (or a few thousand) CD's a year. Not enough to cause an information apocalypse or anything but a shelf in a library...
CD lifespan (Score:4)
From what I've understood, the lifespan of a CD-R is around 20 years for those based on cyanine or AZO dyes (which appear blue or blue-green when you look at them) and around 100 years for those based on phthalocyanine (which appears golden to the eye).
Of course, it depends very much on the way you treat those CD. If you put one in a light-free, dust-free, safe deposit box, it can probably survive several kyr (uh, thousands of years) without damage.
The unfortunate thing, however, is that because the error-correcting codes work so well, it is not always easy to tell that a CD has begun noticeably deteriorating until the data is actually unreadable, and by then it is too late. It would be nice if drives could return some sort of "CD quality" status.
I always write down (on paper) the md5 fingerprint of the raw ISO image when I burn a CD. In that way, I can be sure whether I have pristine data yet. (And if I make copies, I can be sure the copy is exactly identical to the original.)
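That fingerprint check is easy to script. Here's a sketch using Python's hashlib, reading the image in chunks so a full-size ISO never has to fit in memory (the function names are my own; the poster only describes the manual md5-and-paper workflow):

```python
import hashlib

def md5_of_image(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a raw ISO image, read in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def still_pristine(path, recorded_digest):
    """Compare a disc image against the fingerprint written down at burn time."""
    return md5_of_image(path) == recorded_digest.lower()
```

Re-run `still_pristine()` against a fresh rip of the disc every few years; any silent bit rot shows up as a digest mismatch long before you need the data.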
This information is provided in the hope that it will be useful but WITHOUT ANY WARRANTY. Without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Shelf life of recorded CD-R longer than 20 years (Score:4)
Yamaha CD-R site [yamahayst.com]
Josh
The ultimate backup (Score:5)
The internet will always save your best work [google.com] and discard the junk [waldherr.org].
Re:Data availability (Score:5)
Recovering the data from just a portion of the tapes requires substantial amounts of time and money due to the labor intensive nature of the task. Think of copying 20,000 LP records to CD-R disks.
With limited budgets, NASA and other scientific research agencies are often in the unhappy position of having huge amounts of potentially valuable data on rapidly deteriorating media, of which only a fraction can be saved. Unless someone invents a time machine, the data is irreplaceable.
For many years, magnetic tape has been the medium of choice for storing spacecraft data. Storing it on an on-line system, on disk, just wasn't practical or affordable. Huge amounts of data were archived on 7-track 1/2" digital computer tapes, the same kind of tapes that you see in cheesy science fiction movies from the 1960s. Try to find one of those tape drives today, or a computer that can talk to it.
acid free paper (Score:5)
I'm sorry to hear that. I've been fascinated by this phenomenon in our university library. Up until the 1930s or so, journals are pretty well preserved. Then they suddenly get awful, as paper mills switched to new methods: pages are yellowed and brittle. In the 1950s the error was discovered, and pages become white again with the switch back to acid-free paper.
Let's hope we don't make the same mistake with digital media. And it could be worse: almost all the film from the first half of the century is lost to self-rot and environmental damage. For all its faults, DVD is probably the best thing that's ever happened to film from a historical perspective.
Most of the data becomes useless (Score:5)
Modern word processing still opens really old file formats like Windows
Floppy disks are degrading rapidly, but most people's floppy collection can fit on a single CD-R. Then again, most people just don't care about their floppy collection, and will just let it die. The data contained on it isn't useful anymore.
Let's see about audio CDs. They degrade over time (scratches) and possibly rot. I believe what will happen is that we'll convert them to some format like MP3. I'm fairly certain that MP3 capability will continue to be implemented in computers for a very long time. And if it shows signs of getting phased out, you might simply batch-convert everything to the new format. Or just re-rip the audio CDs sitting in storage, if you really care about the quality (since batch conversion of a lossy format will result in degradation, unless we find a way to actually enhance the audio quality, which might or might not happen).
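Such a batch conversion is mostly bookkeeping: walk the archive, pair each old-format file with a target name, and hand the pairs to whatever encoder the new format uses. A dry-run sketch; the extensions are placeholders and the actual encoder invocation is deliberately left out, since it depends on tools that don't exist yet.

```python
import os

def plan_conversions(root, old_ext=".mp3", new_ext=".newfmt"):
    """Build the work list for a batch format migration.

    Returns (source, target) path pairs; the encoder itself is out of scope.
    """
    pairs = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(old_ext):
                src = os.path.join(dirpath, name)
                # Same path, swapped extension.
                dst = src[: -len(old_ext)] + new_ext
                pairs.append((src, dst))
    return pairs
```

Feeding each pair to the future format's encoder (and verifying the output before deleting anything) is then a one-line loop.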
Movies. VHS tapes degrade... Probably we'll convert what we really want onto some kind of optical disc in the future, and the rest will decay, and we won't care about it decaying. When that format (DVD-R perhaps?) is being phased out, since it's digital, it should be quite easy to transfer our DVD-Rs to a higher-capacity medium, perhaps 10 discs onto a single one, saving a lot of space and letting the format live another 20 years. After all, how hard will it be to include MPEG-2 decompression in next-generation video players? The cost of an MPEG-2 decoding circuit probably won't be very high anymore.
The other possibility I see is that bandwidth gets cheap enough that we may consider remote storage vaults. That has a couple of privacy issues I'm certain you can see... But it's incredibly convenient and will probably be adopted by everyone if we just find a way to get a high-speed switched pipe to everybody's home at a reasonable cost.
If we do indeed have high bandwidth in every house, I can see the media companies getting their acts together and putting up their own gigantic media archives. They could offer a monthly media license that gives you access to any music or movie you want, or perhaps just make you pay for every access to the archive. Of course, I can think of so many ways such a thing could go wrong. What if they decide to carry only censored material? What about independent artists? Perhaps we'll just see a protocol for accessing and paying for media archives, and have a dozen of them appear. Say, DisnABCTimeAOL could have theirs, and AndoTransmeVAMicrosoChryslerDaimler could have theirs...
This could be horrible if not done properly: a lot of "non-approved" content could suddenly become unavailable if the distribution channels other than those media archives were killed off. So. Is this just an incoherent rant? Would you care to add any constructive comment? Answers? Questions? Anything at all.
An old idea... but still a threat. (Score:5)
In many later books Lem refers to an informatic catastrophe. Sometimes it is caused by a necro-virus, a product of computer evolution (the arms race was banned from Earth and transported to the Moon, where sophisticated computer systems worked automatically on weapons development. Each nation was allowed to bring its weapons back to Earth, but that meant the others could equally prepare; somehow, the automata on the Moon got out of control and started evolving, finally leading to a nanobot virus thriving on silicon chips, hence the title "Peace on Earth"). Sometimes it is caused by basic physical properties (in the humorous story "Prof. A. Donda" the title hero discovers a basic equivalence between energy, mass *and* information, and one of the consequences is that if information reaches a certain density it changes into matter; that is, a new universe. God's word was counting from infinity to zero in an infinitely small time :-) ).
I admit I was shaped by Lem's writing. Many of his ideas from the sixties and seventies came to life in the nineties (e.g. virtual reality, or sciences which deal only with information retrieval). I do believe that information storage is a problem, but not because the medium won't last forever: because of the signal/noise ratio you have even in your personal files. As I look at the four Macs we work with in our lab, and the couple of gigabytes of data, and then dozens of GB of backups, different versions, obsolete versions, alternate versions, gel pictures you have no idea where they came from or who needs them, and so on, and so on... Yes, there are better solutions than using a Macintosh in a multiuser environment, but that's not the point. I've been using Linux for years and have my personal data at home, and I still seem to have a GB or so of data I'm too afraid to remove, just in case. And there are so many alternatives for storage, backup, databases... and I'm just a simple biologist!
Returning to Lem: yes, I do believe we are approaching a critical point, like a bifurcation in a chaotic equation, and the word "chaotic" fits especially well here. What happens next? He who cometh and giveth us a system (not an OS, but an information retrieval system), he hath the power and our souls. Well, mine at least. Hope he doesn't come from Redmond, though.
Regards,
January