Will There Be Historical Records from the Digital Age?

Posted by Cliff
from the stuff-to-think-about dept.
magarity asks: "NPR's Morning Edition today aired a segment on the Medici Archive Project, where every letter sent and received by the ruling Medici family of Renaissance-era Italy is being stored. The interviewer, Bob Edwards, casually joked that it was a good thing the Medicis didn't use email or else all this history would have been lost. It is easy to predict that at a similar distance in the future little will be known about our time period. After all, it is already problematic to retrieve 25-year-old data from 8-inch floppies, simply because the reading mechanisms are hard to find even if the media has retained the data. The same thing will happen to CDs in 50 years. How should the dawn of the digital age be recording itself for history, especially casual correspondence that gives insight into day to day life?"

"The Medici Project concerns itself with the rulers and given the recent report of US Congress members not making use of email one assumes they are still using good old long term archivable paper. Will the President and Congress in 2030 or even 2020 feel the same way? The main problem being digital records are so much more easily tampered with compared to old paper. It's not as easy to do carbon dating or other such tests with a bunch of bits. Remember: the victors always have and always will rewrite history as much as possible."

  • by Anonymous Coward
    What may be the most difficult part of the problem isn't the long term storage, but conveying what's stored.

    Think about Egyptian culture. We wouldn't have a clue without the Rosetta stone. It wasn't enough that they left writing and markings that have lasted thousands of years. We needed a tablet with the same message in several scripts to figure out what they were trying to say.

    So what you really want in your storage is a long term package, no moving parts or power supply, some generic and easily understood interface, and a primer that cannot be misunderstood.

    Also, for those thinking we can just have plain ASCII text, it's not that simple. ASCII is an encoding scheme. You have to have something in the primer to tell the reader how to decode the data and then what those letters and words mean, and so forth. In 2000 years we invented Latin, French, German, English, but modern German speakers would find Old High German hard to comprehend.

    This gets worse as time goes on. It's already hard to explain feudalism to people; try explaining the Roman Republic's governmental structure. Now, try explaining American Democracy in 500 years.

    It's not just the media, it's the culture. And a primer is how you get them able to follow enough of the conversation to get a grip on it.
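The point that ASCII is an encoding scheme can be shown with a minimal Python sketch. The byte values are made up for illustration; the codec names (`ascii`, `latin-1`, `cp437`) are standard Python codecs. The same stored bytes give different text depending on which table the reader assumes, so the table has to travel with the data.

```python
# The same stored bytes, read with three different code tables.
# The byte values are illustrative: "Hi " plus one byte above 127.
raw = bytes([0x48, 0x69, 0x20, 0xE9])

as_latin1 = raw.decode("latin-1")  # 0xE9 is a Latin small e with acute accent here
as_cp437 = raw.decode("cp437")     # 0xE9 is a Greek capital theta in this code page

try:
    raw.decode("ascii")            # 7-bit ASCII assigns no meaning to 0xE9
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_latin1), repr(as_cp437), ascii_ok)
```

A reader who finds only the bytes has no way to tell which of these readings was intended, which is the primer problem in miniature.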
  • by Anonymous Coward
    Making copies of data, even for historical preservation, without permission of the copyright holder is illegal under the DMCA. You THIEVES!
  • by Chris Johnson (580) on Tuesday April 10, 2001 @12:49PM (#301057) Homepage Journal
    Of course there will be historical records from the Digital Age!

    They will say:

    • music thieves are like looters or other sorts of robbers, and right thinking people despise them
    • nobody has ever been motivated by anything other than self interest
    • people will trade off privacy for a bit of convenience
    • Microsoft has always been the world's web browser
    • Bush won
    • Oceania has always been at war with Eurasia

    Thank you, Ministry of Historical Perspective! :P
  • I would think so. Yes, there is a lot of stuff going on on the net that no one cares about now and no one will care about in 50 years. On the other hand, we have most of the letters people like Washington and Jefferson wrote, because they made personal copies in a diary before they sent them (which made sense in a day and age when letters might not get there). And they are of great interest to many people. And there are many other records from that period and before, including a very complete set covering several hundred years of the Cairo Jewish community in the Middle Ages that was found about 100 years ago. That one existed because Jewish law requires some written records (those containing G-d's name) to be stored or disposed of properly. And the community just got into the habit of saving everything. It's literally hundreds of volumes of stuff.

    In 50 or 100 or even 500 years will historians be able to access what we have done today? I hope so but I don't really know.
  • Well, limiting the number of formats that you accept has the major advantage that in 100 years people will still be able to read them. The downside of ASCII is that it will only do English text. If you want to archive a document in Greek, Hebrew, Yiddish, German, Russian, Chinese or whatever, you can't do that with 7-bit ASCII.
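The 7-bit limit is easy to check in Python. This is a small sketch, with sample words chosen only for illustration and `fits_in_ascii` a hypothetical helper name: English text encodes as ASCII, while Greek and Russian text is rejected and needs a wider encoding such as UTF-8.

```python
# Sample words are illustrative; any non-Latin text behaves the same way.
samples = {
    "English": "archive",
    "Greek": "αρχείο",
    "Russian": "архив",
}

def fits_in_ascii(text):
    """Return True if every character of text has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for language, word in samples.items():
    # UTF-8 can store all three; ASCII only the first.
    print(language, fits_in_ascii(word), len(word.encode("utf-8")), "bytes as UTF-8")
```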
  • by iabervon (1971) on Tuesday April 10, 2001 @10:12AM (#301060) Homepage Journal
    We just have more medium-term storage. The sorts of things that won't last more than a couple dozen years are generally things which, in the old days, wouldn't have lasted a minute: music couldn't be stored at all until recently, and many conversations we have by email (which could degrade) would have been done in person and never stored at all.
  • Stewart Brand addresses this issue on the Longnow website:

  • I have posted it in several discussions on Slashdot, that Donald E. Knuth's TeX typesetting system was not only intended to create high quality typeset mathematics, but that Knuth's deeper reason was to preserve his work in a high quality format for the ages to come.

    This is no coincidence, because Knuth's main oeuvre, a several volume work on computer science, has already a related aspect:

    Computer science changes very fast, and Knuth decided to include just those parts of computer science that have settled and reached a maturity that makes them unlikely to change radically in the future. Hard task. And indeed the material he put into his three released volumes is highly mathematical, because such material is typically mature enough, but still he did not quite manage it: the RISC architecture, for example, pushed him to update his machine language MIX.

    At some point, when Knuth got some copies of his TAOCP, he was frustrated by the typographic quality getting worse. So he decided to take some time off to develop a system that turned into TeX (who other than a professor can take a 10-year sabbatical to do such a thing? :-)

    To shorten the story:

    Knuth developed TeX, the program that assembles boxes into lines, lines into pages, pages into documents. He developed Metafont, the program that takes the mathematical description of font families (= a meta font) and renders them into bitmaps. He developed the Computer Modern fonts in Metafont format. Plus he invented a system called literate programming, which allows program code and documentation to be derived from the same sources.

    All this has been released in the form of five books:

    • TeX manual
    • literate/commented TeX source
    • Metafont manual
    • literate/commented Metafont source
    • literate/commented Computer Modern font sources

    This means that even in hundreds of years, everyone with those 5 books, something like a computer, and the ability to read mathematical texts plus the computer science knowledge to implement a Pascal-like language will be able to reconstruct Knuth's whole system!!!

    If at that point .tex sources are available (at least as printed listings!), they will be able to hack device drivers for their then-common output devices and print all of Knuth's works in original typographical quality!

    That is the real deep reason for Knuth's TeX - longevity of information.

  • Embossed metal would be good.

    No better. Metal gets corroded by water (worse yet: saline water), melted by fire, cracked by cold etc.

    Besides rock, which has proven pretty good throughout the ages, there's one thing that could live up to the promise, and that's mineral paper. (Aka, asbestos paper.)

  • Have you ever heard of gold? Not to mention titanium, hafnium, rhodium, platinum, nickel, chromium?

    Hardly affordable metals, aren't they? I'm talking something remotely accessible, not gold-plated disks to be sent to outer space...

    Were YOU awake in economics class?

  • Orwell was a well-known member of the U.K. socialist party if memory serves.

    Doubleplusungood! Thought Police! Here! I have found a crimethinker! He must be an agent of Emmanuel Goldstein, spreading misinformation!
    Put Doctor K with his brother in the Castle!
  • Nowadays it seems that it's the place where artistic (or allegedly artistic) works used to go. Don't look for Mickey Mouse to show up there any century soon.
  • Digital rot of our records, I mean.

    Think about it -- what do we have to pass on to future generations of the past 20-30 years? Boy George, N'Sync, Lyndon LaRouche, Hare Krishnas, Monica Lewinsky, Rush Limbaugh, Al Gore, Rob "CmdrTaco" Malda...

    It might be a good idea for ALL these things to slowly melt away ...
  • by desslok (7863) on Tuesday April 10, 2001 @10:06AM (#301073) Homepage
    cat internet | lpr
  • I'm sure the presidential libraries and stuff about important famous people, the Medici of the digital age, will continue to be well preserved - at least that part that they want to be remembered for - but a vast majority of information, 98% probably, isn't worth the trouble of saving.

    Currently I'm about to pick up a used Super-8 projector to show some films that are in great shape.
    Also just got a 1930's Burroughs adding machine for $15 from a hamfest that, with a few drops of oil and cleaning is in 'like new' condition and will probably be in working condition hundreds of years from now if kept in the right environment (room temp, low light and humidity - basements, attics, garages and sheds are hell on that stuff).

  • The accounting ledger is only of interest today because it is largely all that survives of the culture. You have to be careful when making assumptions about older societies based on a handful of spotty records. If all you can find are commercial records, it far too easy to assume that commerce was the most important thing in people's lives when it very well may not have been.

    I'm not worried about what records will survive and won't survive from our era. The Romans, the Greeks, they didn't worry about such things. They worried about what legacy they would leave for the future (fat lot of good it did them) which is what kind of world they were leaving for their children. This is far more important, IMHO.
  • All of our digital archives are deteriorating at a rate unparalleled since the introduction of acid-based paper.

    If it's not the medium (read an 8" diskette lately? How about a 14" 5MB cartridge? How about a reel of mag tape?) it's the software (M$ Word document formats were deliberately sabotaged to force people to migrate to the newer versions. [I don't know anyone who actually needed M$ Word '97 until they found that they had to upgrade when M$'s biggest clients who'd got their copies for dirt.])

    There will be thousand-year-old documents and last week's flimsies and nothing in between. Just an Orwellian silent testimony to greed and obsolescence, planned and otherwise.

    But that said, have we said or written down anything worth keeping?
  • Use modern circuit etching technology on long-lived media such as corrosion resistant metal.
    Etch text, not binary codes.
    The future can read this with a computer or magnifying glass.
  • by peter303 (12292) on Tuesday April 10, 2001 @10:03AM (#301079)
    That applies to 5 years ago or 2000 years ago.
    Even paper disintegrates, albeit in centuries.
    Only a tiny fraction of stuff is copied now or then.
  • Everything has value to someone at some time. I have the sick habit of collecting the Internet (custom spiders suck large parts of the web and usenet onto my harddisks) just for the heck of sorting through it to see what I find. In 100 years my hdd full of odds and ends could be a great find for some historical researcher. I'd disagree with the original poster though. Our culture will be better documented than any culture before us. We're an information culture and we leave our data all over the place. Someone that digs up a stack of CDs would have a huge collection of multimedia information and all they'd have to do is figure out how to read the discs (which is referenced in other documents both printed and digital.. so there is a key). Sure it's important to keep copies of disks, email, music, etc.. despite lame IP claims.. for historical reasons but this is fairly easy to do. Copy the other sources into raw data files (iso images, cd rips, game roms, etc) and copy the files around as much as possible. Email and other personal files which are quickly deleted or may be encrypted may be the hardest data to save.. but the large amount of email that ends up cached or forgotten all over the place would still probably exist and by that time I expect the future culture to have the computing power to easily decrypt our files.
  • by joshv (13017) on Tuesday April 10, 2001 @11:35AM (#301082)
    We need to define a long term storage standard which is a suite of storage media and standard file formats. Call it LTSS 1.0. To be a LTSS 1.0 compliant reader you have to support all media and file formats. This could be a dedicated reader, or a general computer with some specialized software and hardware.

    LTSS 2.0 might have whizbang new file formats and storage media which supports 100 times as much information density, but it must be compatible with version 1.0.

    LTSS 1.0 could support WAV, MP3, GIF, TIFF, Text/ASCII, Text/Unicode, HTML version whatever, and perhaps even Java for interpretation of arbitrary file formats. The media: CD-R, or perhaps one of the writeable DVD formats when they mature.
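The compatibility rule in this proposal amounts to a superset check, sketched below in Python. "LTSS" is the comment's own hypothetical standard; the registry layout, the function names, and the "NewFormat" entry are assumptions made for illustration only.

```python
# Hypothetical format registry for the proposed "LTSS" standard.
LTSS_FORMATS = {
    "1.0": {"WAV", "MP3", "GIF", "TIFF", "Text/ASCII", "Text/Unicode", "HTML"},
    # 2.0 may add formats but must keep every 1.0 format readable.
    "2.0": {"WAV", "MP3", "GIF", "TIFF", "Text/ASCII", "Text/Unicode", "HTML",
            "NewFormat"},  # placeholder for some future format
}

def is_backward_compatible(newer, older, registry=LTSS_FORMATS):
    """A newer version is compatible if it supports every format of the older one."""
    return registry[older] <= registry[newer]

def can_read(reader_version, file_format, registry=LTSS_FORMATS):
    """A compliant reader must handle every format listed for its version."""
    return file_format in registry[reader_version]

print(is_backward_compatible("2.0", "1.0"))  # True
print(can_read("1.0", "NewFormat"))          # False
```

Under this rule a 2.0 reader can always open a 1.0 archive, while the reverse is not guaranteed.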

  • Of course, it would be pretty stupid to assume that altering of documents by politicians or other people is anything new. This sort of thing has been going since the beginning of time.

    Take the Ems telegram, for example: it was seriously altered, and it sparked a war between France and what would become Germany. Of course, we now know that it was altered, but at the time no one knew what happened.

    If you think that people altering documents for their own good is anything new and ruins good historical records, you need to wise up and take a history class. This is nothing new, and we still have a good idea of what happened.

  • The way to keep data long term is to form long-lasting institutions (like libraries, for example) whose purpose it is to perpetuate knowledge and information. On Earth, you can't see any medium as being eternal, so you have to create a social construct that will perpetuate the data across media and societal changes. A good example of this is the Bible. The original 'Bible' is long gone, but it's one of the most solid pieces of historical data because there is a social construct, Christianity, that has a primary tenet of keeping that word alive. This isn't a religious rant, but just an example of ways to archive data beyond the lifespan of any given medium.
  • Though we're happy living here, the Earth is highly corrosive and chaotic. We don't see it because it happens in slow motion (by our perspective) but everything's getting worn away, oxidised, bleached, or otherwise transformed by chemical reactions.

    If we want to save data we need to make redundant copies, in a form that is resistant to electromagnetic radiation (say, microetched in carbon, silicon, or other stable element), and put it into a heliocentric orbit 1 radii behind or ahead of the earth's orbit (this way it's not in a trojan point, which could result in collision damage, but is still in a 'mathematically likely' place).

    Most of the corrosive factors would be left behind on Earth, and the data would be stable for the long haul. Alternatively, we could put data on the moon, where it would be stable until a meteor hit it or covered it up, likely tens or hundreds of millions of years, and if we put several down, they'd last longer.

    Hmm, maybe a big micro-etched monolith buried just under the surface...

  • I think the ICQ logs from efront are a very important historical record. Even if most of it is inane, it provides an uncommonly frank and unclouded view of a crashing internet company. Some of the most valuable historical records _are_ the inane letters sent from person A to person B. How about the Diary of Anne Frank? A thirteen-year-old's AIM chats are one of the most important works of the century!
  • by Moofie (22272) <lee&ringofsaturn,com> on Tuesday April 10, 2001 @01:24PM (#301094) Homepage
    What are you talking about? I've got the DVD that Moses brought down from Mt. Sinai. Look! It says "10 Commandments" right there on the front!

  • For good examples of similar thinking, check out Danny Hillis' 10,000 year clock project. The first thing he did was toss out all "modern" technology because none of it would last as long as he needs it to. He had to go back to the Bronze Age, I think?
  • For a virtual world we ought to separate the information from the media. We could store data and execute programs on several computers and use the majority result. See Askemos for how this will work.

    Once we are at it, we might find that files are worse than paper for another reason. We had better have "write once" files. If reusable paper were better than normal paper, we would have it in the stores. Enough cycles of invention went over it already.
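The "majority result" idea can be sketched in a few lines of Python. This is a generic majority vote over replicas, not a description of how Askemos itself works; `majority_read` and the sample data are hypothetical.

```python
from collections import Counter

def majority_read(replicas):
    """Return the value held by a strict majority of replicas, or None."""
    if not replicas:
        return None
    value, count = Counter(replicas).most_common(1)[0]
    return value if count > len(replicas) / 2 else None

# Three copies of one record; the third has suffered a one-character corruption.
copies = [b"letter to Cosimo", b"letter to Cosimo", b"letter to C0simo"]
print(majority_read(copies))
```

With only two copies that disagree there is no majority, which is why such schemes use three or more replicas.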

  • by Shotgun (30919) on Tuesday April 10, 2001 @10:16AM (#301099)
    A democracy, a so called 'free society', can easily be manipulated and controlled by the person controlling the information. What happens when all information, except what comes from 'authorities' is suspect because it is so easily fabricated?

    It reminds me of the Arnold Swarzen...(?) movie, "The Running Man". He's a police helicopter pilot who refuses to shoot unarmed people involved in a food riot. The powers that be manipulate the video tape evidence to make it appear that he massacres the people instead. People are shown the tape and cry for his death in a game show type fashion until some revolutionaries are able to show the real tape by hacking into the communications channel.

    The temporality of public records has very serious implications for our social structure. If the only record of your speeding ticket is an entry in a database, what happens when a glitch makes you a drunken sloth who doesn't pay child support? If the entry showing Bush's drug convictions gets deleted, will there be no other record? Trust me on this, email is a politician's dream. Everything from here on has plausible deniability.

  • Historians may not be specifically interested in you, no, but what about your descendants?
    The day-to-day information that we produce is the stuff that makes genealogists go nuts. It's the stuff that leads to books like "Roots". Biographies of people who, to themselves, seemingly did nothing with their lives, yet looking back at them a hundred years later we see how extraordinary they were.
    Should -everything- be saved? No. Personal correspondence with friends and family should. (And hell, I have -every- piece of email that I've received at work over the last year saved. Talking roughly 500MB or so of gzipped archives (which balloon to about 1.5G).)
  • Although a smaller fraction of the data produced today will be readable in the future, there's so much more data produced that you wouldn't want to read much of it anyway. The fraction of it that's produced on long-lasting media like acid-free paper is still quite a lot.
  • I personally don't feel the need to copy any of my old floppies. All that I ever had on floppies and that mattered to me is now somewhere on my current hard disk (and a few past ones). All of it takes only a fraction of my 18GB drive. Assume I had 100 floppies that mattered: that's less than 200MB, which you can copy in a few seconds on modern digital media.

    As a matter of fact, each time I get a new computer, I copy all the stuff from the old one, and it takes only a fraction of the space. The 40MB of my first (Atari ST) hard disk are there. The 160MB of my first Mac hard disk (120MB left after I copied the Atari hard disk onto it) are there. And so on.

    The real issue is binary formats that have been forgotten. For instance, I have source code of programs I wrote in GFA Basic (a Basic for the Atari ST, in case you wonder.) But emulators come to the rescue there. Today, I can run Atari programs faster than on the real machine.

  • Ok. I am not a history scholar, but I have occasionally worked with different archives during the past years; with the Danish State archives, and The Berlin Document Center (has a new name now).
    The amount of information (archives) that a state amasses is simply astounding, and that's just the bits that go into the archives; at least 90% of all paperwork is scrapped even before that.
    An example: I helped a scholar do some statistics on black market crime during and after the war.
    He examined a single lower court in the period from 1940-1953. "Only" 8000 cases went through this court, but just the verdicts alone, averaging 3 pages per case, amounted to 25000 pages, bound into fifty 500-page tomes. Each of these cases would also have generated a "file", containing e.g. police interrogations, wiretapping records, anonymous letters, forensic evidence, case evidence, court orders, affidavits, etc. A really conservative estimate would be that each case would have generated at least 20-40 pages, meaning that just this single court, in a few years, could have archived 100.000 - 200.000 pages. A totally impractical thing to do. Therefore these files were "cleansed" before being archived.

    If all those papers that public institutions produce were preserved, we would be swamped in archives. Some stuff simply has to go.
    Old-style paper archives have a physical storage problem. Modern "bit-based" archives should, in theory, be less burdened by this. (200.000 pages should fit handily onto a single CD-ROM.)

    But on the other hand, modern information systems make it so much easier to generate and preserve information. (Just think of how many gigabytes of information a single company has on its servers.)
    How many emails are sent every day? 5-10-20 million? If just a fraction of these, say 100.000, were preserved every day, think how many freaking million emails that would be during a short period of 50 years. But more importantly, how many (and which) emails would posterity need to say something about our time, and the social pattern behind the phenomenon of email?
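For scale, the 50-year figure can be worked out directly. This sketch uses the comment's own number of 100.000 preserved emails per day; ignoring leap days is a simplifying assumption.

```python
EMAILS_KEPT_PER_DAY = 100_000  # the comment's "fraction" figure
DAYS_PER_YEAR = 365            # assumption: leap days ignored
YEARS = 50

total_kept = EMAILS_KEPT_PER_DAY * DAYS_PER_YEAR * YEARS
print(f"{total_kept:,} emails")  # prints: 1,825,000,000 emails
```

That is roughly 1.8 billion messages from keeping only a small slice of each day's mail.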

    The main problem with digital archives is the same as with paper archives: you can't, and shouldn't try to, preserve everything.

    I don't doubt that over time even the majority of the information selected to be preserved will be lost to bit-rot, war, fire, carelessness, natural disasters etc. during the next 1000 years. But even if just a tiny, tiny fraction of this is preserved, there would be "enough" information about our time for historians to form a good overall picture.

    A single modern "Statistical Yearbook" probably contains more demographic information than all medieval archives put together.
    A modern public library probably contains more works and written information about the last 100 years than has been preserved from when man began to write until the Middle Ages. Still, a lot can be said about the Roman Empire, even though so precious little in writing has survived.

    So to reduce future archives to a manageable size, the majority of information simply has to be discarded. Then it is more likely that there will be funding for preserving the rest in a proper way.

    Considering the amount of time, money, blood, sweat and technology that goes into carefully extracting scrolls from the Pompeii site and making them readable, it should be a "trivial" task to recover any kind of non-encrypted data, no matter what digital media it resides on. However, the cost of doing so may not be trivial. Just think of how many data formats future historians would need to reverse engineer, just to cover this last decade.

    "Remember: the victors always have and always will rewrite history as much as possible."

    How I have come to loathe this dogma.
    Originally, it stems from the fact that sometimes only one party's "history" survived from ancient times until today (Athens, Ancient Egypt spring to mind).
    But the dogma really isn't true anymore. First, in democratic countries, it is impossible for the state to directly control what history is written. Secondly, after having dealt with the massive "memory" rewrites among former Waffen-SS soldiers, I can only conclude that the losers are just as eager as the winners to rewrite history; there has been a huge amount of revisionist "history books" written since WW2 ended. From outright Holocaust denial to apologetic "Waffen-SS coffee-table books", where the W-SS soldiers are portrayed as just a bunch of happy, anti-communist boy scouts on a picnic in the USSR. None of them were ever Nazis, or anti-Semitic, they never saw any war crimes (except those the Russians committed), blah, blah, blah. Total denial of facts.

    So a better dogma would be:

    "Remember: both the winners and the losers always have and always will rewrite history as much as possible."

    Historians know this of course.
  • Assume I had 100 floppies that mattered: that's less than 200MB, which you can copy in a few seconds on modern digital media.

    It would take a few seconds to copy the equivalent amount of data stored on 100 floppies but it wouldn't take a few seconds to copy 100 floppies. The distinction is important for archivists, who might have, say, a building full of 9 track tapes to convert, a process which could take years.
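The distinction can be put in rough numbers. In this sketch the 1440 KB capacity assumes high-density 3.5" disks, and the 90 seconds of handling per disk is a guess for illustration; neither figure comes from the thread.

```python
FLOPPY_KB = 1440        # assumption: high-density 3.5" disk
DISKS = 100             # the figure used in the quoted comment
SECONDS_PER_DISK = 90   # assumption: insert, read, eject, label

total_mb = FLOPPY_KB * DISKS / 1000               # amount of data
handling_minutes = DISKS * SECONDS_PER_DISK / 60  # time spent feeding disks

print(total_mb, "MB of data")
print(handling_minutes, "minutes of handling")
```

The data volume is trivial by hard disk standards, while the handling time grows with the number of physical media, which is the archivist's real cost.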
  • He's a Republican, he doesn't need facts. If Rush says so it must be true.
  • I opened up each message, clicked "Save as" and saved as .eml. It was a bitch. I think there's a pst2eml perl script out there somewhere. Or maybe mbx2eml?
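For mail that is already in the plain-text mbox format, the per-message export can be automated with Python's standard `mailbox` module. This is a minimal sketch, not one of the scripts named above: `mbox_to_eml` is a hypothetical helper, and it does not read Outlook's own .pst files.

```python
import mailbox
from pathlib import Path

def mbox_to_eml(mbox_path, out_dir):
    """Write each message of an mbox file to its own numbered .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # create=False: fail loudly instead of silently making an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    written = []
    try:
        for index, message in enumerate(box):
            target = out / f"{index:06d}.eml"
            # as_bytes() serialises headers and body without the mbox "From " line.
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

Calling `mbox_to_eml("inbox.mbox", "export")` (both paths hypothetical) would leave one plain-text .eml file per message in the `export` directory.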
  • by wiredog (43288) on Tuesday April 10, 2001 @10:14AM (#301114) Journal
    When the 3.5 inch floppy came out, I copied all my stuff on 5 inchers over. When CDR came out, I copied it all onto a cd. Made backups, too. Copied all my e-mail from outlook to the standard text format when I went to Linux. No doubt I will be copying my data to DVD-R someday. And, 20-30 years from now, to its successor.

    One problem with archiving digital communications is the volume. One of the problems that were found during the many Clinton investigations was, when e-mail was subpoenaed, separating the wheat from the chaff. All the mail was backed up onto tapes, which weren't very well marked. And the first searches were done on subject lines. Quite a bit of relevant mail was missed, and turned up years later when people actually sat down and read every message.

    The National Archives (here in the USA) is worried about preserving data. The various software and hardware formats used over the years make it difficult to track and retrieve the data. NASA has spent a fair amount of money moving old planetary exploration data from tapes to optical disks, and then to CD. My father worked on a project at DMA (now NIMA) to do the same thing there.

  • by SecretAsianMan (45389) on Tuesday April 10, 2001 @01:36PM (#301116) Homepage
    The main problem being digital records are so much more easily tampered with compared to old paper

    Sometimes the answer to your question about how do we do X with technology can be found by remembering the history of technology. In this case, what might be a better long-term storage medium than magnetic or optical media is good 'ole paper tape. Now, some research should probably be done to increase both the durability of the tape material and the density of information stored on it, but it is the best solution I can think of, and probably the easiest to decipher by archaeologists of the far future.

  • I'm sure I remember discussion about "programmer archaeologists" of the future - noble beings equipped with trowels and oscilloscopes, who reconstruct long-dead file formats from half-corroded CDs.

    There's actually some of this going on today.

    I'm a bit fuzzy on details, but a few years ago I heard (from someone who worked in the field) about a project to resurrect old LANDSAT tapes from the 1970s. Someone figured out that the old data would make a great baseline for climate change studies, and the raw data could be processed in ways simply not possible 25 years ago.

    The tapes were still around, stuck in a warehouse somewhere. To get them into a readable condition, they had to be slowly baked (in pizza ovens!) to drive out moisture they had absorbed, then scraped with a sapphire blade to...well, I forget why. Scrape off some gunk.

    I believe they managed to dig some old recorders out of the scrapheap and get them working with the help of some old hands.

    Wish I could recall more details, but that's all I know.

  • Ever wonder what they do with all those communications? Maybe they can put them in escrow for 200 years :)

  • Don't steal this idea because I'm going to patent it and make lots of money, but here it is:

    Everything2 is great for recording encyclopedic sort of knowledge. What I'd really like to see is something that is designed just like Everything2, but instead it records *experiences*. Everybody writes experience and event nodes, and eventually we have a living history of everything that ever happened. Sure a lot of that will be irrelevant, but just think of all the correlations and connections that could be made. Sort of like 6 degrees of separation, but for real life events.
  • by devphil (51341) on Tuesday April 10, 2001 @10:28AM (#301120) Homepage
    One problem with archiving digital communications is the volume. One of the problems that were found during the many Clinton investigations was, when e-mail was subpoenaed, separating the wheat from the chaff.

    No kidding. I'd hate to be in Deja/Google/whoever's shoes, trying to archive useful data, in face of terabytes of "Nude Asian Teens" email generated -- literally -- completely automatically at the click of a mouse button. Especially since the most useful spam filtering methods (outright router blocks, keyword triggers, a bullet to the head of the marketing agent) are frowned upon by nice people.

    Paper libraries have a "volume" problem because the media itself takes up so much space, and must be carefully stored. Digital libraries have a "volume" problem because any old jackass can easily create fifty times the amount of information that's worth keeping, and it must be winnowed out by a human.

    Just my rant today (cleaning out another twelve spam emails).

  • At a personal level, I am currently denied access to email of my own from as little as 5 years ago. I would save it into files periodically, on whatever shell account I used at the time. But periodically there are non-recoverable file system errors, or shell accounts that just disappear in the dead of night (we'll see a lot more of this if the ISP burnout rate continues.)

    So forget this problem of losing our digital records as a society, what about losing my personal identity?

    I still go back and look at physical letters of mine from 10+ years ago, but email from as recent as 1994 is hard to find. That frightens me, frankly.

  • Data that is easily destroyed goes hand in hand with data that is easily copied. I think data loss will always be more prevalent with digital media than it was with more conventional ones.
  • ...or at least as long as active and caring human society - are no problem.

    But you have to get away from the mindset that seeks a "wearever" medium, everlasting standards, and indefinitely available hardware. That is the naive approach.

    The word is "living archives". The archivists' work is never done.

    The approach that works is just to regenerate all data from media that is wearing out, obsolescent media, and obsolescent standards - before it is in danger of being lost. This must be a constant process of renewal. Since the data is digital, and anyone with the slightest imagination would store redundant copies in physically separated locations, the process is lossless.

    So when 3.5" diskettes become well established, and 5.25" diskettes start looking like orphans, you redub everything from 5.25" to 3.5". Then the same thing when CDs overtake 3.5" diskettes. And on and on (I seriously doubt CDs are forever in any sense of the word).

    The trick is to know when the time is right each time. I won't minimize the problem. But the watchword is "be conservative".
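
    For what it's worth, the "redub and verify" step is easy to sketch. Assuming the archive is a set of plain files and using Python (the function names and paths are made up for illustration), copy each file to the new media and refuse to trust a copy until its checksum matches the original:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destinations: list[Path]) -> str:
    """Copy source to every destination and verify each copy.

    Returns the digest so it can be stored alongside the archive.
    Raises IOError if any copy does not match the original.
    """
    expected = sha256_of(source)
    for dest in destinations:
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        if sha256_of(dest) != expected:
            raise IOError(f"copy to {dest} does not match the original")
    return expected
```

    Keeping the returned digest with each copy means the next refresh, years later, can check the old media before trusting it as the source.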
  • It is true that we haven't learned everything we'll learn, or got close to being able to do everything we'll be able to do. We are in a unique position now though, in the sense that for the first time, we can provide convincing sketch answers to most of the interesting questions about the universe. I recommend David Deutsch's book The Fabric of Reality on the subject...

    Theoretical computer science does tell us some things which appear to be absolute. One of those is that "information is information is information".

    The big difference between a standardised digital archive and microfiche is that the former is pure information. It will be automatically convertible to more sophisticated forms of storage in the future. Digitising microfiche archives is possible, but still requires lots of physical work which is only partly automatable.

    Having said that, it is also possible to identify a big limitation in the proposed "LTSS" - the same thing which is missing from the web today - rich metadata (this isn't just about XML, btw). Do a search on the "semantic web" if you're curious....

  • Twenty years or so ago, the Smithsonian museum had an exhibit about fiber optics that included a working model of Alexander Bell's "light phone" (it mechanically modulated a beam of sunlight) and his original lab notebook (borrowed from Bell Labs' engineering records). The notebook was still legible because (a) the paper was acid-free and (b) the ink was pigment-based. Even though I keep a notebook, it will not be legible in 100 years (perhaps one of my great grandchildren will be interested) because either (a) the high-acid paper will have decomposed or (b) the parts written with dye-based ink will have faded.

    The fairly recent PBS documentary on the US Civil War was based in large part on letters and journals written by soldiers using (you guessed it!) acid-free paper and pigment-based ink.

    Make tomorrow's history! Write letters and keep journals using acid-free paper and pigment-based ink -- if it's all that survives, it will be the authoritative material on the typical daily life!

  • Engraved Nickel

    The Rosetta Disk []


  • The Medici project has experts working with fragile, hundreds-of-years-old paper documents. It is conceivable that in the future, there will be similar experts who have special tools and procedures for reading ancient media like CD's. However, IIRC, the lifespan of optical media like CD's is about 100 years. Perhaps future technology will be able to extract data from partially degraded CD's. Historians have always faced challenges in finding data that have been worn away by time. Future historians will be no different.
  • There was some talk on another thread about how long CD's would last. Audio CD's, and in fact all CD's that are 'pressed' (i.e. not CD-R's and CD-RW's) should last a very long time. These disks are NOT subject to 'laser rot'. Laser rot was what happened to early 12" laser video disks. Laser disks are two sided, and are made in the same way as audio CD's in that the information is hot pressed onto the plastic, and then aluminum is vacuum deposited onto the plastic to make it reflective. Two of these disks are then glued together. What was happening was that the glue was attacking the aluminum and moisture was getting in between the disks. Better glue formulas have mostly solved this problem. Audio and computer CD's that are factory pressed are single sided. The aluminum is protected by a coating of varnish which serves as the label. As long as this is not scratched the aluminum layer will remain intact and the data can be read. It might be possible to restore a damaged disk by stripping off the varnish and aluminum and vacuum depositing a new layer of aluminum. Not something you can do at home though. DVD's consist of two or four disks sandwiched together; they might have laser rot problems if the glue isn't good....

    CD-R's and CD-RW disks record via a dye that changes color and reflectivity with heat from the laser. This dye can destabilise under light and heat. So keeping your CD-R's and CD-RW's in a dark cool place would be a good idea. Also the more they are 'played' the shorter their lifespan might be. So make a backup copy of any CD-R/RW you want to keep. CD-R's might be more stable than CD-RW's.
  • There's a good review [] of a Nicholson Baker rant against Librarians in general for their sins of deliberately pulping the paper records of the past 130 years and replacing them with decomposing and badly executed microfilm facsimiles.

    It seems that Vannevar Bush's infatuation with microfilm was shared by many in the WW2 OSS community, and this seems to have led to a misguided attempt to replace papers and books with microfilm in the interests of "efficiency".

  • Oh man. I don't think degradation of media even comes into play sometimes. Have you ever tried to find a story from yesterday's paper on your local newspaper's website? A lot of times stuff just gets cycled out the next day.

    Of course, the NYTimes, etc, have archive searches as a premium service, but there are just tons of media outlets that don't seem to archive, or if they do, don't seem concerned with letting people get at it. This seems like at least as much of a concern as degrading media: the organization and maintenance of archives in the FIRST place.

  • This topic is one that is already being seriously considered by librarians and historians.

    The USA's Library of Congress Preservation Reformatting Division is digitizing many items for preservation, and you can be sure that they're concerned that the digital preservation will be at least as effective as the original (analog, paper, whatever) form.

    One of the current projects of the Research Libraries Group is data preservation. The RLG is an international group formed originally by Columbia, Harvard, and Yale universities and The New York Public Library in 1975, with current members from academia, government archives, public and private sector historical organizations.

    A Google search on digital data preservation gives plenty more linkage to groups actively looking at the issues involved in digital storage.

    Of course, there is still a huge volume of personal and corporate data that will no doubt degrade to dust. For that, we all need to take the approach of wiredog to keep our personal data accessible by refreshing the media as technology advances.

    Naturally, since this is Slashdot, all of this has been already covered. This article was a particularly good treatment of the topic and was posted as a followup to an older Ask Slashdot.

    Really, how different will it be if the future only has the preserved personal effects and communications of an insignificant fraction of the general population? Today, archeologists make a career out of extrapolating whole civilizations out of building foundations and shards of pottery.

    So, with a little care, I'm confident that my own data will be happily accessible as long as I need it. After that, the future will take care of itself.
  • by spasm (79260) on Tuesday April 10, 2001 @10:52AM (#301137) Homepage
    "Important information survives (usually). Trivial information gets lost. This is how it should be. There's no reason to preserve every bit of data for 'historical' reasons."

    I've worked on research projects whose primary source was day-to-day accounting records of a small business running in Egypt during the 11th century. The records were preserved in part because they were at the bottom of a trash pile. The records gave us a huge amount of information about everything from transport methods to the ability of the state to collect tax. Most of the 'important information' from that period which people thought was worth preserving revolves around which ruler stomped which other ruler's butt. Our 'trivial information' gave us a lot of stuff which we knew nothing about before, stuff which helped explain why ruler X had the economic wherewithal to stomp ruler Y's butt and, well, more interestingly, what it was like to live under ruler X or Y.

    The same applies today. Yeah, a record of what your family ate for dinner for the past two weeks is truly trivial. But what it will say about daily life, the transport of food, diet, cooking technology, food storage & a whole lot more about life in the early 21st century might be invaluable to some historian in a thousand years.

    Your 'trivial information' is someone else's data goldmine and vice versa. One of the things I really like about computers is they allow you to keep a lot of personal shit you might otherwise have to trash because it gets bulky. The chances that I'll hang onto all my mail & all my parents' mail and all my grandparents' mail are pretty good when it fits onto a CD rather than choking up my small apartment with boxes. The chances that some future historian will get to read ordinary everyday mail rather than just the mail of presidents and kings in a thousand years are getting better.
  • ... but someone's grocery list is not really of historical value.

    I beg your pardon, but when I was doing research on the dietary habits in Early Modern France, someone's grocery list would have been of extreme historical value! Luckily we do have some petitions for aid written to city authorities in which the petitioners detail the household consumption of bread wine etc ...

  • I hate to trivialize this and become just another /. naysayer, but if it's that important they can build a cd-rom drive.
  • by supabeast! (84658) on Tuesday April 10, 2001 @10:26AM (#301141)
    Optical media is not really such a bad option. A useful, self-contained system for playback of optical media could be easily built. If nothing else, carefully preserved schematics for future readers of media could be stored with it to make sure that if the machine is ruined and media survives, it might still be read.

    The real reason that old magnetic tape is hard to read now is that it was never a great format in the first place. The stuff falls apart. My last employer had an old HP reel-to-reel machine for reading data on tapes from a company we had purchased, but the tapes were so old that the chemicals on the tape itself turned to dust and fell off. This is not a problem with optical storage. Optical storage also has the option of being dedicated in very small spaces, unlike the van sized tape players of old.

    Life is also not a big issue with optical media, because just as the books of the Medici's were recopied over and over into new languages and on better bindings, so can data be quickly copied from old optical media onto newer formats.
  • by smoondog (85133) on Tuesday April 10, 2001 @10:13AM (#301143)
    I'm at a loss to understand why this question is perceived as being difficult to answer. Notice the posting talked of the *ruling* class. Today we look back at history and see people who kept records of their letters. They are usually wealthy and upper class.

    The analogy would be to read emails from, say, the white house in 200 years. Do you think the white house is saving their emails? You bet. Do we have lots of examples of (from the general public) letters from 200 years ago? Certainly not as many as there will be emails in the future. Usenet archives, digital backups stored in basements, most emails are being stored two or more times at two or more places. I don't quite understand why someone would think that just because it isn't on paper, it isn't going to keep. We are going to have far more emails stored in the future than we will know what to do with.

    As a society we think of ourselves as individuals to be pretty important, but let's face it, for the vast majority of us, no one is going to care in 150 years. With that in mind, the digital age is storing far more records than ever before and the future holds a new paradigm of historical record. I almost lament that I wasn't born 150 years after the advent of the digital age where high resolution movies will look as good 1000 years from now as they do today.

  • by Greyfox (87712) on Tuesday April 10, 2001 @10:39AM (#301147) Homepage Journal
    This problem is aggravated by the current copyright laws. Long after the copyright holder's lost interest, it will be illegal to copy the content to fresh media. Lars may bitch and moan now about his songs being stolen but in 100 years will anyone know who his band is or hear his songs again? The DMCA will only make this problem worse, potentially making it impossible to preserve any works from this era.

    Likewise, various people are trying to shut down the MAME ROM sites, but a lot of the hardware ROMs are deteriorating now and many of those games, which represent a golden age of creativity and a technical wonder of resource usage, will be gone forever. Kinda makes you sick, doesn't it?

  • A truly wonderful example of this kind of thing are the early works of JRR Tolkien. The early history of the Silmarillion is absolutely fascinating and a wonderful example of the development of a literary theme. That's a work that wasn't published for over 50 years after it was started, but some of the earliest drafts still exist. Because those drafts are available, it's possible to see how it developed. Will the same thing happen when authors write everything in Word and write over old versions every time they change anything? How about if they're still very careful about keeping copies of early drafts but the formats change so much that they can't be read anymore?

    Enter VMS, which automatically saves every version of a file, until you manually delete them. If Unix had not wiped out VMS, everybody would have every file they ever worked on.

    Word actually does have a versioning feature which saves every version you worked on if you enable it.

    My OS is going to have infinite versioning and journalling capabilities, so you can undo any change you ever made (not just on "file save" boundaries). When VMS was developed the typical hard drive was under 100 MB, and now that they sell 100 GB drives for a dime a dozen, we have the room to save everything. Why current OSes have usage models which encourage people to delete everything is beyond me.
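
    The keep-every-version idea can be approximated in userland. Here is a rough Python sketch that borrows the VMS-style "name;N" numbering; the function name and directory layout are invented for illustration, and this is not how VMS itself implements versions:

```python
from pathlib import Path


def save_version(directory: Path, name: str, data: bytes) -> Path:
    """Write data as the next numbered version of name and return its path.

    Versions are stored as "name;1", "name;2", ... and an existing
    version is never overwritten.
    """
    directory.mkdir(parents=True, exist_ok=True)
    existing = []
    for path in directory.glob(f"{name};*"):
        suffix = path.name.rsplit(";", 1)[1]
        if suffix.isdigit():
            existing.append(int(suffix))
    # Next version is one past the highest found, or 1 if there are none.
    target = directory / f"{name};{max(existing, default=0) + 1}"
    target.write_bytes(data)
    return target
```

    Every save lands in a new file, so an early draft stays readable until somebody deliberately deletes it.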
  • by zpengo (99887) on Tuesday April 10, 2001 @10:13AM (#301157) Homepage

    While I'm all for archiving data for future historical analysis, I think it's fairly certain that IM logs, "how's it goin?" e-mails, and detailed transcripts of #40yearoldsinglebaldguys will not be very useful to historians in three hundred years. Yes, they tell about our culture and practices, and yes they might be interesting, but we don't need all of it to extrapolate those conclusions. There is simply no room to store the vast quantity of information generated on the Internet on a daily basis, and considering the fact that 99.998% of it is of little value, I think that we can safely do without it.

    Things are still floating around from the old days. We have Usenet archives from the 80s, and text files from even earlier. We can learn a lot about the culture based on those. Things that grab the public consciousness tend to stick around. They get mirrored, printed out, saved on disk, etc.

    Does there need to be a giant warehouse that contains vacuum-sealed printouts of every wise thing said on the internet?

    No. No, there doesn't.

  • The bigger question isn't media, but software. I'm very confident we'll be able to get our files from ISO9660 discs, but I already have a bunch of WordStar and old MacWrite/MacPaint files I can't open and it's only been a decade. We'll be able to retrieve the raw data, but will we actually be able to interpret and make use of it?

    Well, there are two issues here. One is keeping a readable copy of the software, the other is being able to run it. Since most software programs are used by large numbers of people, it seems likely that someone would have the foresight to keep a copy of the software to interpret the data along with the data itself. Running it also shouldn't really be a problem for future generations. Presumably, someone will have a copy of the specs for the architecture for which the software was written, and an emulator can be created. Of course, if the software's source is available, it would be even easier.

    Also, reverse engineering a data format isn't that hard anyway. If you looked at the raw data of your MacWrite files, I'm sure you'd find your text in ASCII somewhere, possibly with embedded formatting information. Non-textual data is more difficult, but still possible, particularly if you have some fragments of information about the data format to go on.
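
    To show how low the bar is for the text case, here is a small Python sketch in the spirit of the Unix strings tool. It assumes the text is stored as plain ASCII bytes (accented or otherwise non-ASCII characters would be missed), and the function name is made up:

```python
import re


def printable_runs(raw: bytes, min_length: int = 4) -> list[str]:
    """Return every run of printable ASCII at least min_length bytes long."""
    # 0x20 through 0x7e covers space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(raw)]
```

    Run it over the bytes of an unreadable document file and whatever prose is stored uncompressed should fall out, minus the formatting.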

  • Back in the old days, when all we had was wood-burning computers, one form of memory was the delay tube. Bits were pumped into one end and they took a finite time to transit the tube. They'd be fetched out the other end, amplified and cleaned up and fed back into the front end, again. Data would be read/modified as it went by.

    Perhaps we can still use the same technique to solve the data archiving problem: Just broadcast all our data into space. To read it, all we need to do is invent FTL drive, pop out to the right point in time and read the data as it goes by.

    I'm sure we could find other uses for the FTL to help recover the R&D investment.

  • That would be Brig. Gen Nguyen Ngoc Loan [] who just died back in '98.

    And while we're on Vietnam, where would Rage Against The Machine be without the monk setting himself on fire?

  • by rjamestaylor (117847) <> on Tuesday April 10, 2001 @12:01PM (#301166) Homepage Journal
    Does anyone care...What the days slashdot articles are from 50 years ago?

    The problem with planning for the future is that it is hard to know today what will be important tomorrow. Perhaps the insignificant trolls on Slashdot will be of great import in the future (and, no, I'm not referring mainly to Jon Katz articles). Who woulda thunk that an accounting ledger from ancient Mesopotamia would be of any interest 2500 years later?

  • How should the dawn of the digital age be recording itself for history, especially casual correspondence that gives insight into day to day life?

    I thought that was what things like Echelon and Carnivore were for????

  • No, the trick is that a picture is worth 1000 words. Since graphics usually compress worse than text (limited dictionary)

    The latest wavelet compression techniques [] can compress a good-sized color image to 8 kilobytes, or the size of a thousand English words plus light markup.

  • On July 20, 1969, Neil Armstrong was the first man to walk on the surface of the moon. Here is a picture, in an open, documented graphics format

    And the format is called ASCII art []. Just use this simple program [] to convert your 1-bit .bmp format images to images made of standard ASCII characters.

  • You just don't want to accept random binary data that you would have to retain a reader for as well.

    If binary is the problem, uuencode is the solution.

    If proprietary formats are the problem, then documented, unencumbered formats such as PNG, JPEG, FLAC, and Ogg Vorbis are the solution. Just make sure to archive documents (such as ISO and IEEE standards) that can be used to create a reader.
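
    A rough sketch of the uuencode idea in Python, using the standard binascii module. The b2a_uu and a2b_uu calls are real; the two wrapper names are mine, and the sketch leaves out the "begin"/"end" lines that a real uuencoded file carries:

```python
import binascii


def uu_encode_lines(data: bytes) -> list[bytes]:
    """Encode data as uuencoded lines of at most 45 input bytes each."""
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]


def uu_decode_lines(lines: list[bytes]) -> bytes:
    """Reverse uu_encode_lines and return the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)
```

    Every encoded line is plain printable ASCII plus a newline, so arbitrary binary data can ride inside a text-only archive and be turned back into the original bytes later.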

  • The bigger question isn't media, but sof[t]ware. I already have a bunch of WordStar and old MacWrite/MacPaint files I can't open... will [we] actually be able to interpret and make use of it?

    For older formats, you can always emulate the computer for which the viewer software was designed, or write a new viewer from the format documentation. For example, QuickTime 4 can open MacPaint files, and so can a short C program I wrote. Remember, if you want to archive something, make sure you have the format documentation (or the viewer software and the architecture documentation) so that future generations will be able to create a usable viewer. (IEEE and ISO standards are Good Things[0].)
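
    MacPaint is a decent example of how little code a viewer needs once the format is documented. As far as I recall, the bitmap in a MacPaint file follows a 512-byte header and is stored as 72-byte scanlines compressed with Apple's PackBits run-length scheme; treat that layout as my assumption and check it against the format documentation. Here is a sketch of just the PackBits decoder, in Python rather than C, and not the program mentioned above:

```python
def unpack_bits(packed: bytes) -> bytes:
    """Decode a PackBits run-length-encoded byte string."""
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        i += 1
        if header < 128:
            # Literal run: copy the next header + 1 bytes unchanged.
            count = header + 1
            out += packed[i:i + count]
            i += count
        elif header > 128:
            # Repeat run: the next byte is repeated 257 - header times.
            out += packed[i:i + 1] * (257 - header)
            i += 1
        # header == 128 is a no-op in PackBits and is skipped.
    return bytes(out)
```

    A full viewer would skip the header, decode scanlines until the whole bitmap is filled, and treat each bit as one black or white pixel.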

    About five years ago I still had an old floppy controller with an odd WD chip on it that could talk to it using OS-9.

    So install Mac OS X (the successor to Mac OS 9) on your machine and read that floppy.

    Oh, you were talking about that OS 9.

    [0] GOOD THING is U.S. Trademark No. 75,516,347 registered to Martha Stewart Living Omnimedia LLC. (Look it up at TESS [].)

  • LTSS 1.0 could support WAV, MP3

    s/MP3/Ogg Vorbis/ []


    s/GIF/PNG/ [] because PNG is better documented and supports 24-bit color and alpha transparency. You partially address this with


    but s/TIFF/PNG/ because even without TIFF's LZW codec, TIFF is much larger than PNG and not as well standardized.


    Non-European language advocates would complain.


    Better. Thank you. This solves the script issue, but in what natural language would information be stored? How is it a valid assumption that future generations can read format specs written in US English of A.D. 2001 or in UK English of A.D. 2001?

    HTML version whatever

    Make sure it's run through W3C's HTML Validator [] if you want to archive it. MSHTML is a Bad Thing.

    and perhaps even Java for interpretation of abirtrary [sic] file formats.

    The Java(TM) language does not have the wealth of alternative implementations that the C language has. Both are nearly Turing complete (full Turing completeness requires unbounded storage) and equally fast when compiled to a native instruction set.

  • The dead mail queue on my mail server is huge. If all the sysadmins in the world were to just never clear their dead mail queues, we'd have a pretty accurate archive of the state of the Net.

    2315 AD: It would appear that the entire society was obsessed with "NAKED HORNY CHEARLEEDERS WET AND WAITING FOR U!!!!!!!!!!", "online casinos", messages from some person named "bounce@" and worshipped a deity called "Viagra". No wonder they vaporized themselves.

  • by rgmoore (133276) <> on Tuesday April 10, 2001 @11:01AM (#301177) Homepage

    Of course the flip side of this is that it's not always possible to tell who will be considered interesting in the future. In many cases, the most interesting use of archives is to look at the work of interesting people while they were working their way up and weren't of broad enough interest to attract major attention. Nobody knew that a 25 year old patent examiner named Albert Einstein was about to become a scientific star, but because we have his personal letters we can find out what he was doing scientifically and personally.

    You never know if the next great author might be posting his early, great works to some fan e-mail list because he can't get his foot in the door at a major publisher. Maybe the next great debater is getting started in flamewars on Slashdot. Maybe the next great OS designer is getting into arguments with established academics on USENET. Oh, wait, that already happened, and we can only read the argument because somebody thought to archive it. Maybe the next great philosopher who will be mostly ignored for 100 years is already publishing his early thoughts somewhere on the web. You can't always tell what will be valuable to the future until well after the fact, so preserving as much as possible is still a really good idea.

    A truly wonderful example of this kind of thing are the early works of JRR Tolkien. The early history of the Silmarillion is absolutely fascinating and a wonderful example of the development of a literary theme. That's a work that wasn't published for over 50 years after it was started, but some of the earliest drafts still exist. Because those drafts are available, it's possible to see how it developed. Will the same thing happen when authors write everything in Word and write over old versions every time they change anything? How about if they're still very careful about keeping copies of early drafts but the formats change so much that they can't be read anymore?

  • This stuff has always been volatile. We have a fraction of the historical data we would like to have from any time period. Yes, the letters of the Medici are still around and available; similarly, the correspondence of the major players of our time will be archived (either electronically or in hard copy. Probably both.) The letters of the common man were as often discarded in times past as e-mail is today. Some of it will no doubt still be around (just as the data on many of those eight inch disks still survives on more modern media today), but the vast majority will be lost. This is fine, especially since there is a finite amount of data that historians can analyse anyway. Generally speaking it is nearly impossible to tell what will or will not be historically significant from the point of event origins anyway. I would venture to say that considering the level of literacy in our culture today, and the varied data storage mediums available, historians will have far more data from our time than current historians have from any time before World War II.

  • With Raptor, the NSA, and other intelligence gathering organizations.

    The trick will be recalling the data from those organizations.

  • Broadcast everything important into space. If we ever need it again, we just zip out along the transmission wave at relativistic speeds, until we get to the bytes we want, slow down and read them, then zip back home. M@
  • I wouldn't worry about it ...

    Right now, the NSA is reading and cataloging all of our private e-mails -- there will be records of everything we say for generations to come!

    "Grandpa, what was a EULA?"


  • by fm6 (162816)
    This gets worse as time goes on. It's already hard to explain feudalism to people, try explaining the Roman Republic's governmental structure. Now, try explaining American Democracy in 500 years.

    I strongly agree with AC's argument. But forms of government are a really bad example. How many Americans have any understanding of how their government works? Even those who have taken the time to study it (mostly naturalized citizens, who are required to know something about this stuff, unlike "real" Americans), mostly just read the Constitution and related documents -- which have roughly the same relation to actual government as physical chemistry has to cooking.

    Feudalism is an even worse example. The word, in its modern sense, was first used by French revolutionaries, to describe the aristocratic regime they had just overthrown. (Before that, it was a legal term, applying to a certain kind of property law.) Since then there have been endless redefinitions of the term, all of them pretty conflicting.

    A better example would be based on simple cultural icons. In 500 years, how many people will know that Neil Armstrong was a real person and Luke Skywalker wasn't?


  • Yes, a lot more information will be lost in the digital age than in previous times. But an awful lot more will be preserved, too.

    The archeological record always seems to improve with technology. From stone etchings to written scrolls to printed matter and photography and on into the 20th century, the more technology people had, the better record they left of themselves.
  • Not to mention the Illuminati!

    Every wire inside your house has the potential of recording everything you do and sending it to an illuminati communication location to be stored. Hell, they've known everything well before the United States was formed.

  • For us Americans, anyway, our National Archives and Records Administration [] seems to be quite aware of the issues involved in storing digital data for future retrieval. They may even have some good clue factor going (a bit amazed, myself):

    To do so, they are using a new computer language called eXtensible Markup Language, or XML. It is a way of marking up electronic documents with easily understood tags instead of coding dependent on what will some day be obsolete software.

    Naturally, NARA's main focus is the archiving of documents that are mainly of historical significance to Americans.
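
    To make the quoted idea concrete, here is what a tagged record and a reader for it might look like in Python. The tag names and the sample letter are invented for illustration and are not NARA's actual schema; the point is only that the markup describes the content instead of depending on one word processor's file format:

```python
import xml.etree.ElementTree as ET

# A hypothetical archived letter; every tag name here is made up.
RECORD = """\
<letter>
  <sender>Cosimo</sender>
  <recipient>Lorenzo</recipient>
  <date>2001-04-10</date>
  <body>The shipment arrived on Tuesday.</body>
</letter>
"""


def read_record(xml_text: str) -> dict[str, str]:
    """Return a mapping from each top-level tag name to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}
```

    Even without a parser, a future reader could open the record in any text viewer and guess what each field means from the tags alone.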

  • Just because a large bit of info on our culture may be lost doesn't mean it will all be lost. Sure, a lot of relevant stuff is stored digitally, but a ton of information of every kind is available on paper. If future historians want to know about our culture, let them dig up our old books and newspapers and magazines; they'll learn incredible amounts about us. And if they want to know about our digital culture, they can still hit the books. It's all on paper, somewhere.
  • by Erasmus Darwin (183180) on Tuesday April 10, 2001 @11:05AM (#301209)
    The National Archives only accepts data in ASCII format. They view text as the lowest common denominator [...] You can understand their position after you sit down and think..this is our American history...

    So I'm sitting down and thinking, but I still don't understand their position. I can appreciate both the importance of ASCII text and its accessibility (hell, I still use lynx to browse the web), but I can't understand why you would restrict yourself to only text.

    Consider the following:

    On July 20, 1969, Neil Armstrong was the first man to walk on the surface of the moon.


    On July 20, 1969, Neil Armstrong was the first man to walk on the surface of the moon. Here is a picture, in an open, documented graphics format.

    There's just too much history that's more than just pure text. I can understand trying to make as much material as possible available as text, but you can't let such a decision allow you to exclude relevant materials that're more than just text.

  • I work at a university library as a 'technical specialist' (glorified technician), and recently sat in on a meeting involving how libraries are (and should be) archiving data via digital media. The long-run case is simply this: digital media saves space, but keeping up with a good five year plan keeps the data available, yet is expensive.

    Basically, the five year plan means rotating the data from one media type to a new media type... waiting every five years. Although computers move from day to day, the method of data storage and retrieval remain approx. the same within a five year period. As long as the data is updated every five years or so, the data are always available. The price of keeping the data in this state of never-ending movement would be somewhat static, as once a new method of storage comes of age, and is a standard, it is pretty cheap. The real price comes from manpower. Which... could be solved by spending some time developing a software system that could be altered, on command, to handle the new media... Enter Linux!

    I could keep going on and on about this beautiful system, but I grow weary of trying to remember all of this stuff, and typing it, and looking like I am still working on something useful! ;)
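
    The migration step described above can be sketched in a few lines. This is only an illustration, not the library's actual system: the function names `migrate`, `sha256_of` and `demo`, the directory layout, and the choice of SHA-256 are all assumptions made for the example. The idea is to copy every file to the new medium and refuse to continue if a copy does not match its original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(old_root: Path, new_root: Path) -> dict[str, str]:
    """Copy every file under old_root to new_root and verify each copy.

    Returns a manifest mapping relative path -> SHA-256 digest. Both names
    and the manifest format are hypothetical, chosen for this sketch.
    """
    manifest = {}
    for source in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = source.relative_to(old_root)
        target = new_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        before, after = sha256_of(source), sha256_of(target)
        if before != after:
            raise IOError(f"copy of {relative} does not match the original")
        manifest[relative.as_posix()] = after
    return manifest


def demo() -> dict[str, str]:
    """Run one migration between two scratch directories."""
    with tempfile.TemporaryDirectory() as scratch:
        old = Path(scratch) / "old_media"
        new = Path(scratch) / "new_media"
        (old / "letters").mkdir(parents=True)
        (old / "letters" / "medici.txt").write_bytes(b"Caro Lorenzo, ...")
        return migrate(old, new)


if __name__ == "__main__":
    for name, digest in demo().items():
        print(name, digest)
```

    Keeping the manifest next to the data means the next migration, five years on, can check the old medium against it before copying anything.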
  • Well, lots of the communications of the Medici were lost: some of the really important stuff was never put on paper - is there a record of every spoken word?

    Besides, data formats are nothing, historians have decoded long forgotten scripts and languages which no-one speaks anymore. I think it will be comparatively easy to get at the files on a CDROM, 500 years from now.

    They'll just put the thing into some sort of a 3-D scanner and work on the computer copy... "Oh look, these dents are 1's, and those are 0's and they write them in a spiral." Sure it may be tough work (file system, data formats), but they'll also have very sophisticated technology to analyze these things. They might just have to click on the "unknown media wizard" and get all the files. ;)

    Another problem is deteriorating media, but on a historic level I don't think it matters much. Current data recovery companies can do amazing things already: restoring hard drives from totally burned-out PCs, or restoring data which has been overwritten multiple times.

    It's one problem to keep your data so you can readily access it in 50 years, but I think on the scale interesting for historians we have no problem at all.

  • The economy may be built on computers, but rest assured social record-keeping is not. For important documents, and permanent information, paper copies are still much preferred over their electronic cousins.

    Think about the last time you read a novel on your laptop, instead of picking up the book. And the last contract you signed? It wasn't on those digital pads you find at Best Buy for signing receipts. Paper is still king, and it will be for years. It never gets obsolete, and it lasts just as long as anything else we have.

  • by skoda (211470) on Tuesday April 10, 2001 @10:25AM (#301226) Homepage
    I've been reading Stephen Ambrose's books the past couple of years, and based on his work, I now think that the 'whassup' emails are of value, because they will tell historians about the common man.

    While the histories, news articles, and official documents of a given era are very important and informative, it is also necessary to have the personal accounts from the people involved in the society at the time to help provide perspective, and to help identify biases in the 'official' accounts.

    Considering how valuable even the most pedestrian of documents are from e.g. 3000 BC, I imagine that today's equivalent will be of equal value to historians in 7000 AD.
  • by aussersterne (212916) on Tuesday April 10, 2001 @10:50AM (#301228) Homepage
    There's a large difference between 8" floppies and CD-ROM. The installed base of CD reading mechanisms (CD-ROM, CD-R, CD-RW, PlayStation, Dreamcast, SegaCD, Saturn, PS2, 3DO, VCD, home stereos, walkmans) is many orders of magnitude greater than the installed base of 8" floppy drives ever was.

    Even two or three hundred years from now, a reasonably skilled technician or at worst a team of them will be able to dig up a CD mechanism from somewhere, fix it up and get it reading data. CD mechanisms are like Ford's Model T -- only much more common -- and let's face it, there are still a reasonable number of Model T's running around to auto shows, and there isn't nearly the historical incentive to keep a Model T running that there is to ensure that there will always be a CD-ROM reader running somewhere.

    And it's likely that if most people are like I am (I value my data and my work) they will continue to migrate data to new formats as they emerge.

    The bigger question isn't media, but software. I'm very confident we'll be able to get our files from ISO9660 discs, but I already have a bunch of WordStar and old MacWrite/MacPaint files I can't open and it's only been a decade. We'll be able to retrieve the raw data, but will we actually be able to interpret and make use of it?

    P.S. I still have an old Siemens 8" floppy drive, single-sided, hard sector. About five years ago I still had an old floppy controller with an odd WD chip on it that could talk to it using OS-9. No way to talk to it with my Linux box, though...

  • Thanks to lawyers most of the organizations I've been a part of have a policy of shredding all paperwork after 7 years. Not a lot of history there.

  • If you use a film camera and throw the negatives and prints in a shoe box they will last almost forever

    This is true of the hundred-year-old Bradyesque B&W's you mention, but the chemistry of color snapshots taken over the last fifty years makes them substantially less stable -- something to do with the organic dyes they use. Ever wonder why that old Kodachrome snappy of Grandpa from 1965 has that awful pink tinge? It'll only get worse, until eventually it's an unrecognizable blob.

    However, the older B&W stuff will just get a little yellowish. Or "sepia" if you prefer.
  • If I recall correctly, many attorneys are now advising clients to proactively delete archived email and other correspondence stored electronically, so that in presumed future legal actions the discovery process won't turn up incriminating evidence in the defendant's files.

    The deletion, apparently, if prescheduled on all documents, doesn't constitute obstruction of justice, whereas conscious destruction of only selected material may be construed as obstruction.

    Part of the problem in maintaining a useful archive into the future is storage media, but a bigger part is the attitude that we should be afraid to allow our routine communications to be stored permanently.

    Oh. And by the way, IANAL.

  • by cube farmer (240151) on Tuesday April 10, 2001 @10:26AM (#301243) Homepage

    The analogy would be to read emails from, say, the white house in 200 years. Do you think the white house is saving their emails? You bet.

    Apparently, George W. was an inveterate user of email right up until the inauguration. At that point, he sent a farewell missive to his correspondents, in effect saying he could no longer use email because all such correspondence would be a public record and he didn't want his private musings made public.

    So, no, many important communications will not be retained, unless someone is placing a wiretap on the president's phone.

  • Guess I'll do what I did when I pulled the 5 1/4" drive: Grab all the old 5 1/4" diskettes and move the data to the hard drive, burn a CD and away we go. So, right before you get rid of your CD drive forever, pull all your CD's and copy the data to whatever the new great media-of-the-year is.

    Oh, and avoid the "no-copy" media.
  • We already have formats like this. SGML, CGM - they are called ISO formats. "Open" is another good word for them.

    I used to work for a company that built large plants that were expected to last 30+ years. We were greatly worried about how we were going to ensure that the original plans, operating procedures, maintenance records and the like were not just readable but usable and updatable down the road.

    Basically the answer came back as - use only published, official, open standards. Even though they are not the greatest at least you can always reengineer the reader and the software because the designs are formally published (and not just as source code).


  • Tell my mom. She's good at remembering useless details that nobody cares about and explaining them to anyone who listens. Plus she was born before the advent of the telephone.
  • Actually, paintings do deteriorate due to viewing, and quite quickly. Photons bombarding the pigment cause the colours to fade like an old photograph. There are regulations as to how bright lights in a gallery can be and how many there are, as well as how many days out of a year a painting is viewable (the rest of the time it's in a dark climate controlled room). And remember, the Gioconda is only 400 years old; works from earlier times have only survived due to extreme storage facilities. The cave paintings around Cro Magnon, for example, survived because they've been in a cold fucking cave for ten thousand years. And the artifacts of Tutankhamun and Rameses II survived because they were buried in a stone coffin in one of the driest areas in the world.

    The digital age gives us great hope for preservation of everything, because we can copy sounds, images, motion and even DNA structures with perfect reproduction. But it will only be through the careful preservation of this information that future generations will be able to access it.

    If anything, and you can consider this a dig at DMCA if you like, it will be the number of copies of these artworks that will permit them to be preserved. Consider this: there is only one Mona Lisa -- if she fades, we can only guess at what her colour was. But there are millions of copies of Wing Commander IV. It's a relatively simple task to go through a few thousand of these, extract from each disc what data hasn't rotted through, and compare it to the others. Combine that with Huffman coding and CRCs and we can quickly reconstruct the original with perfection and certainty. You can't say that of the Venus de Milo. And unlike other generations' copied mediums, we can trust the intermediary -- the cold, heartless eye of the scanner and OCR software -- not to misspell anything or make up shit. Bemoan the need for proprietary copyrights if you like, but the digital age's perfect reproducibility is the factor that will decide its permanent etching in the databases of the future.
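
    The "compare many damaged copies" idea can be sketched as a per-byte majority vote followed by a CRC check. This is a rough illustration under stated assumptions: the function name `reconstruct`, the use of `None` to mark an unreadable byte, and the availability of a known-good CRC-32 for the original are all made up for the example. It uses only voting and a checksum; Huffman coding is a compression scheme and plays no part in the recovery here.

```python
import zlib
from collections import Counter
from typing import Optional, Sequence


def reconstruct(copies: Sequence[Sequence[Optional[int]]]) -> bytes:
    """Rebuild data from several damaged copies by per-byte majority vote.

    Each copy is a sequence of byte values, with None marking a byte that
    could not be read. On a tie, the value seen first wins, so a real tool
    would want more copies or a stronger error-correcting code.
    """
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all copies must have the same length")
    result = bytearray()
    for position in range(length):
        votes = Counter(
            copy[position] for copy in copies if copy[position] is not None
        )
        if not votes:
            raise ValueError(f"byte {position} is unreadable on every copy")
        result.append(votes.most_common(1)[0][0])
    return bytes(result)


# Three hypothetical discs, each damaged in a different place.
original = b"Wing Commander IV"
expected_crc = zlib.crc32(original)  # assumed to be known from a good source

disc_a = list(original)
disc_a[0:4] = [None] * 4      # first four bytes unreadable
disc_b = list(original)
disc_b[5] = 0x00              # one byte silently corrupted
disc_c = list(original)
disc_c[-3:] = [None] * 3      # last three bytes unreadable

recovered = reconstruct([disc_a, disc_b, disc_c])
crc_matches = zlib.crc32(recovered) == expected_crc

if __name__ == "__main__":
    print(recovered, "CRC ok" if crc_matches else "CRC mismatch")
```

    The vote repairs each position where most readable copies agree, and the CRC comparison is what tells you whether the result can be trusted. With only one surviving copy, as with a painting, there is nothing to vote on.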
  • In this very litigation-prone society, businesses are being advised to purge email as soon as feasible and to not keep many generations of backup. One of the first things a law firm does in a major suit is "discovery," where they demand copies of all your relevant records including email.

    The effect of this upon history is obvious. Originally, historians thought that the digital age would be great for doing history, because so much source material would be available. The growth in data warehouses and similar archives indicates that it's human nature not to throw anything away. (That, and my garage!) But now, to prevent the risk of exposure during email discovery, there won't be anything left.

    What a shame.

  • High quality paper only lasts 500 - 1000 years under the best storage conditions. (Cheap paper contains residual acid which destroys the paper in a few decades even under ideal conditions.) What we have of Roman and Greek literature, or of the Bible, is copies of copies, and sometimes not too accurate. Some messages engraved on stone have lasted over 5000 years -- but it's expensive and low capacity, and much has been destroyed by weathering, religious fanatics, and other vandals. Engrave it on gold and bury it and it will last forever -- unless it's dug up by barbarians that just melt it down...

    The stamped CD's will probably outlast paper records, but are only good for large-volume publications, not for the actual records that most interest historians. CD-R/RW and similar dye-based disks, properly stored, are probably going to outlive the technology to read them, but they are less stable than good paper.
  • Of course, they'll be uncataloged and locked away in a gov't warehouse along with the Ark of the Covenant, the real investigation into the Kennedy assassination, and the records of which soldiers were deliberately exposed to atom bomb tests. 8-)

  • Digital records are favored by our corrupt, foreign-dominated Federal tyranny for one very simple reason:

    It's terrifyingly easy to alter them, or to dispose of them entirely.

    This is frightening, but true: As the well-known conservative George Orwell observed in his great novel 1984, "Who controls the past controls the future: who controls the present controls the past." The "Party" in 1984 devoted itself to doing exactly what the Clinton regime did: They went through all historical records, altering, falsifying, modifying, deleting.

    No one will ever know what the Clinton death count really was. No one will ever know what really happened. The "records" are malleable. You can trust no information that comes from the government, because it's all been "massaged" and "fixed up".

    Will there be historical records? Not in any meaningful sense: There will be something that looks a lot like such material, but it will be a work of pure fiction.

    Goodbye, America. We were great while we lasted.

