Long-Term Personal Data Storage?
BeanBagKing writes "Yesterday I set out in search of a way to store my documents, videos, and pictures for a long time without worrying about them. This is stuff that I may not care about for years, I don't care where it is, or if it's immediately available, so long as when I do decide to get it, it's there. What did I come up with? Nothing. Hard Drives can fail or degrade. CD's and DVD's I've read have the same problem over long periods of time. I'd rather not pay yearly rent on a server or backup/storage solution. I could start my own server, but that goes back to the issue of hard drives failing, not to mention cost. Tape backups aren't common for personal backups, making far-future retrieval possibly difficult, not to mention the low storage capacity of tape drives. I've thought about buying a bunch of 4GB thumb drives; I've had some of those for years and even sent a few through washers and driers and had the data survive. Do you have any suggestions? My requirements are simple: It must be stable, lasting for decades if possible, and must be as inexpensive as possible. I'm not looking to start my own national archive; I have less than 500GBs and only save things important to me."
Hard drives kept online (Score:2, Insightful)
Not enough history (Score:5, Insightful)
We don't have enough history on this tech to know what, if anything, will "last for decades". Possibly "paper" and "microfiche" might fit in that list, but those aren't the sort of things you're talking about. Best option I can think of right now would be to get a couple 500gig drives, put everything on both, and then put them in different areas. In 3-5 years, back them up to something newer, and repeat that every 3-5 years. Maybe in those intervening years, we'll have more data and newer tech that's demonstrably suited for what your needs are.
Not being answered (Score:5, Insightful)
Obviously this question hasn't been answered for the general public because this is like the 4th year in a row that this question has been asked on Slashdot.
Not the media that's the problem (Score:5, Insightful)
An archive is not a long-term backup (Score:5, Insightful)
There is absolutely nothing that you can put away for decades and expect to be useful. Your requirements are not simple - they're actually very, very hard to meet, even if you want to throw a lot of money at the problem.
You don't know that a jpeg, for example, will be readable in 30 years. The format may be so deprecated that there might not even be a viewer available. Like my old Microsoft Works 4.0 documents - although I have the data, I have nothing that can read them unless I want to spin up an old Windows image, assuming that I can generate a virtualized environment that can support an old Windows (Windows XP probably won't even boot on any PC being produced 30 years from now). And some of that data is only a few years old, not decades old.
You should store not only the data, but also the applications that created the data. And the computer you need to run those applications. And backups of those. And then every few years, pull it all back and validate it and update as required.
You may have only 500GB now, but 10 years from now that will be 5TB. And then you need a way to actually be able to find something you added to your "archive".
I deal with this at work regularly. An archive is not a backup that you keep for a long time. It's much, much more than that. Once you start thinking about all of the issues that come up, you'll see that the media is the least of your problems.
It doesn't exist. (Score:5, Insightful)
You are the analog of an investor who wants a high-yield, low-risk, completely liquid instrument. The term is TANSTAAFL.
I maintain two (yes, two) USB external drives. Every couple of years, I migrate to a larger, or otherwise better medium. I use an incremental backup system (for me, cpio) that ends up keeping too much stuff, but at least I have the stuff I want if I need to get to it.
In a decade - in my case, four decades - one can accumulate a remarkable amount of crap, along with things one truly wants to save. I have a total of about 90 gig of actual data, plus a far larger amount of music and video, which I consider more or less disposable. It is neither difficult nor expensive to purchase another external drive and copy the data. My oldest backup is on an IBM 2314 disk pack, but the data still held on that disk is also present on my current backup, a WD 160G in a USB-1 enclosure. Sometime next year, I'll go to a 500 G drive in a USB-2 enclosure.
An important consideration is to periodically check to see that the data ostensibly held on a drive (or CD, or DVD) is actually readable. DVD/RW in particular has a tendency to get flaky over long periods of time, especially if stored under adverse conditions (jammed in the back of a desk drawer, under sixteen pairs of scissors, a stapler, a box of Pop-Tarts, and four old coffee cups). I always keep my last few generations of backups, and if I find an unreadable datum, I make an effort to recover it from the previous backup.
While it may be stating the obvious, it's a Bad Idea (TM) to wait to back up data until you have a problem. I back up all of my data every week or two, and critical data, daily, without fail. Critical data is cached as a three-generation dataset (IBMese).
Good luck. There are no real solutions, just ways to cope.
yes it works-for equipment failure (Score:3, Insightful)
right up until you have an environment disaster.
(environment can be as small as a tiny fire in the power supply of that PC)
Theft of the PC? are you covered?
FFS it is IN THE SAME CASE!
Re:Quality DVDs, archival storage, repeated backup (Score:3, Insightful)
Changing formats (Score:4, Insightful)
[From Slashwayback]: Dear Keypunch, I have data I want to keep for decades. Should I invest in a good card reader, or should I transfer my data to these far more efficient but newfangled "floppy disks"?
It's pretty ridiculous to expect one storage format to be viable for 'decades'. Not because it goes bad (even though it probably does), but because you're not likely to be able to maintain the necessary equipment for that long. If you find a storage solution, you need a retrieval solution to equal it. What equipment will you be able to find decades on that can access your storage, even if it stays good? You have no idea.
I've been maintaining a collection of Apple IIs and recopying the programs and data regularly (mostly through full HD backup, reformat with error block deletion, reformatting and replacing) to keep it readable. I have machines and data between 20 and 30 years old. I recognized long ago this had become a hobby in its own right, as most of what I had hasn't been of interest to me for many years. The little bits that have been useful have been transferred to newer machines and formats several times. That's decreased as more and more of it can be found easily on the web (previously FTP/gopher/etc.).
Get used to transferring your data to new formats as they come into widespread use, and recopying as necessary to keep them readable. Or else:
[From Slashwayforward] Dear Galactic EM Field Computing, I just found about 20 pounds of aluminized plastic disks that used to have data on them, but I can't read them to tell if I still want it. Is there any museum that might want these? Or are there still any operating plastic recycling centers that might give me a few bucks for them?
Think Different (Score:4, Insightful)
I think the issue is that people are thinking about this incorrectly. You don't really want to 'archive' this data -- keep it with you! Keep it with all of the data that you are using day to day and back it up and move it along with that.
My home workstation still has files from 15 years ago on it. I've replaced the computer many times, had a few hard drives fail, etc. but I've always restored both current and 'archive' data from backups and kept going.
Re:Not being answered (Score:5, Insightful)
That's because there is no fully satisfactory answer. We'd all like a just do this, throw it in the corner and when you come back for it in 50 years it'll all be there sort of solution, but there is no such beast within the realm of affordability.
It's a problem with several aspects to it as well. Let's say there is a SATA drive out there that absolutely CAN sit in a safe deposit box for 50 years and then work perfectly every time. In 50 years, all computers will have whatever the successor to whatever replaces SAS and when you mention SATA, the old timers will all get nostalgic and go on about tying onions to their belts (which was the fashion at the time). You'll then have to take the decidedly NOT affordable step of having someone build you a one-off SATA controller that can interface with a computer of that time. That is, if you can get the old-timers to stop reminiscing about the Vista debacle of aught eight long enough to recall the specifications of SATA. Be sure to duck, some of them might throw a chair for illustration.
Re:An archive is not a long-term backup (Score:5, Insightful)
I think we do know that a JPEG will be readable in 30 years. Formats that have been around for like 10-20 years like JPEG are going to be here for a long while longer; I'd say until the end of civilization at a minimum (and even then, it wouldn't be hard for people to figure out the format). The worst case is that in future generations only a librarian or data archaeologist would have the tool to open it. Given the open source nature of JPEG, more likely you'll just download a JPEG viewer.
MS Works 4.0 documents are completely different. There was only ever one implementation, it wasn't open source, it wasn't a documented standard, and the life span of the format was small to tiny.
Re:Not the media that's the problem (Score:2, Insightful)
Re:Amazon S3 (Score:4, Insightful)
The OP specified "decades if possible." So we have two problems here. One is that 30 years at $75/mo comes out to $27,000, which is a little pricey. For that amount of money he could probably hire someone to come to his house once a year, verify the readability of his media for him, and transfer them to new media as the old ones become obsolete. $1,000 for a few hours' work? I'd take it.
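The rental arithmetic above is easy to re-run with other numbers. A minimal sketch, assuming a flat monthly fee with no price changes or inflation (the $75/mo and 30 years are the figures quoted in the comment; the function name is made up for illustration):

```python
# Back-of-the-envelope cost of renting storage for decades.
MONTHLY_FEE_USD = 75   # figure quoted in the comment above
YEARS = 30             # "decades if possible"


def total_rent(monthly_fee, years):
    """Total paid over the period, assuming the fee never changes."""
    return monthly_fee * 12 * years


total = total_rent(MONTHLY_FEE_USD, YEARS)
per_year = total / YEARS

print(f"total over {YEARS} years: ${total:,}")
print(f"equivalent yearly budget: ${per_year:,.0f}")
```

With these inputs the total is $27,000, or $900 a year, which is roughly the budget the comment compares against paying someone to check and migrate the media once a year.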
The other problem is that the probability that Amazon S3 will exist in 30 years is very low. This is basically the problem with any possible answer to his question. There isn't any computer-related service or equipment that you can be sure will still be there in 30 years. A more realistic goal would be to do it in 10-year steps; if that's all he wants, then the shoebox full of flash drives should work fine, and then 10 years from now he can transfer those data to something else.
I initially misread that as 500 MB, which is about the amount of critical data I have that needs backing up. 500 gigabytes is kind of a crazy amount of data. One way to get that much might be that he has a gigantic collection of mp3s, or possibly a moderately huge collection of music in a less lossy format. But then that's not critical personal data, it's just a music collection. And the chances are that as the decades go by, he'll realize that the music he thought was so important and wonderful in 2008 no longer seems so important to him. I know plenty of people who still have their Kool and the Gang LP's from the 1970's, but it's not like they're willing to spend a thousand dollars a year to obsessively maintain them.
Re:Not enough history (Score:5, Insightful)
One drive should be "live" and the other archived. Considering we all own computers, throwing a 1TB drive into a box isn't so difficult. Hell, you could write a script to power it up once a month and then power it down, if people are worried about energy costs but don't want to keep it spinning 24/7. It doesn't ever need to be mounted.
Better yet, both disks should be running in a RAID 1 array. This is a cheap solution, but it's not a "toss in the closet and forget" solution. If this guy actually cares about his data I don't see why he can't spend 200 dollars or so for two drives and a RAID 1 card.
I see this question at slashdot every couple of months. The answers are still the same. Keep it live on a disk until a better solution is found. Upgrade the disk every so often. That's it. Mods, stop posting the same damn question every month.
Re:Sorry, it's insoluble. (Score:3, Insightful)
You raise some good points but perhaps protest too much.
Of course, it depends on exactly WHAT form your data is in. Text files will likely be readable until Kingdom Come. Microsoft Works files seemingly get deprecated every version. It is quite likely that common graphics files (JPEG especially, TIFF probably) will be readable for quite some time. JPEG especially isn't going anywhere. Both formats are well described and if the OP had any data in these formats that was of interest to a digital archaeologist in 2300 they could probably recreate the format. But that's not what most people want to do with their data. If you want that kind of permanence, create a cult and make giant statues out of rock....
Copying data from one format to another shouldn't be a Sisyphean task, assuming you don't put them on CDs or DVDs and have a metric shitload of them. The trick is to switch to the new media BEFORE the old one is "extinct". I shifted plenty of data from CDs/floppies to hard drives, I've upgraded my hard drive arrays several times. I presume that when holographic storage finally gets real, I will be able to just ask the OS to copy the data to the new format.
Keeping the old programs that read the data is a good idea and essentially a freebie. Since most of my photography is stored in a proprietary format (Nikon NEFs), I have a copy of David Coffin's dcraw in various places (a neat, open source program that reads pretty much any RAW camera format out there and the basis for a number of commercial programs).
My data based stuff that I want to keep (taxes, personal files and such) have at least one copy in .txt format. And unfortunately, I don't think that PDFs are going anywhere (may Adobe burn in whatever Hell is reserved for Evil Corporations). Convert the important files to a couple of different formats. Cheap, fast. Let your grandkids sort it out. Which they likely will do by hitting whatever is the current equivalent of the "Delete" key.
Re:An archive is not a long-term backup (Score:5, Insightful)
I think you're exaggerating the problem a bit. Formats like GIF, JPEG, and ODF will most likely be readable somehow in 30 years. They may not be the format of choice, but we have open source readers for those things, so for as long as lots of people have data in those formats, someone will be maintaining viewers that allow reading them and probably converting them to newer formats. Besides, it's not clear to me that we're going to come up with much better compression methods for static images, or that we really need to bother coming up with much better compression methods for static images, which means it isn't that unlikely we'll still be using JPEG in 30 years. I'm not saying it's a lock or anything, but it's not *that* unlikely.
Now, with a format like ODF, if adoption isn't bigger before something new comes along, you might have a hard time reading that just because of the relative obscurity of the format (which is a problem JPEG doesn't have). In that case, it will probably depend entirely on whether enough people have enough valuable information in ODF that some developers somewhere think it's worth writing a viewer.
Yes, ideally emulation would be available for every obsolete platform, and we'd all keep VM images of all our old operating systems. We'd all keep all of our old applications to install on those images, and VM software would always be backwards-compatible meaning that we'd never lose anything. I'd love to know that someone somewhere is working on that, if only for historical preservation. However, for the individual who might have limited resources, it probably won't be necessary. If it ever becomes necessary for that to happen for most people, someone will be able to make a lot of money selling a solution.
In most cases I'd say the best bet is to stick to open formats, keep copies on multiple different media, and continually migrate to new media. So, for example, back everything up to a hard drive and create checksums for every file, and then burn multiple copies to DVD. In 3 years, pull them all out, check all the checksums for corruption, and copy known-good copies (and checksums) to your brand new 5TB hard drive, and burn a couple BluRay discs. In another 5 years, check the checksums again, get known-good copies, and copy them to your 50 TB SSD and burn a couple copies into your super-ultra-cool whatchamakallit.
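The checksum-and-migrate routine described above can be sketched in a few lines. This is only an illustration, not anyone's actual tooling: it assumes SHA-256 as the checksum, keeps the manifest as an in-memory dict (you would store it alongside the data), and all function names are invented for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        candidate = root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(rel)
    return bad


def find_good_copy(rel, expected, roots):
    """Search several copies (disk, DVD, ...) for one that still matches."""
    for root in roots:
        candidate = Path(root) / rel
        if candidate.is_file() and sha256_of(candidate) == expected:
            return candidate
    return None
```

At migration time you would run `verify` against each copy, and for anything it reports, use `find_good_copy` across the remaining copies to pick the file that gets carried forward to the new media.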
Re:Hard drives kept online (Score:1, Insightful)
Suggest a better approach...keep your data modern" (Score:2, Insightful)
There really is a simple way around this - and it is what I've done - I've got data 25 years old and it's still relatively easily manipulated with a little work. I've found floppy disks are relatively resilient, and old hard drives seem to keep their data for a long time. I've got a computer, display, keyboard, and associated peripherals stored for every generation of data that I kept:
1.I have a Commodore 64 with floppy drive and cassette drive stored in a box with the floppy disks and cassettes from that generation (late 70s/early 80s).
2.I have an IBM PC/XT with keyboard, a 5 1/4" floppy, 3 1/2" floppy, internal 20MB hard drive, and CGA monitor stored in a box with a load of 5 1/4" floppies filled with data from that generation (Mid 80s).
3.I have an IBM RS/6000 with display, keyboard, and mouse and internal 500MB hard drive loaded with all my docs and projects from that generation (early 90s).
4.I have a Pentium 2/300 PC * 15" monitor with windows 98, CD R/W drive, 3 1/2" floppy drive, and USB ports - and a crapload of CD's and 3 1/2" floppies full of stuff from that generation (Mid/late 90s).
When the current generation looks like it's going to be moving on, I'll put away a Core 2 Duo system with 1 TB of hard drive full of stuff with the different OS's I used loaded on it with boot manager (Ubuntu, XP, FreeBSD), a crapload of USB keys full of documents, along with burned DVDs etc. That'll take care of the "'00" generation.
The answer lies in not only archiving your data "of the generation" but the essential equipment needed to access it. I may have a heck of a time moving data off of my Commodore 64 - but I can at least see it and access it - I believe I stored a modem with it - so at worst I could set up a terminal server that it could dial into and dump data to. All the other systems I'm pretty sure I could recover stuff from - even if the PC/XT does have an MFM hard drive, etc.
I have data 18+ years old. Your approach is admirable, but why not just move your data forward with technology?
When floppies started dying, hard drives got large enough so I moved all data off the floppies to hard drives and optical media. When word processing software I used started dying, I moved all my documents or obtained converters to MS Word format. Also, I don't archive music and movies. I do archive pictures etc.
I believe the best approach is just keep your data moving forward & current and not in some archaic format. This means I have 3 redundant copies of all my data on hard drives using a current OS.
Granted I don't archive "silly things" like music which I can re-create; but rather just personal data (i.e. personal documents, pictures, personal videos) so the total quantity of data after 18+ years is only about 13gb.
Re:Magnetic Tapes... (Score:5, Insightful)
I think your post is very insightful, and I have an additional problem to throw into the mix: sorting through all the crap you've archived, even assuming you can read it all.
I don't know about you, but I've run lots of different backups on lots of different systems, and one of the problems that always comes up is just finding the revision of the file you want. People say, "I want the copy before I made this revision-- I think I did that about a month ago." Check the backups and there are no revisions from a month ago, but there are 20 from the month before. Next thing you know you're checking 20 copies by hand, and none of them are what you're looking for-- and that's even when your backup/archive system is working.
So when devising any kind of archive, I think it's at least worth considering, "How am I going to find what I'm looking for in 20 years?" Imagine yourself in 20 years, and you have every piece of data you've ever generated stored on some kind of media that holds hundreds of terabytes of data. You want to find some spreadsheet you made today (20 years ago). Maybe you don't remember exactly when you made the document-- you think about 15 years ago, but it's actually 20. You can't really remember what the filename was. You can't remember if you made it in Excel or OpenOffice, so you're not even sure what filetype you're looking for. What's going to be your method for finding that file?
I'm not suggesting it's an insoluble problem. It might be that it's not even a problem in 20 years because indexing/searching has become so good that your AI will be able to sort through terabytes in a couple seconds and make some good guesses about what you're looking for, but do you really want to rely on that happening?
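One low-tech hedge against that problem is to keep a small catalog next to the archive and search it loosely. A rough sketch, assuming file modification times survive your copies (they often don't) and using made-up names throughout; it matches on a partial filename, any of several file types, and a date with generous slack, since you may think "15 years ago" when it was really 20:

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass(frozen=True)
class Entry:
    path: str           # path relative to the archive root
    suffix: str         # lower-cased extension, e.g. ".xls"
    modified: datetime  # last-modified time as recorded by the filesystem


def build_catalog(root):
    """Walk the archive once and record the basics about every file."""
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = Path(dirpath, name)
            try:
                mtime = full.stat().st_mtime
            except OSError:
                continue  # unreadable entry (e.g. broken link): skip it
            entries.append(Entry(
                path=full.relative_to(root).as_posix(),
                suffix=full.suffix.lower(),
                modified=datetime.fromtimestamp(mtime),
            ))
    return entries


def search(catalog, name_hint="", suffixes=(), around=None, slack_years=5):
    """Loose search: partial name, several possible types, fuzzy date."""
    hint = name_hint.lower()
    slack = timedelta(days=365 * slack_years)
    hits = []
    for entry in catalog:
        if hint and hint not in Path(entry.path).name.lower():
            continue
        if suffixes and entry.suffix not in suffixes:
            continue
        if around is not None and abs(entry.modified - around) > slack:
            continue
        hits.append(entry)
    return sorted(hits, key=lambda e: e.modified)
```

For the spreadsheet example, something like `search(catalog, "budget", (".xls", ".ods"), around=datetime(2008, 1, 1))` covers both the Excel and OpenOffice guesses at once. It does nothing about file contents; full-text indexing is a separate and harder problem.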
Re:Not the media that's the problem (Score:2, Insightful)
Re:Not Amazon S3 (Score:3, Insightful)
If your data is not in your possession, how do you know others won't see it or edit it without your permission?
Encryption ?
several HDs (Score:2, Insightful)
For 500 gigs.
A couple of hard disks, stored in different locations (cities, not drawers), that you update+check once a month or a quarter.
Burn DVDs of the really important stuff (pictures, documents) around once a month, and mail them to your parents/family.
What NOT to do:
- RAID is NOT a backup solution, it is a high-availability solution. Of all the problems backups need to address (theft, destruction, viruses...), it solves very few.
- don't keep your backups online and/or in the same spot: viruses, power surges, fires, theft... will destroy them
- don't have only ONE backup: Murphy's law, if your live data disappears, the backup will turn bad also
- don't forget to check that your backups are still good
- don't delude yourself into thinking that any physical media in use today will still be easily readable more than 5 years from now (except for the consumer-type media: CDs, DVDs)
Nonsense... (Score:2, Insightful)
Just toss the rest! Really! Nobody cares for the reams of out-of-focus or incorrect compositions.
Keep the great shots (one in a hundred, if you're a good photographer) and delete the rest.
Re:Not the media that's the problem (Score:2, Insightful)
Re:Hard drives kept online (Score:3, Insightful)
You can still get USB to parallel port adaptors.
The same thing will happen with whatever replaces USB.
I really hope we do not move to wireless USB. It will just be an extra set of security holes and other problems.
Re:Not Amazon S3 (Score:4, Insightful)
Re:Amazon S3 (Score:3, Insightful)
> I would figure that it's video... specifically, probably family home movies.
Digital video has opened a HUGE new can of worms. We have problems even today viewing video created just yesterday (especially over the Web) because of all the myriad codec standards.
Imagine what it'll be like in 20 years-- anything other than NTSC, PAL, or SECAM will be effectively extinct.
What about SSD? (Score:3, Insightful)
If NAND flash SSD lifetimes are determined by write frequency, then wouldn't this be fantastic for archival storage? Just write the data once, then read it as many times as you like.
Re:Not enough history (Score:1, Insightful)
Well, thankfully they DID post it another month. Because I just reached critical mass with my storage needs and this very question has been in my mind the past 3 weeks. There is no such thing as a stupid question, and just because you have it sorted out doesn't mean everyone else has. That is the reason this forum exists.
Re:Not Amazon S3 (Score:4, Insightful)
If someone's willing to brute-force a password to change your data, they're willing to copy your DVD's ISO, edit it, and burn a new copy to replace it with.
this might sound daft but (Score:1, Insightful)
Why not write an application to encode your data as coloured dots and print it on archival-grade paper with archival-grade ink? Then use another application in the future to read back the scanned page and convert the data back (I guess scanners or something like them will still be around in 2250). You should be able to get 4 MB to a page with some trickery (2048x2048 pixels); that's 2 gig per ream, so 100 reams = 200 gig. Humanity has refined the storage of information on paper to a fine art, and it should last far longer than a lifetime with care.
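The per-page figure above only works out under some generous assumptions, which a quick calculation makes explicit. This sketch assumes one byte per dot (256 reliably distinguishable colours), 500 single-sided sheets per ream, and no space spent on error correction or alignment marks, so treat the result as an upper bound:

```python
PIXELS_PER_SIDE = 2048          # the 2048x2048 grid from the comment
BYTES_PER_DOT = 1               # assumption: 256 distinguishable colours per dot
SHEETS_PER_REAM = 500           # assumption: standard ream, one side printed
ARCHIVE_BYTES = 500 * 10**9     # the original poster's "less than 500GBs"

bytes_per_page = PIXELS_PER_SIDE ** 2 * BYTES_PER_DOT
bytes_per_ream = bytes_per_page * SHEETS_PER_REAM
reams_needed = -(-ARCHIVE_BYTES // bytes_per_ream)   # ceiling division

print(f"per page: {bytes_per_page / 2**20:.0f} MiB")
print(f"per ream: {bytes_per_ream / 10**9:.2f} GB")
print(f"reams for the whole archive: {reams_needed}")
```

Under those assumptions a page holds 4 MiB and a ream about 2.1 GB, so the full 500 GB would take 239 reams, or roughly 120,000 sheets, before any redundancy is added.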
Re:Amazon S3 (Score:5, Insightful)
A lot of people shoot home movies and then become obsessed with preserving the footage. That adds up fast.
It seems that deciding what isn't important is a hard part of backing up.
After a loved one dies, even the lowest quality outtake with a thumb covering half the shot can be a priceless memory.
Re:Not Amazon S3 (Score:3, Insightful)
It actually kind of does.
In particular, modern RSA, even considering future advances, would require fundamentally different technology (quantum computing) to make it feasible.
By "feasible" I mean that if quantum computing doesn't become practical, a 4096-bit RSA key will survive the heat-death of the universe.
Crypto is very, very rarely the weak point. It's almost always how that crypto is used.
And by the way, mods -- blanket statements like the parent's are easy to come up with, and easy to make sound intelligent, but there's no meat to them. It's kind of like the blanket statement of "The only secure computer is one that's not on the Internet." Easy to come up with, sounds reasonable, also entirely wrong.