Long-Term Storage of Moderately Large Datasets?
hawkeyeMI writes "I have a small scientific services company, and we end up generating fairly large datasets (2-3 TB) for each customer. We don't have to ship all of that, but we do need to keep some compressed archives. The best I can come up with right now is to buy some large hard drives, use software RAID in Linux to make a RAID5 set out of them, and store them in a safe deposit box. I feel like there must be a better way for a small business, but despite some research into Blu-ray, I've not been able to find a good, cost-effective alternative. A tape library would be impractical at the present time. What do you recommend?"
Exactly what you're doing (Score:5, Informative)
I don't think you can beat a bunch of conventional hard disks in a RAID5 for both cost-per-TB and backup/restore performance, not to mention medium-term data integrity. Might be able to make hooking up the drives more convenient with an eSATA multi-bay enclosure, but those are kinda expensive. But I bet your backup box already has some sort of hot-swap bay, like: http://www.amazon.com/Thermaltake-BlacX-eSATA-Docking-Station/dp/B001A4HAFS [amazon.com]
I assume you already compress your data, since scientific datasets tend to compress well. You might consider compressing to squashfs, since it will let you do transparent decompression later on so you can skip the restore step if you just need a handful of files.
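A rough sketch of that squashfs approach, assuming squashfs-tools is installed (paths and file names here are illustrative):

```shell
# Pack a dataset directory into one compressed, read-only image.
mksquashfs /data/customer42 customer42.sqsh -comp xz -b 1M

# Later: pull out a handful of files without a full restore (no root needed).
unsquashfs -d ./restore customer42.sqsh results/run-007.csv

# Or mount the image and browse it in place (needs root):
#   mount -o loop,ro customer42.sqsh /mnt/archive
```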
Re: (Score:2)
Except for the fact that if a fire, flood, or some other disaster hits your site, there is a good chance your data may be gone, or physically stolen from you.
Re:Exactly what you're doing (Score:5, Insightful)
That's why you hot-swap them. You treat them just like tapes. In fact, once you start doing that, you realize that RAID mirroring isn't helping you any (striping is another matter).
The best way to backup a big hard drive these days is with another big hard drive.
Re: (Score:2)
Re:Exactly what you're doing (Score:5, Insightful)
(Or btrfs on a Linux distro)
Are you honestly suggesting using an in-development filesystem for backup purposes?
Re:Exactly what you're doing (Score:5, Funny)
Re: (Score:3, Interesting)
Re:Exactly what you're doing (Score:5, Interesting)
I don't think it's a great solution. You're storing relatively fragile hard drives in a raid5 configuration in a lock box? It's not like you can tell if one of the drives goes bad and needs to be replaced when it's sitting in a box. You'd have to regularly pull the data sets out, fire them up and make sure everything is still functional.
I'd at least want to do 2 complete sets of mirrored drives.
Tape storage does store better.
Depending on how important the data is, I might do something like a local mirrored drive set in storage and an online copy at something like rsync.net - stay away from s3, it's not designed to protect data, despite what AWS fans may say.
Re: (Score:3, Informative)
LTO4 tapes are reasonably priced, but the DRIVES push $5k. For long-term use, even the drives go obsolete too quickly, then become MORE expensive in 5-10 years when you really need them.
The best thing is probably what Google does, simply keep 3 "live" copies of the data. Then the data is always on current hardware. The data is on "production" hardware along with other stuff so it is properly monitored by the OS and database for integrity, and hardware is maintained with support. Drive arrays are cheap enough i
Re:Exactly what you're doing (Score:5, Informative)
Yeah, keeping those drives in a huge online storage array is probably better. Then they can mirror them across multiple sites.
Here's a compelling petabyte online RAID system for cheap:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ [backblaze.com]
Re: (Score:3, Insightful)
If disks in the safe deposit box are fast enough to access, running to the store to buy a generic power supply is fast enough recovery.
Re: (Score:2)
Re: (Score:2)
Re:Exactly what you're doing (Score:5, Informative)
Tape is really best for archiving, to this day. A single LTO drive won't break the bank for a small business, and it will be reliable.
3 Things to remember about tape backup:
Encrypt your backups. This is becoming available in the tape drive itself, but many backup applications will also do it for you in software. Limits embarrassment if a tape goes missing.
Occasionally test restores. This is incredibly important - almost every unreadable tape in existence was unreadable when created. Any reasonable backup software will give you the ability to do this automatically (as part of the backup job). If practical, create a job that does a backup of everything, but verifies only some small volume. If you can read anything, chances are high that the whole tape is fine.
Get those tapes offsite. A safe deposit box works for a tiny company, but someone like Iron Mountain works better and is less hassle. Store a copy of your encryption key in the same facility (but don't transport the tape and key together).
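The "test your restores" advice above can be sketched with plain GNU tar, whose compare mode re-reads an archive against the filesystem (paths here are illustrative):

```shell
# Make a tiny stand-in dataset and archive it.
mkdir -p dataset && echo "sample reading 1" > dataset/run1.txt
tar -cf backup.tar dataset

# --compare (-d) re-reads the archive and exits nonzero on any mismatch,
# which makes it easy to script a periodic verify pass.
tar -df backup.tar && echo "verify OK"
```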
Re: (Score:2)
Depending on how important the data is...
That's the key question the author needs to address. Is it important enough to throw a few thousand dollars per dataset into archiving? A few tens of thousands? The best suggestion seems to be multiple copies on multiple non-RAID hard drives stored at different physical locations with periodic integrity checks and regularly scheduled drive replacements.
Re: (Score:2)
Tape storage does store better.
CITATION MISSING
Re: (Score:3, Interesting)
stay away from s3, it's not designed to protect data, despite what AWS fans may say.
Just curious... S3 stores all of your data at multiple, geographically separate data centers. How exactly does that not protect your data? What else would you want it to do in terms of protection? It even gives you md5 sums of your files if you want to verify them (check the ETag attribute of each object).
So, honest question: what do you think they're missing to make S3 really protect data?
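For what it's worth, the ETag check described above can be sketched with coreutils; the aws command here is just one place the ETag would come from, and note that multipart uploads use a different ETag scheme:

```shell
# Stand-in for a real archive; compute its MD5 the same way S3 does
# for single-part uploads.
printf 'example payload\n' > dataset.bin
md5sum dataset.bin | awk '{print $1}'

# Compare the printed hash against the object's ETag, e.g. from:
#   aws s3api head-object --bucket mybucket --key dataset.bin
```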
Re:Exactly what you're doing (Score:4, Insightful)
if it was medical records i'd be storing 5 copies in 5 geographically distinct locations, each with their own backup for the backup. i'd be checking the MD5's each day on all the backups to ensure they can be accessed when i need them
I can about guarantee you that nobody stores medical records in this way. And realistically, why should they? 5 different locations is insane for just about any piece of data.
Geeks tend to go overboard when it comes to data paranoia and worry too much about technology, but then forget about all the human problems that go on. Most data loss doesn't occur from some geographic catastrophe where a super volcano destroys half a continent. More often someone changes some critical path of the backup scheme and the whole she-bang comes crashing down. Super-redundant geographic co-location can't save you from one idiot that didn't understand changing one critical name silently took down the backup scheme.
Re: (Score:3, Informative)
The other thing to do if you want longish-term reliability is to add redundancy to whatever you're storing with a tool like par2. http://www.par2.net/ [par2.net] and http://www.quickpar.org.uk/ [quickpar.org.uk] are your friends.
RAID5 will help you if you lose a whole drive (e.g. one seizes up from sitting still for a long time); the par2 data will let you verify that the data hasn't been corrupted, and if it has (e.g. a couple of sectors go bad), it will let you recover it.
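A sketch of that par2 workflow, assuming par2cmdline is installed (file names are illustrative):

```shell
# Create ~10% recovery data to store alongside the archive.
par2 create -r10 dataset.tar.par2 dataset.tar

# Years later: detect silent corruption, and repair from the parity files.
par2 verify dataset.tar.par2
par2 repair dataset.tar.par2
```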
Re: (Score:2)
since scientific datasets tend to compress well
Really? The datasets I deal with are fairly Gaussian in nature; I've yet to find a good compression algorithm that works on SEG-Y.
bzip2 (Score:5, Funny)
And optar:
http://ronja.twibright.com/optar/ [twibright.com]
You know it makes sense.
Re: (Score:2)
It would take approximately 5242.88 pages to store 1 gigabyte. This comes up from time to time. Laser-printed pages will not store well over time: the toner degrades, and if the pages are stacked together the sheets stick and you lose all of them. Some inkjets have ink that will not glue the pages together, but some ink will migrate, and some is a nutrient source for bacteria.
One of the better printed codes I've seen uses this http://microglyphs.com/english/html/dataglyphs.shtml [microglyphs.com] As an added bonus this coding can be printed with v
GMail Drive (Score:3, Funny)
Unlimited space with several accounts.
Re: (Score:2)
It was mainly a joke, but if you do want to get technical, it's possible by rewriting the GMail Drive filesystem driver to handle multiple accounts for storage.
Re: (Score:2)
Re: (Score:2)
In theory you can use gmail to store as much data as you could ever want.
The only limit is google's own limit on storage space they allocated for the gmail service as a whole(which comes close to infinite I reckon).
There are several programs that interact with gmail as if it were a local folder for backup purposes.
The programs split your data into sizes gmail accepts (zip, rar or something similar) and up/downloads them to/from your gmail account just as easy as any other off site backup service.
I'm sur
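The split-and-reassemble trick those programs rely on can be sketched with plain coreutils (a tiny chunk size here just for the demo; real tools would use mail-sized chunks of ~25MB):

```shell
# Stand-in for a big archive.
printf 'pretend this is a big dataset\n' > dataset.tar

# Split into fixed-size chunks with numeric suffixes (chunk.00, chunk.01, ...).
split -b 10 -d dataset.tar chunk.

# Reassemble in suffix order and verify the round trip byte-for-byte.
cat chunk.* > reassembled.tar
cmp dataset.tar reassembled.tar && echo "round-trip OK"
```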
Re: (Score:2)
"Unlimited space with several accounts." (Emphasis added.)
By my count, a few thousand accounts should do the trick. Maybe this guy could help you set them up: http://www.pcpro.co.uk/news/201252/hacker-takes-50-000-a-few-cents-at-a-time [pcpro.co.uk]
Re:GMail Drive (Score:5, Interesting)
That's what ZFS is for.
mount -t gmailfs /disk1 -o username=gmailuser,password=gmailpass
mount -t gmailfs /disk2 -o username=gmailuser,password=gmailpass
mount -t gmailfs /disk3 -o username=gmailuser,password=gmailpass
mount -t gmailfs /disk4 -o username=gmailuser,password=gmailpass
mount -t gmailfs /disk5 -o username=gmailuser,password=gmailpass
zpool create gzfs raidz1 disk1 disk2 disk3 disk4 disk5
Actually.... I think I just found my project for the evening. I mean it's already been done with 12 USB drives [sun.com]
Amazon AWS? (Score:5, Interesting)
Exactly. (Score:2)
Exactly. Let someone else do it. I don't know if Amazon is the right place, but the answer is still the same: Let someone else do it.
Why do we see questions like this so often? Why aren't people going to existing services with guaranteed availability that let you store a generic blob? Pass the buck -- they're probably going to do it better anyway.
Re:Exactly. (Score:5, Informative)
Re:Exactly. (Score:5, Insightful)
Ok, yes, we see you know a lot about this.
So what's your recommendation?
Re: (Score:3, Funny)
Huh, don't you see he has Too Much to Do?
Re: (Score:2, Funny)
I think his/her recommendation was: 17PB Tape.
Re:Exactly. (Score:5, Insightful)
Feel free to ask more questions.
Re:Exactly. (Score:4, Interesting)
Re: (Score:3, Interesting)
We played around with DIY JBOD a bit (i.e., moving the complexity up into software) because it seemed a lot cheaper, but we have yet to get the thing to operate as reliably and simply as our fibre channel RAID units. The main problem we're running into is that for SATA to be practical, you need to multiplex several SATA disks onto single SATA ports, but that software su
Re: (Score:3, Interesting)
Why distributed FS and not something like live mirroring/shadow copy? I wonder also... what do you consider an "expensive disk system"?
Distributed file systems are great because your limitations are (almost) always going to be hardware. Want 1000 boxes serving up your content? Get 1000 commodity boxes with disk. Need 10000? Also, not a problem. A box filled with raw disk is WAY cheaper than an EMC, Nexsan, etc (i.e. expensive disk system).
Serving over Ethernet should be fine, as you can always bond network connections together to increase throughput from your storage boxes to whatever boxes are processing the data (or even process the dat
Re: (Score:3, Informative)
Re: (Score:2)
According to http://aws.amazon.com/s3/#pricing [amazon.com], S3 will cost you about $150/month per TB. OTOH, it appears that all data transfers into S3 are free until June 30th, 2010, after which transfer fees will be about $100/TB. So if you want to do it, do it now. Be prepared to spend to get your data back out, if you ever need it.
For comparison, this week I bought a 1TB USB 2.0 external HD for under $100, so a DIY RAID should save you money in the long run.
I do have to ask one question: Exactly how is a tape li
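A back-of-envelope version of the comparison above, using the prices quoted in this comment (which will have changed since):

```shell
# Monthly S3 storage cost for a 3TB dataset vs a one-time disk purchase.
# At these quoted prices, the disks pay for themselves within the first month.
awk 'BEGIN {
  tb = 3; s3_per_tb_month = 150; disk_per_tb = 100
  printf "S3: $%d/month vs disks: $%d once\n", tb * s3_per_tb_month, tb * disk_per_tb
}'
```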
Re: (Score:2)
Hard drives do not store well; that's not how they are designed or warranted by the manufacturers. Physical tape media is designed to be stored in a safe deposit box for a long time. I wouldn't believe the 50+ year claims (and who still has a drive from even 10 years ago?), but tape is probably the best.
The problem with tape is that it's pushing $15K to get the proper server with the proper capacity set up.. that's a lot of months of paying Amazon... and then in 5 years your hardware warranty runs out an
Different manufacturers (Score:5, Insightful)
Hard drives are ridiculously cheap these days, especially for how much data you are storing. You may wish to consider buying drives from different manufacturers but of the same size to put in a single mirrored set. This way if there is a problem with a particular batch of drives it won't ruin everything.
Tape is your friend (Score:5, Informative)
LTO tape, properly stored, will outlast burned optical media and hard drives. Great stuff and designed specifically for what you're talking about.
http://en.wikipedia.org/wiki/Linear_Tape-Open [wikipedia.org]
Re: (Score:2, Funny)
Go Betamax!!!
Re: (Score:2)
Re:Tape is your friend (Score:4, Informative)
I agree, when the tapes are stored in proper environmental conditions. You don't need a library, just use some stand alone tape drives. Also look at the claimed media lifetime and recovered bit error rate figures to see if you are choosing the right tape drive/media.
Re:Tape is your friend (Score:5, Informative)
Couldn't agree more. A tape library (as in autochanger) might be out of your budget, but a simple tape drive wouldn't be too much -- say $5000 for an LTO4. Media is $50-$100 or so depending on where you shop. Seriously, you're not going to find a reasonable way of storing that much data anywhere else.
BTW, if you're not a member of LOPSA [lopsa.org], you may want to seriously consider it. Even if you're not a sysadmin, this is definitely a sysadmin-type question, and their mailing lists are second to none. It's an excellent resource.
Re: (Score:2)
But SOMEBODY has to tend those tapes. It's not a REAL backup unless you can prove it works. The big problem with tapes is that "proper environment" condition. If you're only worried about 2-3 TB, then you're not going to be putting these in "guaranteed" conditions 100% of the time... hence you can't guarantee their useful shelf life. So you need several more TB to periodically restore and re-backup the data every 6 months or so, so that you have multiple "known-good" copies.
Then the company has to pay som
Not as complicated as you think (Score:3, Informative)
With recent backup systems like AMANDA you can just dump a file with dd or similar and the instructions on how to deal with the data are there in the header in ASCII! It couldn't possibly be easier.
Also the "proper environment" is
Re: (Score:2)
Re: (Score:2, Funny)
3.2 TB
What kind of weakling only has 3.2 TB? That is like throwing Zip drives at the problem.
Re:Tape is your friend (Score:4, Insightful)
There's some code lurking in the amanda backup package I did a while back for "RAIT" (RAID with tape instead of disk) to make a stripe-set of tapes, if you need several tapes worth of data in one set, with redundancy.
On the other hand, while LTO4 tapes are about half the price ($40) of cheap 1TB disk drives ($80), the tape drives are about $2k apiece, so depending on how many data sets you want to keep, and for how long, the disk drives may really be cheaper...
Re: (Score:2)
Re: (Score:2)
$2000 is one not-particularly-brilliant workstation. If he's running a business which is heavily computation-oriented (which multi-TB datasets implies that it is) then $2000 is not a large one-time outlay.
Re: (Score:2)
This. Depending on how long you want to store it, tape lasts longer, and once the upfront cost of the drive is paid off, the per-unit cost is cheaper too. Also, dealing with offsite storage places (Iron Mountain) is easier with tape than with HDDs.
Lastly, I've been told you have to spin up the HDDs every so often or the lifetime is even less than what they are rated for. Although I'm not sure I believe that part.
Re: (Score:2)
Lastly, I've been told you have to spin up the HDDs every so often or the lifetime is even less than what they are rated for. Although I'm not sure I believe that part.
It's by no means unbelievable. Lubricant, rubber and plastic have this annoying tendency to degrade over time even if they're just sitting there. Metal actually does too, but perhaps not quite so quickly. And newer plastics aren't nearly as bad as they used to be, but I still don't trust their longevity that much just yet...
An
Agree with the tape option..;. (Score:4, Informative)
Tape is probably your best option. You can buy a DAT-5 (or even a DAT-4) tape drive for not very much. The tapes cost about $10 to $30 each (depending on what tape option you choose). Make 3 copies of the data set, store one onsite, store another offsite in a secure/climate controlled facility and send the 3rd to the client. Buy a spare tape drive and use both to make writing across tapes easier. There is a wide variety of software to write to the tape; we use the aging Retrospect.
The disk option is just way too complex; if anything, skip the RAID option and just store 2 copies. Putting the RAID sets back together and finding the RAID software will be nearly impossible in a couple of years. Use some standard formatting on the drives (FAT, NTFS, etc.) and you'll be good to go for the next 15 years.
Re: (Score:3, Interesting)
Re: (Score:3, Informative)
LTO-4 drives can be had for under $2k if you shop carefully, and a 1.6TB tape can be had for between $50 and $80.
Cheap, simple, and far more reliable long term than hard drives. They also support Write Once Read Many, which is a regulatory requirement for some industries. Getting WORM on a hard disk would be a really neat trick, particularly the part about guaranteeing that it cannot be written to more than once.
A better way to look at it, is disk is best for quick backup and quick recovery from failure (
Amazon S3 (Score:4, Informative)
It can get a little pricey for huge datasets, but Amazon S3 now has an option where you can ship your data [amazon.com] on a big set of disks directly to them, they will import everything into S3, and it will live there forever. The nice thing about S3 is unlike physical disks, it can grow essentially forever, and comes with retention and redundancy guarantees. And once your stuff is in S3, you can recycle the same disks to mail them more data.
Re: (Score:2)
Mod up.
Online storage in a properly managed data center is the way to go for long term safety. Keep a local copy exactly as you are doing and send a second copy to a data center (eg. Amazon).
PS: You don't say if the data is compressed or not. Does it compress?
Re: (Score:2, Informative)
I hope so. A 3TB dataset on Amazon S3 would run $450 / month for storage.
Go with Blu-ray (Score:3, Interesting)
Re:Go with Blu-ray (Score:4, Insightful)
I thought burned optical discs started to degrade after a few years. Have they solved this problem?
Drobo fan and user (Score:3, Interesting)
Re:Drobo fan and user (Score:4, Informative)
You misunderstood the post. He needs 2-3TB PER CLIENT, not 2-3TB total.
Blu-Ray (Score:2)
Re: (Score:3, Insightful)
I'd encrypt the data and... (Score:5, Funny)
Re: (Score:2)
I was having a terrible day at work - our tape drives for 1 TB Backups are failing, funny coincidence. Licensing issue though, not programmatically.
Anyways, this made my day. I'm going to tell it to all my friends, I hope you don't mind.
Use RAID6 not RAID5 (Score:4, Insightful)
I would use RAID6, not RAID5, since 2 drive failures mean data loss with RAID5, while it takes 3 drive failures to lose data on RAID6.
Linux MDADM has supported RAID6 for years, it's stable.
I would mix and match drives, not buying all the same model from one maker. One Samsung, One WD, One Hitachi, One Seagate.
That gets you 4TB in 4 drives, and unlike a RAID1, any 2 drives can fail with no dataloss.
You can further ensure no dataloss by making a second copy using different brand drives for each clone.
Eight 2TB drives is around $1500. Not bad for a very safe 4TB backup.
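A sketch of the mdadm commands involved (device names are illustrative and will differ on your system; this needs root, and --create wipes the member drives):

```shell
# Build a 4-drive RAID6 out of mixed-brand 2TB drives (any 2 can fail).
# WARNING: illustrative device names; this destroys whatever is on them.
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.ext4 /dev/md0                               # plain filesystem on top
mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # so it reassembles on boot
```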
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Insightful)
Just because you had some problems with Samsung means nothing about their general reliability.
A few specific models have had problems, such as the IBM "Deathstar" models, or the recent Seagate firmware problems, but there is no evidence that whole brands are less reliable.
Read the Google report on drive brands, there are no clear winners or losers across brand lines in their exhaustive real world tests.
Extra Care Required!!! (Score:2)
There are going to be quite a few storage service names thrown out as well as compression schemes.
1. With storage vendors you run a real risk of having the data go away. There's a huge liability balancing act going this route.
2. Compression schemes. As someone who has lost data to compression errors, the consequences of 'just' compressing a file can be huge. http://www.linuxquestions.org/questions/linux-software-2/recovering-files-from-corrupt-tar-archive...-326716/ [linuxquestions.org] (not my post, but similar story)
I would sugg
WORM Jukebox (Score:2)
http://www.cddimensions.com/Blu-ray-Libraries/products/192/ [cddimensions.com]
I think one could DIY a par + WORM jukebox with a waaay off-site tape storage and rest easy.
the $100,000 compression question (Score:2)
The good thing about compression is less storage and a better chance of detecting errors. The bad is that, at a minimum, every bad bit becomes at least a bad byte, and if it is in the header, you can lose all data in that archive.
for example:
About corrupted compressed archives: gzip'ed files have no redundancy, for maximum compression. The adaptive nature of the compression scheme means that the compression tables are implicitly spread all over the archive. If you lose a few blocks, the dynamic construction of the compression tables [gnu.org]
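The failure mode described above is easy to demonstrate: one overwritten region in the middle of a gzip stream renders the rest unreadable (run this in a scratch directory):

```shell
# Compress some sample data.
seq 1 10000 > data.txt
gzip -c data.txt > data.txt.gz

# Clobber a few bytes in the middle of the compressed stream.
printf 'CORRUPTED' | dd of=data.txt.gz bs=1 seek=64 conv=notrunc 2>/dev/null

# Integrity test now fails: the damage isn't confined to a few bytes.
gzip -t data.txt.gz || echo "archive is now unreadable"
```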
use a tape drive (Score:4, Informative)
LTO Tapes (Score:2)
The only answer here is LTO tape stored at a contracted record archival facility. Optical media degrades and is easily damaged, hard drives fail ALL THE TIME and will have obsolete interfaces in a few years. Tape has very long shelf life when stored properly -- it is time tested and trusted. It is not that expensive to get one tape drive and a few carts for each customer.
Tape is crap anyway. (Score:2)
I've never had good experience with tape, from DC6150 SCSI linear tape at home all the way through an Exabyte library with stacks and stacks of 8mm tapes. Two decades of tape has been two decades of heartache and frustration for me and the companies I've worked with. These days I'm no longer in tech or IT (thank god) but for my personal needs I use RAID-1 for live and DVD-RAM (as cumbersome, slow, and small as it is) for offline.
Tapes just bleed data at an alarming rate, and they are about as reliable as a
Re: (Score:3, Informative)
Admitted. Never tried LTO, had limited experience (Score:2)
with DLT. My posting is purely my opinion, and is anecdotal. But it is my opinion.
Use a single tape drive, not a tape library (Score:2)
You don't need a tape library. Just get a single tape drive, and you will be able to store everything on 3-6 tapes. Yes, you will have to swap tapes by hand, but it is a lot cheaper.
LTO-4 stores 800 gig per tape, uncompressed. If you let the tape drive do the compression, you might even be able to get away with one or two tapes. Tapes are inexpensive, and are designed for long term storage.
LTO-4? (Score:2)
With easily compressible data (e.g. genomics data), I've gotten as much as 5TB onto a single LTO-4 tape using the regular drive compression.
An LTO-4 tape costs me ~$50. It's smaller than a 3.5" SATA drive and easier to handle. It can probably even survive a drop to the floor from chest height.
You'll need to spend some money on a drive or tape library. So it depends on how many datasets like this you need to write.
rotation (Score:2)
You have to keep rotating onto newer media, and newer media technologies. This sounds horrible, "oh no! I'm generating ten full drives per year". But realize in a couple years, all those drives will fit on a USB 4.0 stick, or on a card in your cellphone.
If you haven't read it (and recopied it) in a couple years, it's probably gone.
Umm (Score:2)
Why?
I've worked in Visual Effects production and every time a new project came along we'd have to clear the servers of terabytes and terabytes of data. We used tapes. How are they impractical exactly? Inexperience?
Google Palimpsest (Score:2)
Depending on the openness of the data you could ask Google if their Palimpsest project is still operational. Basically they wheel large storage systems around for scientific research. But I believe they want to keep a copy by themselves, Alexandria style.
How to (Score:2)
Buy a netapp. Yay, RAID-DP.
That was hard!
* wipes brow *
Active storage is the only way (Score:2)
The problem with storing things is that they tend to degrade over time, and you never know when they'll fail.
Without being ridiculous, four sets in two locations is the best bet. Two sets are on line, and a regular parity check should be made between the two, with full data verification on a longer scale basis. One backup set gets made of each online set (an external drive which is sync'd once a week/month is likely good enough) and stored unpowered. This prevents local disaster from destroying your data,
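The "regular parity check" between the two online sets could be as simple as a recursive compare (paths are illustrative):

```shell
# Two stand-in copies of the same dataset.
mkdir -p setA setB
echo "run-007 results" | tee setA/r7.csv > setB/r7.csv

# Recursively compare the online set against its mirror.
diff -r setA setB && echo "copies match"

# For copies on another host, rsync -aic --dry-run gives a similar report.
```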
or build a couple of these... (Score:2)
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ [backblaze.com]
I'm sure the price has come down some since this article was published...
For those too lazy or paranoid to read the link... It describes how Backblaze builds "cheap" 67 TB storage boxes for use in their online backup service. All the hardware specs are open sourced and freely available. They also talk a little bit about the software for managing all of the space they have, but not in any real detail...
Beware RAID (Score:2)
As far as I know the 2TB RAID problem hasn't been fixed. http://blogs.zdnet.com/storage/?p=162 [zdnet.com] If anyone knows differently, please let me know.
I've been using a drive docking station and splitting my backups for large databases.
have you considered cloud storage? (Score:2)
I have a bunch of old data backups on CDR that were great for years but they've started to degrade. I'd be willing to bet that any magnetic disk would be even more vulnerable to data corruption over time. I don't think your RAID 5 storage technique is a good long-term option.
This could be a ridiculous suggestion, but have you considered something like cloud storage for this? You could encrypt the data and store it in somebody's cloud and let them worry about backing everything up.
Never us DVDs as long term storage. (Score:4, Interesting)
Comment removed (Score:5, Informative)
Another Option / Definition issues (Score:3, Interesting)
A problem I have here is the definition of 'long term'. To each of us it means something different.
In my job I have to archive 1.6 terabytes of data per day, and keep it around for 45 days (which, BTW, is not my definition of LONG TERM). For this task I utilize Data Domain storage, which utilizes data deduplication techniques for massive compression.
What you find is that at the block level your data may in fact be incredibly deduplicatable. In my case it very much is the situation. I am currently storing 86 terabytes of rolling archives within 2.5 terabytes of physical disk space.
The problem with any technology you use for 'long term' storage is the ability to read those archives later. Assuming the media doesn't degrade within the time frame you call 'long term', you must still have the tools to read that media again. If you use Blu-ray, then you must store a compatible drive with it. (Nothing says Sony won't change the standard in two years and make all current drives obsolete, so no one makes them any more.) Tape is worse, in that within two major model revisions, drives won't be able to read your media because its density is too low for the new drive head technology. Hardware-based disk RAID has the issue that the controller the RAID was built with needs to stay with that RAID. Another controller from the same manufacturer, with the same model number, but a different firmware revision may not be able to figure out the RAID, and will declare the drives empty. Software RAID is a little easier to deal with, as long as you keep a copy of the OS you used to create it in the same box. But then, during your defined 'long term' period, will you still have access to a system you can even plug these drives into, or run the OS on?
What you end up dealing with in reality is that as an archivist, you either ignore these facts, or you invest in a constant media / technology refresh and spend large amounts of time keeping your archives on the latest storage available.
Of course, all this falls apart if your definition of 'long term' isn't as long as some will project. In my case, my archives roll over every 45 days. I could easily keep that data alive for years on a live piece of hardware with a service contract. If I do not trust that hardware enough, I can buy two and replicate between them. (which, actually I am, for disaster recovery purposes)
With deduplication, my (admittedly) high initial investment quickly pays for itself compared to single-purpose drives holding one copy and wasting unused space. My purchase cost was less than $60k, but if I had to store all of that data in its raw form, my costs would be in the millions. However, if the data is not deduplicatable, then of course it is a moot point.
Each answer has its flaws. You decide which risks are acceptable, plan your best to deal with obsolescence, and define your definition of 'long term'. You also have to be ready to change your solution when the one you choose today fails to be the right solution for your needs in 5 years.
Re: (Score:2, Interesting)
Blu-ray not the answer (Score:3, Insightful)
Re: (Score:2)
That's why the original poster suggested that they're already using Linux software RAID. Much more robust than depending on a single firmware version.
Re: (Score:3, Funny)
Probably the giant rare earth magnets I stored in the next box over.
You're welcome. :-D
Re: (Score:3, Funny)