Distributed Data Storage on a LAN?
AgentSmith2 asks: "I have 8 computers at my house on a LAN. I make backups of important files, but not very often. If I could create a virtual RAID by storing data on multiple disks on my network, I could protect myself from the most common form of data failure - a disk crash. I am looking for a solution that will let me mount the distributed storage as a shared drive on my Windows and Linux computers. Then when data is written, it is redundantly stored on all the machines that I have designated as my virtual RAID. And if I lose one of the disks that comprise the RAID, the image would automatically reconstruct itself when I add a replacement system to the virtual RAID. Basically, I'm looking to emulate the features of high-end RAIDs, but with multiple PCs instead of multiple disks within a single RAID subsystem. Are there any existing technologies that will let me do this?"
Expensive but reliable solution (Score:3, Interesting)
According to Pricewatch, the four 160 GB drives could be had for around $400 total, with about another $400 for the backup. Add a 3ware RAID controller for another $245 and you're looking at about $1,045 to convert a system into supporting 450 GB of usable network storage and backup.
From all indications, IDE hard drives are now the cheapest form of backup there is. I've looked at CD, DVD, and tape, but it keeps coming back to IDE hard drives. This is far cheaper than similar storage and backup would be on tape.
I can't believe... (Score:2, Interesting)
If all you're worried about is disk failures, mirror each disk locally. Disks are cheap, and real operating systems don't have any trouble with software mirroring.
Why would you want to make all of your machines suddenly non-functional, just because one of them lost a network card? Or because the switch failed?
Re:AFS (Score:5, Interesting)
It requires its own partition for each mount; you can't just share disks you've already got.
Setup also takes hours, and it probably won't work the first time. Online documentation is incredibly outdated, which doesn't help matters at all. It also takes a hefty chunk of computer to run it, because it requires a lot of watchdog type programs to fix the frequent corruption that happens to it as you use it.
The servers' clocks have to be matched exactly, so it's also best if you've got an NTP server running and clients on all the machines.
It's also about ten times slower than Samba (which you might use instead to share with Windows machines), and it chokes when you try to move/copy/delete large files.
I tried it for a month before it completely corrupted its own partition, and I switched back to NFS and Samba.
I can't wait for the day when these problems are but a memory and such a system works flawlessly.
Re:Most common form of data loss? (Score:3, Interesting)
AFAIK, there's at least one project out there to turn CVS into a filesystem, and a few others to add MVCC functionality to a filesystem (somewhat like the ClearCase filesystem does).
It's a good feature, something I'd want on my docs and code, and other specs, not necessarily on my pr0n and MP3s.
-Chris
New kind of network file system needed (Score:2, Interesting)
A new networked file system is needed. I am working on such a solution on my spare time (but it is still in the design phase).
The main idea is to unify cache and storage. This means that the least-used files are deleted when an account is running out of storage, but under the constraint that a minimum number of copies of each file is kept online. (Hence, data will propagate to the nodes that actually use it.) Upon a data request, the filesystem goes out and fetches the data, preferably in some P2P-like way where it is fetched simultaneously from all locations that have copies of that data.
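Since the design is still on paper, here is one hypothetical way the core eviction rule could look: plain LRU eviction, except that a file is skipped if deleting this copy would drop it below the global minimum replica count. All names here (`ReplicaAwareCache`, `replica_count`, etc.) are invented for illustration; a real node would ask its peers for replica counts over the network.

```python
from collections import OrderedDict

class ReplicaAwareCache:
    """Sketch of the 'unified cache and storage' idea: evict the
    least-recently-used files when a node runs out of space, but never
    delete a copy if that would violate the minimum replica count."""

    def __init__(self, capacity, min_replicas=2):
        self.capacity = capacity          # bytes this node may use
        self.min_replicas = min_replicas  # copies that must stay online
        self.files = OrderedDict()        # name -> size, in LRU order
        self.used = 0

    def replica_count(self, name):
        # Placeholder: a real system would query peer nodes here.
        return self.min_replicas + 1

    def access(self, name, size):
        """Touch a file; fetch (store) it locally if not cached."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as recently used
            return
        while self.used + size > self.capacity:
            if not self._evict_one():
                raise IOError("nothing evictable; node is full")
        self.files[name] = size
        self.used += size

    def _evict_one(self):
        # Walk from least- to most-recently-used, skipping files whose
        # deletion would drop them below the replica minimum.
        for name in list(self.files):
            if self.replica_count(name) > self.min_replicas:
                self.used -= self.files.pop(name)
                return True
        return False
```

With this rule, popular data naturally accumulates on the nodes that read it, while the replica constraint keeps cold data from vanishing entirely.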
If someone knows a solution that already works something like this, please tell me.
File versioning useful, VMS variant not so sure (Score:3, Interesting)
Try #1:
DELETE FOO.TXT
This is really the wrong answer. If you have FOO.TXT;1 and FOO.TXT;2, then this command deletes FOO.TXT;2 and any attempt to access FOO.TXT will get you FOO.TXT;1.
Try #2:
DELETE FOO.TXT;*
This is the common recommendation, but you've now lost the ability to see any of the old versions.
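The two deletion behaviors described above can be modeled in a few lines. This is a toy sketch of VMS-style version semantics (not real VMS code): `delete` mimics `DELETE FOO.TXT`, which removes only the highest version and exposes the previous one, while `delete_all` mimics `DELETE FOO.TXT;*`.

```python
class VersionedDir:
    """Toy model of VMS-style file versions (FOO.TXT;1, FOO.TXT;2, ...),
    just to illustrate the deletion semantics discussed above."""

    def __init__(self):
        self.versions = {}  # name -> list of version numbers, ascending

    def write(self, name):
        # Each write creates a new highest-numbered version.
        vs = self.versions.setdefault(name, [])
        vs.append(vs[-1] + 1 if vs else 1)

    def delete(self, name):
        # DELETE FOO.TXT: removes only the highest version, so a plain
        # open of FOO.TXT now gets the previous version -- usually not
        # what the user intended.
        self.versions[name].pop()

    def delete_all(self, name):
        # DELETE FOO.TXT;*: removes every version, losing all history.
        self.versions[name] = []

    def latest(self, name):
        vs = self.versions.get(name, [])
        return vs[-1] if vs else None
```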
The GNU file utilities (and emacs and some other GNU programs) have a file versioning scheme which is somewhat similar to VMS but somewhat better. Look at commands like "VERSION_CONTROL=numbered cp foo bar".
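For the curious, the numbered-backup naming that GNU cp uses with `VERSION_CONTROL=numbered` (backups named `file.~1~`, `file.~2~`, ...) is easy to approximate. This is a sketch of the naming scheme only, not the actual coreutils logic:

```python
import os
import shutil

def numbered_backup(path):
    """Save a GNU-style numbered backup (path.~N~) of an existing file,
    roughly mimicking what VERSION_CONTROL=numbered cp does before it
    overwrites a destination. A sketch, not the real cp algorithm."""
    n = 1
    while os.path.exists(f"{path}.~{n}~"):
        n += 1  # find the first unused backup number
    backup = f"{path}.~{n}~"
    shutil.copy2(path, backup)  # copy contents and metadata
    return backup
```

Calling it before each overwrite leaves a trail of `foo.txt.~1~`, `foo.txt.~2~`, and so on, much like the VMS `;N` versions but opt-in per tool rather than enforced by the filesystem.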
Personally, I usually put things which matter in CVS, with the CVS server in a distant city (at an ISP which provides ssh shell accounts). That gives me off-site backups.
Re:You aren't gonna get a real RAID. (Score:2, Interesting)
Also, for those people concerned about leaving another "backup server" running 24x7, you can make use of the "wake on LAN" capability (available on many network cards and motherboards) to do backups. Just wake up (boot) the "backup server", do your backup, and then shut it down. It's way cool to remote-boot home servers.
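The wake-up step is just a UDP broadcast of a "magic packet": six 0xFF bytes followed by the target's MAC address repeated 16 times. Here is a minimal sketch; the MAC in the test is made up, and port 9 (discard) is the conventional but arbitrary choice:

```python
import socket

def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the 6-byte
    MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake_on_lan(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet so the sleeping NIC can hear it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

A cron job could call `wake_on_lan(...)`, wait for the server to boot, run the backup over the network, and then ask the server to shut itself down.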
Here, the only real issue is the power/thermal cycling of the hard disk once a day (or whatever), which might be a problem since many disks now tend to come with only a one-year warranty. However, this isn't all that different from a regularly-used PC.
Re:Coda (Score:3, Interesting)
Seriously, I looked into Coda a couple months ago and the design looks really cool, but it just doesn't seem to work very well unless you're only storing tiny text files. It also doesn't scale very well on large servers (i.e. it has a maximum limit on the number of files on each volume). Don't get me wrong, I REALLY wanted to use Coda because I liked the idea of it -- I just wish that it worked better. Ended up going back to NFS (yuck!).
Re:Most common form of data loss? (Score:3, Interesting)
That feature doesn't need to be in the kernel, since it can easily and transparently be provided in user space.
If you like, you can enable this right now using a simple hack on top of PlasticFS [sourceforge.net] or your own, custom LD_PRELOAD hack.
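A real LD_PRELOAD hack would be written in C against open(2), but the interception idea is easy to show in a few lines of Python. This is purely illustrative (the `versioned_open` name and the VMS-flavored `;N` suffix are invented here): before a file is opened for writing, the current contents are snapshotted aside.

```python
import os
import shutil

def versioned_open(path, mode="w"):
    """Sketch of user-space file versioning: before truncating or
    appending to an existing file, copy it aside as path;N. An
    LD_PRELOAD hack does the same thing by interposing on open(2)."""
    if ("w" in mode or "a" in mode) and os.path.exists(path):
        n = 1
        while os.path.exists(f"{path};{n}"):
            n += 1  # next free version number
        shutil.copy2(path, f"{path};{n}")
    return open(path, mode)
```

Because it lives entirely above the kernel, a scheme like this can be enabled per-user or per-application -- which is exactly the point: the MP3-retagging problem below never arises unless you opt in for that directory.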
Providing file versioning in the kernel or enabling it globally in some other form has not caught on because it is a huge hassle and causes lots of problems, even in systems that know about it.
For example, when you retag one MP3, do you want to keep an old version? What about if you retag your entire 50G collection of MP3s?
The default of not versioning files in UNIX works better. Versioning is highly application-dependent: Emacs, OpenOffice, cvs, and other tools do the right thing, and they do it much better than anything the kernel could ever hope to do.
Re:Win2k (Score:2, Interesting)
The concept of giving all users read/write access was thought up later on, and it happens to work; but as you say, if two users update the same file, you may (or will) lose data.