
Distributed Data Storage on a LAN?

Posted by Cliff
from the redundancy-gooood dept.
AgentSmith2 asks: "I have 8 computers at my house on a LAN. I make backups of important files, but not very often. If I could create a virtual RAID by storing data on multiple disks on my network, I could protect myself from the most common form of data failure - a disk crash. I am looking for a solution that will let me mount the distributed storage as a shared drive on my Windows and Linux computers. Then when data is written, it is redundantly stored on all the machines that I have designated as my virtual RAID. And if I lose one of the disks that comprise the RAID, the image would automatically reconstruct itself when I add a replacement system to the virtual RAID. Basically, I'm looking to emulate the features of high-end RAIDs, but with multiple PCs instead of multiple disks within a single RAID subsystem. Are there any existing technologies that will let me do this?"

  • NBD Does this (Score:5, Insightful)

    by backtick (2376) * on Wednesday October 29, 2003 @04:12PM (#7341293) Homepage Journal
    http://nbd.sourceforge.net/

    "Network Block Device (TCP version)

    What is it: With this thing compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read /dev/nd0, it will send a request to the server via TCP, which will reply with the data requested. This can be used for stations with low disk space (or even diskless - if you boot from floppy) to borrow disk space from other computers. Unlike NFS, it is possible to put any file system on it. But (also unlike NFS), if someone has mounted NBD read/write, you must assure that no one else will have it mounted.

    Limitations: It is impossible to use NBD as a root file system, as a user-land program is required to start it (but you could get away with initrd; I never tried that). (Patches to change this are welcome.) It also allows you to run a read-only block device in user-land (making server and client physically the same computer, communicating using loopback). Please notice that read-write nbd with client and server on the same machine is a bad idea: expect deadlock within seconds (this may vary between kernel versions, maybe on one sunny day it will even be safe?). More generally, it is a bad idea to create a loop in the 'rw mounts graph'. I.e., if machineA is using a device from machineB read-write, it is a bad idea to use a device on machineA from machineB.

    Read-write nbd with client and server on the same machine has a rather fundamental problem: when the system is short of memory, it tries to write back dirty pages. So the nbd client asks the nbd server to write back data, but as nbd-server is a userland process, it may require memory to fulfill the request. That way lies the deadlock.

    Current state: It currently works. Network block device seems to be pretty stable. I originally thought that it is impossible to swap over TCP. It turned out not to be true - swapping over TCP now works and seems to be deadlock-free.

    If you want swapping to work, first get nbd working. (You'll have to mkswap on the server; mkswap tries to fsync, which will fail.) Now you have a version which mostly works. Ask me for kreclaimd if you see deadlocks.

    Network block device was included in the standard (Linus') kernel tree in 2.1.101.

    I've successfully run raid5 and md over nbd. (A pretty recent version is required to do so, however.) "
    • by flok (24996) * <mail@vanheusden.com> on Wednesday October 29, 2003 @04:19PM (#7341379) Homepage Journal
      And since the guy is also using Windows boxes, an NBD server for Windows can be found here:
      http://www.vanheusden.com/Loose/nbdsrvr/ [vanheusden.com]
      This version enables you to also export partitions/disks.
    • I maintain a lab with 16 Linux computers (running Red Hat 8) and 1 server. Right now, I have about 150 GB or so on the server that I NFS out to all the workstations. However, each workstation has 20-80 GB that it doesn't need and isn't using... The users all have their home directories mounted via NFS, and must have read/write access to them (obviously). Each user also must be able to SSH in, and access the console (wouldn't be much of a lab if the users couldn't sit down at a computer). I also would like to
      • Performance could potentially be very terrible, especially with RAID5

        That being said, do some benchmarks. RAID1+0 might be more sane. (That is, a RAID1 array overtop a RAID0 array.)
    • Re:NBD Does this (Score:5, Informative)

      by dbarclay10 (70443) on Wednesday October 29, 2003 @04:37PM (#7341575)

      Just to clarify what this guy is saying:

      1) Make all your machines NBD servers. NBD for Linux [sourceforge.net], NBD for Windows [vanheusden.com]. NBD stands for "network block device" and allows a client to use a server's block device.
      2) Set up a master client/server (using Linux or something else with a decent software RAID stack). This machine will be the only NBD *client*, and it will use all the NBD block devices exported by the rest of your network.
      3) On the master set up in 2), create a Linux MD RAID array overtop all the NBD devices that are available.
      4) Create a filesystem on the brand-spanking-new multi-machine RAID array.
      5) Export it back to the other machines via Samba or NFS or AFS or what have you.

      Why does only one machine (the "master server") access the NBD devices, you ask? Because for a given block device, there can only be one client accessing it safely. Thus, if you want to make the RAID array available to anything other than the machine which is *running* the array off the NBD devices, you need to use something which allows concurrent access; something like NFS, Samba, or AFS.

      Hope that clears it up a bit.
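
Steps 1 to 4 might look roughly like the sketch below on the master machine. It assumes three NBD servers and the classic `nbd-client host port device` syntax (newer nbd releases use named exports instead, so check your version). The host names, port 2000, and device names are made up for illustration. The script only prints the commands, so nothing is touched until you have reviewed the output and run it yourself as root.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would assemble a RAID5 array
# on top of NBD devices. Host names, port, and device paths are hypothetical.
PORT=2000

plan_nbd_raid() {
    i=0
    devices=""
    for host in "$@"; do
        # Attach each remote export as a local block device (classic syntax).
        echo "nbd-client $host $PORT /dev/nbd$i"
        devices="$devices /dev/nbd$i"
        i=$((i + 1))
    done
    # Build one md array over all attached devices, then put a filesystem on it.
    echo "mdadm --create /dev/md0 --level=5 --raid-devices=$i$devices"
    echo "mkfs.ext3 /dev/md0"
}

plan_nbd_raid box1 box2 box3
```

Step 5 (exporting the mounted filesystem via Samba or NFS) is ordinary file-server configuration and is left out here.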

    • Obvious link [drbd.org].
    • Re:NBD Does this (Score:3, Informative)

      by caluml (551744)
      Hmm. How stable is it? From /usr/src/linux/Documentation/nbd.txt:

      Note: Network Block Device is now experimental, which approximately
      means, that it works on my computer, and it worked on one of school
      computers.

      That doesn't sound very promising to me. Usually stuff that's been in the kernel since 2.1 days is rock solid.

      Isn't AFS/Coda more like the guy wants (excluding Windows-ability, although I seem to remember there being something for Andrews for Windows)?
      • Re:NBD Does this (Score:3, Insightful)

        by arivanov (12034)
        There are inherent pitfalls in it. They are mostly similar to the problem of swapping over NFS. Overall, it boils down to buffer management.

        Basically, in order to execute the network device request you often have to get more memory. In order to get more memory you have to execute a network request. So on so forth.

        Also, AFAIK RAID does not work properly over NBD.

    • Re:NBD Does this (Score:3, Interesting)

      by WindBourne (631190)
      I currently do this at home with 3 computers (all Linux) for my home directory. But I have been thinking that there needs to be a way to separate parts of etc for the local system vs. the network. I have been thinking of how to write a block device that allows layers to be combined.
    • by cbreaker (561297) on Wednesday October 29, 2003 @05:52PM (#7342217) Journal
      What if you reboot one of the NBD servers? While you'll still have access to the data since it's a raid, I would well imagine that you would have to rebuild the entire "disk" once it comes back online.

      Assuming a Raid5 with three nodes, and two go down not at the same moment, will all your data be lost?

      I would think very carefully about these issues before putting all your valuable data on it. RAID isn't really designed for frequently unreliable connections like this. It's meant to prevent data loss if a hard drive crashes, which should be a fairly uncommon thing within a single system.
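
To make the failure question concrete: RAID5 keeps one parity block per stripe, computed as the XOR of the data blocks. A toy sketch with two single-byte "data blocks" and one parity block (the values are arbitrary, and a real array works on whole blocks, not single bytes) shows that any one missing block can be rebuilt from the other two.

```shell
#!/bin/sh
# Toy RAID5 stripe across three nodes: two data bytes plus XOR parity.
a=165   # data block on node 1 (arbitrary value)
b=60    # data block on node 2 (arbitrary value)
parity=$((a ^ b))   # parity block on node 3

# Node 1 is lost: rebuild its block from the surviving data and parity.
rebuilt_a=$((parity ^ b))
# Node 2 is lost instead: same trick with the other survivor.
rebuilt_b=$((parity ^ a))

echo "parity=$parity rebuilt_a=$rebuilt_a rebuilt_b=$rebuilt_b"
```

With two of the three nodes gone only one value is left, and a single XOR operand cannot recover the other two. So in a three-node RAID5 a second failure loses the array, unless the first node came back and finished rebuilding before the second one dropped out.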
  • it's called rsync
    • sorry. conan cut me off last night, so i am upset

      we use afs (pre-openafs, tho i'm sure openafs will work just fine) on top of nbd (link escapes me right now). works pretty well.
  • Win2k (Score:5, Informative)

    by SuiteSisterMary (123932) <slebrun@gmail.cUUUom minus threevowels> on Wednesday October 29, 2003 @04:16PM (#7341344) Journal
    I believe that Windows 2000's Distributed File System allows you to do just this.
    • I believe that DFS allows you to do just this.

      :s/DFS/DASD

  • rdist would work... (Score:5, Informative)

    by ZenShadow (101870) * on Wednesday October 29, 2003 @04:17PM (#7341353) Homepage
    The obvious answer for this is nbd, as pointed out in another post -- but I would have concerns about speed with that kind of setup. I'd be interested in hearing reports on that.

    But if you don't want to get into nbd, you can tolerate delayed writes to your virtualized disks, and all you want is the network equivalent of RAID level 1, then you could always just set up an rdist script that synchronizes your local data disk with a remote repository (or eight) every so often...

    --ZS
    • Speed (Score:5, Interesting)

      by backtick (2376) * on Wednesday October 29, 2003 @04:34PM (#7341546) Homepage Journal
      Using a pair of Intel EEPro 100's w/ trunking (using both links at the same time on one IP, works w/ a cisco switch), I've gotten over 100 Mb/sec of actual throughput (I think I hit 137 Mbit/sec, peak) out of a box using NBD to create a mirrored RAID volume over the trunked ports. Now, my actual 'real' data speeds to the file system were about half that (call it 50-65 Mbit/sec, or roughly 6 to 8 MByte/sec), due to mirroring == writing it twice. Still not bad. Yes, the target disks were themselves part of other RAID volumes, for speed :)
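
As a rough sanity check on those numbers, assuming a two-way mirror where every block crosses the wire twice, the usable rate is the raw rate divided by the number of copies, and by 8 to get bytes. A small sketch (the function name is made up):

```shell
#!/bin/sh
# Convert a link rate in Mbit/s to usable MByte/s when every block is
# written to several mirror copies over the same link.
effective_mbyte() {
    # $1 = raw throughput in Mbit/s, $2 = number of copies written
    awk -v mbit="$1" -v copies="$2" 'BEGIN { printf "%.2f\n", mbit / copies / 8 }'
}

effective_mbyte 137 2   # peak figure from the post, two-way mirror
effective_mbyte 100 2   # a plain 100 Mbit/s of raw throughput
```

This ignores protocol overhead and disk speed, so treat the results as an upper bound rather than a prediction.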
  • by buzzbomb (46085) on Wednesday October 29, 2003 @04:18PM (#7341368)
    Perhaps multiple files over different networking protocols (SMB for Windows machines, NFS for the Linux machines) mapped to built-in loopback devices (/dev/loX) accessed through built-in md utilizing software RAID5? Heh. It might not be pretty or fast, but it would probably work just fine. It may just give the kernel absolute fits though.

    Anyone tried this?
  • AFS (Score:4, Informative)

    by Reeses (5069) on Wednesday October 29, 2003 @04:18PM (#7341374)
    It's called the Andrew File System.

    http://www.psc.edu/general/filesys/afs/afs.html

    There's another alternative with a different name, but I forget what it's called.
    • Re:AFS (Score:5, Informative)

      by Strange Ranger (454494) on Wednesday October 29, 2003 @05:24PM (#7341980)
      from karmak.org
      AFS is based on a distributed file system originally developed under a different name in the mid-1980's at the Information Technology Center of Carnegie-Mellon University (CMU). It was first publicly described in a paper in 1985, and soon afterwards was renamed to the "Andrew File System" in honor of the patrons of CMU, Andrew Carnegie and Andrew Mellon. As interest in AFS grew, CMU spawned the Transarc Company to develop and market AFS. Once Transarc was formed and AFS became a product, the "Andrew" was dropped to indicate that AFS had gone beyond the Andrew research project and had become a supported, product quality filesystem. However, there were a number of existing cells that rooted their filesystem as /afs. At the time, changing the root of the filesystem was a non-trivial undertaking. So, to save the early AFS sites from having to rename their filesystem, AFS remained as the name and filesystem root. In the late 1990's Transarc was acquired by IBM, who subsequently re-released AFS under an open source license. This code became the foundation for OpenAFS, which is currently under active development.
      It's still running and running well at CMU (AFAIK - as of late 90's). Every student gets an "Andrew" ID. Actually the very first networked computer I ever logged into (other than dialing a bbs) was a 'node' on Andrew, in 1988. Very very cool at the time, and still is.
  • Why? (Score:2, Funny)

    by Anonymous Coward
    I have 8 computers at my house on a LAN. I make backups of important files, but not very often

    I mean, let's be honest here. We are all dorks, but this guy is king dorkus dweedius maximus. Don't fool yourself about the "important data" - it is just pr0n and pirated MP3s.

    If it was real work, there would be a real IT guy with real RAID and real backup tapes working on the problem. But we know it isn't real work, because if this guy had a real IT job, he couldn't stand coming home and dealing with 8 frigg
  • by Anonymous Coward on Wednesday October 29, 2003 @04:19PM (#7341385)
    I'd argue the point that the most common form of data loss is a crashed hard disk.

    In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

    But I restore data accidentally deleted or changed by a user at least weekly! A distributed storage system won't help you there.

    However, I will grant that the average /. user knows what they're doing with their data far more than my average user does and is less likely to cause self-inflicted damage.
    • That's why I don't know why _by default_ it isn't set up to have the whole of /home under cvs
      • That's why I don't know why _by default_ it isn't set up to have the whole of /home under cvs

        CVS isn't designed for that, unless you only store documents or have some pretty stringent filters set up on CVS. CVS is for versioning, and you don't really want to maintain a backlog of every version of every file in your home directory.
        • But say I do? I mean, versioning databases are the next bit, man. Why not have a chmod +v for versioning? If this bit is set, then apply version control. Every file open/write/close sequence adds a new version delta. Sure, there's a performance hit associated with it, but I'd like the choice.

          AFAIK, there's at least one project out there to turn CVS into a filesystem, and a few others to add MVCC functionality into a filesystem (somewhat like the Clearcase filesystem does).

          It's a good feature, somethi
    • by Blackknight (25168) on Wednesday October 29, 2003 @04:39PM (#7341598) Homepage
      That's one feature from VMS that I wish unix had. File versioning was built in to the file system, so if you wanted the old version of a file back you just had to roll back to the old one.
      • The concept of being able to see the previous version sounds good. But on VMS, file versions didn't really achieve this all that well. Classic example: how do you delete a file?

        Try #1:

        DELETE FOO.TXT

        This is really the wrong answer. If you have FOO.TXT;1 and FOO.TXT;2, then this command deletes FOO.TXT;2 and any attempt to access FOO.TXT will get you FOO.TXT;1.

        Try #2:

        DELETE FOO.TXT;*

        This is the common recommendation, but you've now lost the ability to see any of the old versions.

        The GNU file utiliti
      • That's one feature from VMS that I wish unix had.

        That feature doesn't need to be in the kernel, since it can easily and transparently be provided in user space.

        If you like, you can enable this right now using a simple hack on top of PlasticFS [sourceforge.net] or your own, custom LD_PRELOAD hack.

        Providing file versioning in the kernel or enabling it globally in some other form has not caught on because it is a huge hassle and causes lots of problems, even in systems that know about it.

        For example, when you retag one MP
    • In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

      Business environments are generally more robust - especially when it comes to things like power. Not only the mains power, but power supplies. A lousy power supply can kill a hard disk as easily as a line surge. In the last ten years I've personally lost a 4.3 GB Atlas Wide SCSI and a couple of Maxtor 60GB IDE drives. In both cases my backups were a month out of date. :-(

      Also ha
    • I've lost probably 4 hard drives over the last 3 years (1 2 weeks ago, 1 I realized is going bad today). While I could raid them, I really don't want to buy double the disk space just so that I can have a raid array when I only need redundancy for 10% of the data.
      • by steveha (103154) on Wednesday October 29, 2003 @05:18PM (#7341928) Homepage
        0) Mirroring (RAID 1) takes double the disk space; but you could use RAID 5 instead. A 4 disk RAID 5 would take 4/3 as much disk space as you get to use.

        1) You could make a partition that is 10% of your disk, make another identical one on another disk, and mirror those. Then put your 10% critical data in there.

        2) Do what I do: set up a RAID server, and keep all critical data on that. This is good if you have a home network with multiple computers. It also makes data sharing easy among the computers.

        steveha
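
The space overhead in point 0 is easy to check by hand. A minimal sketch, assuming identical disks (the function name and the 120 GB size are just for illustration):

```shell
#!/bin/sh
# Usable capacity for a set of identical disks (size in any unit).
raid_usable() {
    level=$1; disks=$2; size=$3
    case $level in
        0) echo $(( disks * size )) ;;          # striping, no redundancy
        1) echo "$size" ;;                      # every disk holds a full copy
        5) echo $(( (disks - 1) * size )) ;;    # one disk's worth of parity
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable 1 2 120   # two-disk mirror of 120 GB drives
raid_usable 5 4 120   # four-disk RAID5 of 120 GB drives
```

Four 120 GB disks are 480 GB raw for 360 GB usable in RAID5, which is the 4/3 ratio mentioned above, versus 2/1 for a mirror.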
    • I don't know man. I have no faith in hard drives any more. I used to buy Quantum drives and I never had a single crash with any of them. I still have 2 Quantum drives from years past and they are perfect. Unfortunately Quantum was bought by or merged with Maxtor. Huge mistake. In the last 2 years though I've had 3 Maxtor drives crash on me, and 2 IBM Deathstars [google.com] die on me. The last time I sent my Deathstar in for RMA, after having read that the entire line of drives was prone to failure, I just sold
    • > I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

      Yes, but all us users of more than one home PC (i.e., enthusiasts) use RAID 0, which has the opposite effect. So for us, a supplemental distributed RAID is a GREAT idea for our documents, e-mail backups, and other stuff we want to keep permanently and access from any of our home stations.
    • by angst_ridden_hipster (23104) on Wednesday October 29, 2003 @06:26PM (#7342456) Homepage Journal
      As I always chime in at this point:

      Use rdiff-backup!

      http://rdiff-backup.stanford.edu/

      Configurable, secure, distributed, versioning incremental backups.

      It's not a replacement for RAID, but is good for nightly inter-machine backups.

      There's also a related project where the far-end repository is encrypted, so you can have it on any public server without fear of having your data read by the wrong people.

      Very cool. It's saved my ass a few times.
  • Intermezzo (Score:5, Informative)

    by mikeee (137160) on Wednesday October 29, 2003 @04:19PM (#7341389)
    Intermezzo [inter-mezzo.org] is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.

    It isn't particularly high-performance, from what I know, and may be more complexity than you need.
    • Groove Workspace is a collaborative environment, but it does have a component that allows you to share an archive of files.

      Worth considering because:
      - Files are encrypted and sent in an encrypted format.
      - Files placed in the shared space are mirrored on all systems that are members of the workspace.
      - The software is free for non-commercial use.
      - Lots of other interesting features to play with.
      - You can even mirror with a machine across the Internet.

      Limited by:
      - The speed of your connection.
      - W
    • Re:Intermezzo (Score:5, Informative)

      by laursen (36210) <laursen AT netgroup DOT dk> on Wednesday October 29, 2003 @05:12PM (#7341882) Homepage
      Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.

      We have looked at various distributed filesystems for use in a clustered setup of webservers. We wanted to remove the single point of failure from a central NFS server - Intermezzo was one of the filesystems we had a look at.

      The idea behind Intermezzo is fairly simple and the documentation is good. The Intermezzo system looked like an ideal solution for our setup (Coda and OpenAFS are far too complex for use in a distributed filesystem on a closed internal net).

      We tested the system but sadly it's not really production stable and I can't advise that you use it.

      If you are looking for a SAFE solution then Intermezzo is not for you - you will just end up with garbled data, deadlocks and tons of wasted time ...

      My 2 cents.

  • Bandwidth (Score:4, Insightful)

    by omega9 (138280) on Wednesday October 29, 2003 @04:20PM (#7341396) Homepage
    I hope you're looking at some fast lines to put between those boxen. Even at 100Mb/sec, doing RAID across a LAN could get slow.
  • RAID on Files (Score:3, Insightful)

    by Great_Geek (237841) on Wednesday October 29, 2003 @04:20PM (#7341402)
    I have often wanted the same thing, kind of like RAID on files, call it RARF (Redundant Array of Remote Files). I was thinking along the line of a device driver that presents an ATA/IDE interface to the file system on one side and passes the requests to multiple copies of virtual disks. The virtual disks would be like VMWare disks, and potentially each on a different machine/location. Each virtual disk could even be encrypted differently.

    This would be really useful for SOHO type places to allow me to have a hot offsite backup at multiple friends (and vice versa).
    • What you describe is a combination of the loopback and md drivers under Linux -- RAID1 (or 5 or...) on loopback devices pointing at files living on NFS disks. Or something.

      --ZS
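
A rough sketch of that loopback-plus-md combination, assuming two network shares are already mounted (the mount points, image size, and device names are hypothetical). Like any experiment of this kind it only prints the commands, so you can read them before running anything as root.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would mirror (RAID1) two image
# files living on network mounts. Mount points and sizes are hypothetical.
IMG_MB=1024

plan_loop_raid() {
    i=0
    devices=""
    for mnt in "$@"; do
        # Create a backing file on the remote share, then bind it to a loop device.
        echo "dd if=/dev/zero of=$mnt/raid.img bs=1M count=$IMG_MB"
        echo "losetup /dev/loop$i $mnt/raid.img"
        devices="$devices /dev/loop$i"
        i=$((i + 1))
    done
    # Mirror across all loop devices.
    echo "mdadm --create /dev/md0 --level=1 --raid-devices=$i$devices"
}

plan_loop_raid /mnt/box1 /mnt/box2
```

Whether the kernel is happy doing block I/O through files on NFS or SMB mounts is a separate question; benchmark and test failure handling before trusting data to it.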
  • by Alain Williams (2972) <addw@phcomp.co.uk> on Wednesday October 29, 2003 @04:20PM (#7341404) Homepage
    Hmmmm, what happens if your house catches fire ?

    8 copies of the same document all nicely toasted!
  • by Anonymous Coward
    As opposed to a tight one?
  • by Trolling4Dollars (627073) on Wednesday October 29, 2003 @04:21PM (#7341412) Journal
    I imagine you'll need gigabit ethernet or multiple NICs in bonded mode. Then you have the performance of each individual system to take into account. Especially if one of the systems is heavily used. I would recommend getting one BIG HONKIN' SERVER and putting it in a central location. Give it gigabit and let everything else connect to it at 100. Then, make sure it has a hardware RAID controller. Use SAMBA for the cross platform connectivity you desire, and viola! protected data with redundancy and high speed performance. If you go with remote display (RDP with Windows Terminal Server or X with *nix) then you have an even better approach as all the data will exist on the secure RAID box.

    I get what you mean though... it's a nice idea, but it would be costly to implement vs. what I suggested above.

    When I went to see a presentation on HP's SAN solutions last year, I was very impressed with the ideas they had. One big hardware box with multiple disks that are controlled by the hardware. They are then presented to any systems over a fiber link as any number of drives you wish for any OS. Finally, their "snapshot" ability was pretty impressive. (Also called Business Copy) All they would do is quiesce the data bus, then create a bunch of pointers to the original data. As data is altered on the "copy" (just the pointers, not a real copy), the real data is then copied to the "copy" with changes put in place. I imagine something similar could be accomplished with CVS...
    • Just remember to back up that big honkin' server with a reliable medium. Don't trust that steaming pile of shit from Seagate called Travan.
    • I think you meant voila, not viola. A viola is a musical instrument.
    • ...as much as I dislike replying to T4D, he brings up an interesting scenario to counter your suggestion of using multiple machines.

      I took a spare machine, added a 3ware 6800 ATA RAID controller ($130 on eBay), and installed eight 120GB Maxtor hard drives ($1200 when I bought them last year) and put them in eight Genica hot-swap trays ($60). For about $1500, I now have an 800GB formatted RAID5 array. (Had to throw in a dedicated 400W Antec power supply for HDs.) In a year, two of the drives have flunked,
  • Coda (Score:3, Redundant)

    by fmlug.org (695374) on Wednesday October 29, 2003 @04:21PM (#7341416) Homepage
    Coda may do what you're looking for:
    # disconnected operation for mobile computing
    # is freely available under a liberal license
    # high performance through client side persistent caching
    # server replication
    # security model for authentication, encryption and access control
    # continued operation during partial network failures in server network
    # network bandwidth adaptation
    # good scalability
    # well defined semantics of sharing, even in the presence of network failures
    More info here http://www.coda.cs.cmu.edu/
    • Re:Coda (Score:3, Interesting)

      by quantum bit (225091)
      If by "high performance through client side persistent caching" you mean "has to copy the entire 300MB video from the server to my local machine before it even starts playing, assuming it doesn't crap out because the default cache size is smaller than that", then yeah, go for it!

      Seriously, I looked into Coda a couple months ago and the design looks really cool, but it just doesn't seem to work very well unless you're only storing tiny text files. It also doesn't scale very well on large servers (i.e. it h
  • A perfect solution would be a form of network block device that mounts distributed NBD shares. The Linux DRBD Project [drbd.org] has this capability. From their website, "You could see it as a network raid-1".
  • ...I could protect myself from the most common form on data failure - a disk crash.

    In my experience, the most common form of data loss is not hardware failure, but user error. RAID is great for protecting against hardware failure, but be sure to still make backups to protect against accidental deletion.
  • Try Rsync or DRBD (Score:4, Informative)

    by oscarm (184497) on Wednesday October 29, 2003 @04:23PM (#7341436) Homepage

    see http://drbd.cubit.at/ [cubit.at] DRBD is described as RAID1 over a network.

    "Drbd takes over the data, writes it to the local disk and sends it to the other host. On the other host, it takes it to the disk there."

    Rsync with a cron script would work too. I think there is a recipe in the linux hacks books to do something like what you are looking for: #292 [oreilly.com].
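
A cron-driven mirror could be as small as the sketch below. The source path, destination path, host names, and script location are placeholders, and the script defaults to printing the rsync commands rather than running them, so you can check them first.

```shell
#!/bin/sh
# mirror.sh: push one directory to several backup hosts with rsync over ssh.
# SRC, DEST_PATH, and the host names are placeholders for illustration.
SRC="/home/user/important/"
DEST_PATH="backup/important/"
DRY_RUN=${DRY_RUN:-1}   # default to printing commands; set DRY_RUN=0 to run them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

mirror_to() {
    # -a keeps permissions and times, -z compresses, --delete removes
    # files on the destination that no longer exist in the source.
    run rsync -az --delete -e ssh "$SRC" "$1:$DEST_PATH"
}

for host in box1 box2; do
    mirror_to "$host"
done

# Example crontab line, nightly at 02:30 (the script path is hypothetical):
# 30 2 * * * DRY_RUN=0 /usr/local/bin/mirror.sh
```

Note that `--delete` also propagates accidental deletions on the next run, so this is a mirror, not a versioned backup.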

  • by DrSkwid (118965) on Wednesday October 29, 2003 @04:24PM (#7341438) Homepage Journal
    http://plan9.bell-labs.com/sys/doc/venti/venti.html [bell-labs.com]

    Abstract

    This paper describes a network storage system, called Venti, intended for archival data. In this system, a unique hash of a block's contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage and simplifying the implementation of clients. Venti is a building block for constructing a variety of storage applications such as logical backup, physical backup, and snapshot file systems.
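
The core idea in that abstract, addressing a block by the hash of its contents, can be illustrated with a few lines of shell. This is only a toy sketch, not Venti itself or its protocol: it assumes GNU `sha1sum` is available, uses a throwaway directory as the "archive", and the function names are made up.

```shell
#!/bin/sh
# Toy content-addressed block store in the spirit of Venti: a block's
# SHA-1 hash is its address, and existing blocks are never overwritten.
STORE=$(mktemp -d)   # throwaway directory standing in for the archive

put_block() {
    # Read one block from stdin, store it under its hash, print the hash.
    tmp=$(mktemp "$STORE/tmp.XXXXXX")
    cat > "$tmp"
    hash=$(sha1sum < "$tmp" | cut -d' ' -f1)
    if [ -e "$STORE/$hash" ]; then
        rm -f "$tmp"                 # duplicate content: keep the first copy
    else
        mv "$tmp" "$STORE/$hash"     # new content: file it under its hash
    fi
    echo "$hash"
}

get_block() {
    cat "$STORE/$1"
}

id1=$(printf 'hello world' | put_block)
id2=$(printf 'hello world' | put_block)   # same content, same address
echo "$id1"
get_block "$id1"
```

Because the address is derived from the content, writing the same block twice costs no extra space, and there is no way to "overwrite" a block with different data under the same address.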

  • by onyxruby (118189) <onyxruby&comcast,net> on Wednesday October 29, 2003 @04:25PM (#7341459)
    I've been looking into something like this for a little while. What I'd like to do when I have the fundage is get a fileserver/backup box. The ideal is to run 4 160 GB IDE drives in RAID 5. This will give me a bit over 450 GB in usable network storage. I then want to add a pair of 250 GB 5400 drives for backup. I can then set up the server to back up the data from the RAID drives to the backup drives on a daily basis.

    According to pricewatch the 4 160's could be had for around $400 total with about another $400 for the backup. Add a 3ware RAID controller for another $245 and you're looking at about $1045 to convert a system into supporting 450 GB of usable network storage and backup.

    From all indications IDE hard drives are now the cheapest form of backup there is. I've looked at CD, DVD, Tape, but it keeps coming back to IDE hard drives. This is far cheaper than similar storage and backup would be on tape.
  • hyper scsi (Score:2, Informative)

    by blaze-x (304666)
    from the website:

    HyperSCSI is a networking protocol designed for the transmission of SCSI commands and data across a network. To put this in "ordinary" terms, it can allow one to connect to and use SCSI and SCSI-based devices (like IDE, USB, Fibre Channel) over a network as if it was directly attached locally.

    http://nst.dsi.a-star.edu.sg/mcsa/hyperscsi/ [a-star.edu.sg]
  • You can share iSCSI devices, if you do it the right way, between many different hosts. NBD sounds good, but for what you're asking, iSCSI or FCIP or some derivative sounds more correct. i.e. virtual block devices, or "real" block devices on a network that can be accessed by windows or *nix. you could RAID (md) iSCSI devices, or just use a system which "owns" all the iSCSI devices in an MD, and present it up using CIFS or SMB.

  • Rsync and Ssh (Score:5, Informative)

    by PureFiction (10256) on Wednesday October 29, 2003 @04:32PM (#7341521)
    This is the way I do it, and although a little clunky, it allows me to keep remote backups of certain directories on three different servers.

    First, set up ssh to use pubkey authentication instead of interactive passwords. You can read the man pages for details, but it basically boils down to running ssh-keygen on the trusted source:

    ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa

    Then copy|append the newly created ~/.ssh/id_rsa.pub to the remote hosts into their /home/user/.ssh/authorized_keys file.

    Now you can run rsync with ssh as the transport (instead of rsh) by exporting:

    export RSYNC_RSH=ssh or also passing --rsh=ssh on the command line.

    So to sync directories you could use a find command to update regularly:

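    while true; do
        # Sync on the first run (no timestamp file yet) or when anything changed since the last sync.
        if [ ! -e .last-sync ] || find . -follow -cnewer .last-sync | grep -q .; then
            # Only update the timestamp when rsync succeeded, so failures are retried.
            rsync -rz --delete . destination:/some/path/ && touch .last-sync
        fi
        sleep 60
    done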

    Obviously this is pretty hackish and could be improved. But the point is that with ssh and rsync you could do automatic mirroring of specific filesystems or directories to remote locations securely.
  • What you seek is the holy grail of high-availability environments.

    So far, I've not seen anything that exists that does what you are asking for. Several technologies come somewhat close.

    What I've been hopeful of is the recent donations by Oracle for database clustering, but I haven't seen any decent fallout from that... yet.

    For now, on my home-based work network, I have two network drives (both IDE 120 GB) and do a nightly rsync from one to the other.

    (sigh)
  • by PurpleFloyd (149812) <zeno20&attbi,com> on Wednesday October 29, 2003 @04:34PM (#7341547) Homepage
    First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.

    Instead of trying to implement a shoestring SAN, go the simple route: throw up a Linux box running Samba for your "backup server;" it doesn't need much horsepower, just fairly fast drives and a network connection. Then schedule copies of your documents and home directories (using a cron-type tool on Linux and XCOPY called by the Task Scheduler on Windows, you should be able to hack something together that copies only changed files) every night at midnight, or some other time when you aren't using your computers. Although you might lose a bit of work if the system goes down, you won't ever lose more than 24 hours' worth.

    If you have more money to blow, then I would suggest that you invest in an honest-to-dog hardware RAID card and some good drives and put them into a server, then do everything across the network (put the /home tree and My Documents folders on the server). You can of course mount the /home directory in Linux via NFS or smbmount, and Group Policy in Windows 2K/XP will allow you to change the location of the My Documents folder to whatever you choose. You might be able to do the same via the System Policy Editor on 9x; it's been a while and I can't find the information after a brief Google.

    To sum up:

    • Don't blow millions on a SAN for your house.
    • Cheap route: cron jobs/Windows task scheduler to copy important folders across the network every night
    • More expensive route: invest in a server with real RAID, then mount your important directories from that.
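    The "cheap route" can be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the function name `copy_changed`, the `.last-copy` marker file, and every path shown are made up for the example, and the destination is assumed to be an already-mounted share given as an absolute path.

```shell
#!/bin/sh
# nightly-copy.sh: copy files changed since the previous run to a mounted share.
# Hypothetical crontab entry to run a wrapper for it at midnight:
#   0 0 * * * /usr/local/bin/nightly-copy.sh

# copy_changed SRC DEST
#   SRC and DEST are directories; DEST must be an absolute path.
copy_changed() {
    src=$1; dest=$2
    mkdir -p "$dest" || return 1
    # Stamp the start time first, so files edited mid-copy are caught next run.
    touch "$src/.last-copy.new" || return 1
    (
        cd "$src" || exit 1
        if [ -e .last-copy ]; then
            # Later runs: only regular files modified after the last stamp.
            find . -type f ! -name '.last-copy*' -newer .last-copy
        else
            # First run: no stamp yet, so copy everything.
            find . -type f ! -name '.last-copy*'
        fi | while IFS= read -r f; do
            mkdir -p "$dest/$(dirname "$f")" && cp -p "$f" "$dest/$f"
        done
    ) || return 1
    mv "$src/.last-copy.new" "$src/.last-copy"
}

# Example call (placeholder paths):
#   copy_changed /home/me/documents /mnt/backup/documents
```

    Unlike rsync with --delete, this never removes anything from the backup, so a file deleted by accident is still there the next morning.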
    • by Cranston Snord (314056) on Wednesday October 29, 2003 @04:46PM (#7341650) Homepage
      Instead of xcopy, try RoboCopy, included in the Windows NT/2K/XP/2K3 resource kit available here. [microsoft.com] It gives you almost as much control as rsync, including directory synchronization, touch control, ageing, network failure support, and others. I use this at work to move copies of live production data to backup servers located offsite via VPN without any issues. More information on syntax can be found here. [ss64.com]

    • No need for an "honest-to-dog hardware RAID". Linux software RAID is simply great.

      Set up a server with multiple hard disks in a Linux software RAID, and run Samba and NFS on that. The Linux software RAID HOWTO explains all you need to know.

      steveha
    • First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.

      I'm currently running some benchmarks on an XFS filesystem built upon a Linux MD RAID1 array, which is in turn built upon a local disk and a remote disk (which is at the end of a switched 100mbit network, the NBD server itself having

  • by NerveGas (168686) on Wednesday October 29, 2003 @04:35PM (#7341552)

    Really. If you're on a 100-megabit LAN, that gives you a max of about 10 megaBYTES per second. So, if you have to transmit information to two other computers for every disk write, you're effectively limiting yourself to a maximum of about 5 megabytes/second disk transfer. And that's under GOOD situations. If you're doing random I/O, where the latency will be the determining factor, then take the latency of the hard drives, add in the latency of the networking, and the latency of the software layers, and you're looking at some pretty abysmal performance.
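    The bandwidth arithmetic above can be written out as a small calculation. It is a back-of-the-envelope sketch that reuses the poster's rule of thumb (about 10 usable megabytes per second on a 100-megabit link) and ignores latency entirely; the function name `effective_write_kbs` is made up for the example.

```shell
#!/bin/sh
# Rough ceiling on replicated write throughput over one shared link.
# Assumption: ~10 usable MB/s per 100 Mbit/s, i.e. ~100 KB/s per Mbit/s.

# effective_write_kbs LINK_MBIT REPLICAS
#   Prints the KB/s left for each logical write when every write must be
#   sent to REPLICAS remote machines over the same link.
effective_write_kbs() {
    link_mbit=$1
    replicas=$2
    usable_kbs=$(( link_mbit * 100 ))
    echo $(( usable_kbs / replicas ))
}

effective_write_kbs 100 2   # prints 5000, i.e. about 5 MB/s
```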

    Using rsync in a cron job will solve your backup problems. In fact, your script can use rsync to do the synchronization, and tar/gzip to archive the backup - giving you "point in time" snapshots for when someone says "I deleted this file 4 days ago, can you get it back?"

    steve
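    A minimal sketch of the rsync-plus-tar idea, assuming rsync and a tar that understands -z are installed. The function names `snapshot_name` and `backup`, the date-stamped file naming, and all paths are illustrative choices, not anything specified in the comment above.

```shell
#!/bin/sh
# Mirror a directory with rsync, then archive the mirror under a dated name
# so older states survive later deletions.

# snapshot_name PREFIX
#   Prints PREFIX-YYYY-MM-DD.tar.gz for today's date.
snapshot_name() {
    echo "$1-$(date +%Y-%m-%d).tar.gz"
}

# backup SRC MIRROR ARCHIVE_DIR PREFIX
backup() {
    src=$1; mirror=$2; archive_dir=$3; prefix=$4
    mkdir -p "$mirror" "$archive_dir" || return 1
    # Make the mirror match the source, removing files deleted upstream.
    rsync -a --delete "$src"/ "$mirror"/ || return 1
    # Freeze today's state of the mirror as a compressed archive.
    tar -czf "$archive_dir/$(snapshot_name "$prefix")" -C "$mirror" . || return 1
}

# Example call from a nightly cron job (placeholder paths):
#   backup /home /backup/mirror/home /backup/archive home
```

    Restoring a file deleted four days ago is then a matter of extracting it from the archive carrying that date. Old archives are never pruned here, so disk usage grows by one full copy per night unless something else cleans up.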
  • I can't believe... (Score:2, Interesting)

    by wcdw (179126)
    ...this question even got asked. Ok, if you *need* to share the same device across machines, something like the network block device can be a real help.

    If all you're worried about is disk failures, mirror each disk locally. Disks are cheap, and real operating systems don't have any trouble with software mirroring.

    Why would you want to make all of your machines suddenly non-functional, just because one of them lost a network card? Or the switch failed? Or ....
  • Intermezzo [inter-mezzo.org] and Coda [cmu.edu] both do this, but I don't think there are any Windows versions available. There are some Microsoft things available too, but obviously those aren't for Linux. NBD (which everyone else has mentioned) isn't distributed, so that's not really what you're looking for.

    What you might be able to do is put together a microcosm of Freenet [sourceforge.net] or something like it, running on just your home computers. There may be other Peer-to-Peer solutions available that are faster/more stable. Do some searching
  • Though not real time like a true RAID, I think what you're really after is something like rsync, as many other posters have mentioned. When this came up in an earlier story I found a link to Unison, which seems to be better for my needs at least.

    http://www.cis.upenn.edu/~bcpierce/unison/ [upenn.edu]

    Might be interesting to combine this with FSRaid [fluidstudios.com] (Parity Archive or PAR files) to get some extra redundancy.

    B

  • by CSG_SurferDude (96615) <wedaaNO@SPAMwedaa.com> on Wednesday October 29, 2003 @04:42PM (#7341624) Homepage Journal

    I do this every night to thousands of machines...

    The software I use is Kazaa-lite.

    Oh, you mean files other than my MP3s/jpegs/mpegs? Sorry, I can't help you there.

    • by Cyno (85911)
      See, Kazaa is a perfectly legitimate technology, if only the RIAA and MPAA could stop polluting it with their copyrighted commercial garbage.

      I blame Jack Valenti for this whole mess.
  • Many responses, even highly-rated ones, seem to be talking about simple replication via NBD (worst-written code I've ever seen) or DRBD. That's not the same as what the original poster was asking about. Neither are fully-distributed but non-transparent file stores such as HiveCache [mojonation.net]. AFS/DFS/Coda/Intermezzo are probably the closest in the sense of being both transparent and resistant to failures. There have also been a couple of very closely related projects at Microsoft (Farsite and Pastiche) but I'm no

  • by richoid (180354) on Wednesday October 29, 2003 @04:51PM (#7341700) Homepage
    http://www.parl.clemson.edu/pvfs/

    "The goal of the Parallel Virtual File System (PVFS) Project is to explore the design, implementation, and uses of parallel I/O. PVFS serves as both a platform for parallel I/O research as well as a production file system for the cluster computing community. PVFS is currently targeted at clusters of workstations, or Beowulfs."

    "In order to provide high-performance access to data stored on the file system by many clients, PVFS spreads data out across multiple cluster nodes, which we call I/O nodes. By spreading data across multiple I/O nodes, applications have multiple paths to data through the network and multiple disks on which data is stored. This eliminates single bottlenecks in the I/O path and thus increases the total potential bandwidth for multiple clients, or aggregate bandwidth."

    Or there are many others to choose from; Google for clustered filesystems:

    http://www.yolinux.com/TUTORIALS/LinuxClustersAndFileSystems.html

  • Slow? (Score:2, Informative)

    by cerebralsugar (203167)
    I certainly would attest that this is a cool idea. I have a few systems at my place and it would be neat to make a single filesystem spanning all the storage on the network.

    However, while small files would be fine, I would think the speed of the network would make for some fairly slow storage on a 100mbit network.

    Add more users saving files across the network to the equation and things would get out of hand fast.

    I guess I would just buy a serial ata raid motherboard (the intel D865GBFLK is one I have bee
  • Raid != Backup (Score:3, Informative)

    by Alan (347) <arcterex@uf i e s.org> on Wednesday October 29, 2003 @04:55PM (#7341737) Homepage
    Don't forget that RAID only protects you from hardware failures; it doesn't prevent you from doing an "rm -rf important_file" :)

    Personally I have a server with a RAID 5 array that is shared via Samba to Windows and Linux clients, which works fine, though I may adjust this if good suggestions are made here. The only real issue would be disk space, and all my computers now have 120G+ hard drives or RAID arrays....
  • ghettobackup.bat
    copy c:\porncollection\*.* \\backup1\bak
    copy c:\porncollection\*.* \\backup2\bak
    .
    .
    .
    copy c:\porncollection\*.* \\backup8\bak
  • I don't think the RAID algorithm is the right way to synchronize all your data when applied on a larger scale. I imagine that what a person really wants to do is to unify all his accounts, on slow and fast links all over the world, to look like one huge synchronized partition which stores the data throughout the accounts with sufficient redundancy (meaning something like 'keep copies of all data in at least three different locations'). I think using RAID for this would give horrible performance and not be near
  • Don't play around with something "cool" like a distributed RAID disk. Just spend the money on a decent tape drive and tapes, design a tape backup rotation strategy, get a safety deposit box at a local (or not-so-local) bank for off-site storage, and set up Amanda [amanda.org] to do the backups.

  • too much about fire.

    It's my wife and her need to open any email she gets using outlook on her windows box. She's just enough of a geek to be dangerous and "enjoys" the preview feature.

    And she wonders why her 'puter can't log into the LAN without being Virus checked first.

    -Goran
  • Lustre and PVFS (Score:3, Insightful)

    by nagare (249480) on Wednesday October 29, 2003 @05:50PM (#7342212)
    The Lustre project (www.lustre.org [lustre.org]) is supposedly going to be the end all/be all of distributed parallel file systems, but I believe it is still fairly unstable and not ready for production use. In the meantime, the best one out there is PVFS (www.parl.clemson.edu/pvfs/ [clemson.edu]). Fat chance trying to find Windows clients, but you can always re-export it with Samba.
  • Why? (Score:5, Funny)

    by Illbay (700081) on Wednesday October 29, 2003 @05:57PM (#7342255) Journal
    ...if I loose one of the disks that comprise the raid, the image would automatically reconstruct itself...

    Why would you want to "loose" one of the disks? Don't you know they're supposed to stay tightly enclosed in their little boxes?

    And why do you think that "loosing" the disk would help the image "automatically reconstruct itself?"

    Actually, if you did that the disk would carom around the room like a very fast, very lethal Frisbee and you would be too busy trying to survive to worry about where your data went!

    Just a thought

    Otherwise, your plan sounds peachy.

  • Check out HiveCache (Score:3, Informative)

    by Jim McCoy (3961) on Wednesday October 29, 2003 @06:17PM (#7342393) Homepage
    HiveCache [hivecache.com] is a distributed RAID system similar to what you are asking for, albeit one that is pitched more at the enterprise backup environment than at the home user. It offers strong security, error correction and data replication, and multi-source data publication and retrieval to eliminate the network hotspots that might otherwise occur.

    While a pure Linux solution seems to score the most points here, this particular one lets you combine your Windows, OS X, and Linux systems into a single distributed storage mesh. There is safety in numbers, and the more systems you can add to this sort of distributed storage system, the more reliable it becomes.

    HiveCache is more of a backup solution, but I do know that it is possible to use this with a WebDAV front-end for archival storage and other interesting storage possibilities.
