Slashdot
Easy, Reliable Distributed Storage and Backup?
Posted by
Soulskill
on Sat Oct 04, 2008 04:14 AM
from the grandma-needs-those-pictures-of-her-cat dept.
RichiH writes "Most of you are the free IT staff of friends and family, just as I am. One of my largest headaches is backing up their data. What I am looking for allows for off-site storage on multiple server machines running Linux, has Linux & Windows clients that Just Work and require zero everyday effort (although a large-ish effort to set them up is just fine), allows for granular access control, is versioned and will, ideally, allow me to grab data automagically (think photo pool for your family where your mother, sister, etc., share each other's photos). This is something I've been trying to find for years, but I've never seen anything even closely resembling what I want. With the Wall Street Journal handing out its Technology Innovation Award to Cleversafe recently, I was once again reminded of this particular itch which needs scratching. Before I deploy it, I want to ask the Slashdot community for its opinion on that piece of software, and on potential alternatives. How do you solve this problem?"
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Git... (Score:2)
Git and gitweb, its web-based browsing tool, are very useful for maintaining a tree of archived data and browsing it.
Re: (Score:3, Informative)
Re: (Score:2)
Yep. I have something similar at home too. Works well.
Re: (Score:3, Interesting)
actually, for my own digital assets repo - see signature - i see two features of git which might be handy, atomicity of commits and hashes which avoid storing duplicates. git has "plumbing" commands which might help. Still haven't explored it.
BTW if you have enough bandwidth you could do away with a doxroom instance on a host, don't forget to back up files and db and remember it's alpha quality.
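For what it's worth, the "hashes which avoid storing duplicates" idea is easy to sketch. The snippet below is a minimal illustration in Python, not git's actual object format (git hashes a header plus the content and compresses its objects). The names `store_blob` and `count_blobs` and the directory layout are made up for the example.

```python
import hashlib
from pathlib import Path


def store_blob(store: Path, data: bytes) -> str:
    """Write data under its SHA-256 digest; identical content is stored once."""
    digest = hashlib.sha256(data).hexdigest()
    target = store / digest[:2] / digest[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first, then rename into place, so a
        # crash never leaves a half-written blob under the final name.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)  # rename is atomic on POSIX filesystems
    return digest


def count_blobs(store: Path) -> int:
    """Count the files currently held in the store."""
    return sum(1 for p in store.rglob("*") if p.is_file())
```

Storing the same photo twice returns the same digest and leaves a single file on disk, which is the whole trick.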
Re: (Score:3, Insightful)
Re: (Score:3, Interesting)
Re: (Score:3, Interesting)
Use the pr0n method! (Score:4, Funny)
Rename your data to 'Barely legal college girls having first time sex - XXX Vol1/256.r001' and use p2p to spread them all over the world!
Re: (Score:2)
What an excellent way to back up my photo collection! I'll get to work on it right away.
Re:Use the pr0n method! (Score:4, Funny)
It's funny, but it might be practical in a way.
It's possible to hide data in images, so why not in a video?
You just take something in high demand (could be porn, could be a movie) and put your data in it, well encrypted and without anyone knowing it.
The file gets shared because of the content people want to see, and if you ever lose your data you just look up the file via P2P, and you have it back.
Re: (Score:3, Funny)
I can tell you how I solve it in a business (Score:5, Informative)
I can tell you how I solve it in a business context, but whether or not it could be scaled down to personal I'm not sure.
The problem: 2 sites, each with 70-100GB of data, need offsite backup with similar criteria to your own. Bandwidth available to these sites is 2-4Mbps. The only OS involved is Linux, though I'm sure Windows could be shoehorned in somehow. A third site which has a tape streamer and someone to take tapes offsite is available. Data protection legislation means that storing it with a hosted service is illegal unless I encrypt it myself before sending it offsite - I'm only aware of one tool which claims to be able to do this and still send data as a binary delta (it uses the rsync library) and that tool is still not particularly common in Linux distributions and not very widely used. I'm nervous of trusting my backups to a tool that isn't in heavy use, particularly if strong encryption is being employed.
The Solution: A server in the third site and some judicious scripting with rsync allows it to mirror the data in the other two sites. The first sync is fairly painful, of course, but provided you don't have too much data regularly changing, subsequent syncs aren't too bad. The server is backed up to tape, which provides versioning capability, so if someone only realises that they lost a file a week after the fact it can still be restored.
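A minimal sketch of the kind of rsync scripting described above, in Python. The actual scripts aren't shown in this post, so the flag set here is an assumption, and the host name and paths in the usage note are placeholders.

```python
import subprocess
from typing import List


def mirror_command(source: str, dest: str, dry_run: bool = False) -> List[str]:
    """Build an rsync command line that mirrors source into dest."""
    cmd = [
        "rsync",
        "--archive",   # preserve permissions, times, symlinks, and so on
        "--compress",  # worthwhile on a 2-4Mbps link
        "--delete",    # make the mirror exact; old versions live on tape
        "--partial",   # keep partially transferred files for the next run
    ]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source means "the contents of this directory".
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def mirror(source: str, dest: str) -> None:
    """Run the mirror and raise if rsync reports an error."""
    subprocess.run(mirror_command(source, dest), check=True)
```

Something like `mirror("backup@site-a.example.org:/srv/data", "/mirror/site-a/")` run nightly from cron on the third-site server would cover one remote site. Because `--delete` propagates deletions, the tape backup is what provides the history.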
Initial effort to set up was pretty great but now it's done it JFW and requires no brain power whatsoever to run on a daily basis. I can make the data available over the VPN (of course the access speed will be dog slow) more-or-less immediately, and I can make it available at LAN speed by copying it to a hard disk and couriering it to the remote office in under 48 hours. A full restore of 100GB across a 2Mbps connection will take at least 4-5 days.
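The 4-5 day figure is easy to check. A small sketch, assuming 1 GB = 10^9 bytes and 1 Mbps = 10^6 bits per second; the `efficiency` parameter is a hypothetical knob for protocol overhead, not a measured value.

```python
def transfer_days(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_gb gigabytes over a link_mbps link."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400


print(round(transfer_days(100, 2), 2))       # 4.63 on a perfectly used link
print(round(transfer_days(100, 2, 0.8), 2))  # 5.79 at 80% efficiency
```

So "at least 4-5 days" is the best case for 100GB at 2Mbps, and the courier wins comfortably.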
Silver Bullet for file ownership/ACLs? (Score:3, Insightful)
I've got a few ideas about doing it, but they're all kludgy or force me to walk away from my rsync scripts, which are really fairly mature at this point. Furthermore, I need to get deltas downstream and packing everything into one file pretty much defeats that purpose at the several-gig level unless I'm running an rsync server to calculate the diffs. Th
Re: (Score:3, Informative)
Recent versions of rsync fully support POSIX ACLs (including, if asked, setting up ACLs on the receiving end that don't make any sense because they refer to uids that don't exist - though you could work around that one with a common authentication mechanism such as LDAP) - I've not tried to get Windows working so I'm not sure how well that would work.
Be warned that full POSIX ACL support hasn't made it into every Linux distribution yet - IIRC Debian Etch's rsync doesn't, for instance. If you're paranoid,
Re: (Score:3, Interesting)
Two questions (Score:4, Informative)
You're asking two questions. The first is that you want backup, so that all their data just gets thrown somewhere and they lose only the last few days' work if their hard drive dies. You don't even necessarily want this on the network; just back up to a DVD-R every so often, and take every month's DVD-R offsite (a friend's house, a bank's vault, whatever). There's lots of backup software for this. Most can do fancy stuff like incremental backups. You can probably find something open source you can host for your friends and family on a decently available server.
The second question is networked file storage, where you don't care about automatically archiving files, but you do want frequent access and a good UI. For this I recommend something like Dropbox [getdropbox.com], which has good support for OS integration and a web interface.
Re: (Score:3, Informative)
Re:Two questions (Score:5, Insightful)
If you try to roll backup and distributed file-storage into the same application, you're not going to get anything useful. Aunt Sally is going to want every single file including her OS and her tax returns backed up, in case her hard drive dies, but only wants the photos -- and only some of the photos, actually -- to be visible to Grandma Suzie. If Suzie can see every file on Sally's computer, and the entire history of each file, she's not going to be able to browse the photos in a way that's at all intuitive.
And worse yet, if Sally wants to send out links to her photos to fifteen of her friends by e-mail, she needs some sort of interface to mark parts of her backup as world-readable but the rest (like her passwords and e-mail) not. If the network backup program even lets you do this, it won't give Sally a UI that she'll be able to figure out.
You can certainly get network backup services: Mozy was mentioned in an earlier comment.
If you rethink your requirements in terms of your goals, you'll probably find that both rolled into one isn't what you want, and not just because a product doesn't exist at the moment that does that — a product that does that can't possibly have a good UI. If they shouldn't notice or care about how backups are being made, how are they going to figure out how to share photos with each other?
Re: (Score:2)
You need Mozy for the backup. You need Dropbox for the sharing. Unless you simply want to alias all of their files and folders into the Dropbox,
Re: (Score:2)
Forget about portable hard drives for anything other than temporary storage. Enough dust will kill them off if nothing else.
Dropbox (Score:5, Informative)
Re:Dropbox (Score:5, Insightful)
Dropbox is absolutely fantastic as a sync tool (and also has some degree of versioning), but there's no practical way as of yet to make it into a full-system backup. When 'watch folders' show up, it'll get a lot closer, but like any web-based system, it becomes impractically slow for anyone dealing with a lot of data. Even digital snapshots add up quickly at the resolution of today's point-and-shoot cameras, never mind if there's an actual photographer shooting RAW.
online backup (Score:2, Interesting)
JungleDisk with Amazon S3 Storage (Score:3, Informative)
Re: (Score:3, Informative)
As far as redundancy goes, your data gets stored in multiple Amazon datacenters around the country, which provides redundancy and high availability. At the end of the day, it's a far superior solution to anything you can cook up at home.
Of course ther
Re:JungleDisk with Amazon S3 Storage (Score:4, Funny)
Wow. How far below ground is your house?
wimps (Score:5, Funny)
"Only wimps use backup. Real men just upload their important stuff on ftp, and let the rest of the world mirror it."
God
Re: (Score:2, Informative)
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -Linus Torvalds
Update from OP (Score:2)
you haven't thought this through (Score:5, Informative)
Backup isn't the same as sharing. And do you want actual replication or merely fault tolerance to node failure? Actual n-fold replication means you're going to pay n times the amount of money for storage. And why do you insist on one application to do everything?
My suggestion: set up automatic backups to one of the many backup services on the net. They worry about how to replicate your data, you don't have to. For the same service to support both backup and sharing is hard and it's probably a bad idea. It's much easier if you know that the backup service simply cannot access the contents of any of your files.
For sharing, use services designed for that: Flickr Pro, Picasa, Google Docs, whatever. They are designed for sharing, they know about users and permissions, and they can only publish what you actually upload to them.
As for Cleversafe, the idea is as old as forward error correction, but the economics and management never seem to quite work out. And basically, you're getting the same functionality from hosted storage: Amazon, Google, Box.NET, etc. are already figuring out how to keep your data available and secure, and are probably doing a better job than you could do with a homebrew system.
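To make the error-correction point concrete, here is a toy sketch in Python. It is plain XOR parity, the simplest possible scheme, and not Cleversafe's actual algorithm; the function names are made up for the example. Data is split into k chunks plus one parity chunk, and any single lost chunk can be rebuilt from the rest.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(shares: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data when at most one share (marked None) is missing."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only repair one lost share")
    shares = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        # XOR of all surviving shares equals the lost one.
        shares[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shares[:-1])[:length]
```

With k = 4 this stores 5/4 of the original size across five machines and survives the loss of any one of them, whereas a single full replica costs 2x for the same tolerance. That is the economic appeal; the management headaches are another matter.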
Re:you haven't thought this through (Score:5, Insightful)
I want distributed backups with several, for lack of a better word, working copies checked out on different machines.
Aha, now I figured out why we're all misunderstanding you. Those aren't backups. "Backups" to my ears means that you copy the entire contents of your disk or your Documents folder nightly onto tape or some other archival medium, so that in case of hardware failure you have something to restore from. Potentially you also keep prior versions around. The tapes are stored in a corner somewhere because they're never actually accessed except in an emergency, and they're destroyed after a few months.
What you want isn't backups, since it doesn't make sense for different people to share backups any more than it makes sense for different people to share a single networked hard disk or networked home directory. You just want a distributed file storage system, with automatic syncing / commits.
Re: (Score:2)
Wait, so what you want is concurrent versioning to be dealt with by some sort of system?
If you don't want users to have to learn about the Subversion controls (because it can be a real GIT to use sometimes), then many programs implement similar functionality using plugins.
e.g.
Web2.0 OpenOffice.org collaboration & document management extension [openoffice.org]
OOoSVN
I would set up SVN/CVS/git for everything, then find extension tools to deal with each use case as the need for transparent SVN access becomes apparent.
rdiff-backup and chironfs (Score:4, Informative)
The subject says it all:
- rdiff-backup to back up your data to one backup server.
- chironfs to clone the file system to another remote server.
rdiff-backup runs on *nix and windows (with the help of Cygwin).
Once set up, rdiff-backup needs virtually no maintenance. If needed, set up Nagios to warn you if things go awry.
Used this for years, never disappointed me so far!
If you had Windows & Mac - Mozy (Score:2)
If you had only Windows and Mac, I'd opt for Mozy (http://www.mozy.com) which is owned by EMC. It's $50/year for unlimited storage and their agent is unobtrusive and backs up even open files.
The downside is that it limits upstream bandwidth to 1Mb/s, so your initial backup might take a week. But after that, it takes 3 minutes a night and it does it without prompting. I've strong-armed my immediate family into using it because it also allows me to monitor remotely the status of all backups.
It's seriously
Use rsnapshot (Score:2, Interesting)
Get 4 x 1TB disks and run RAID 6 at minimum. Install Linux. Install rsnapshot [rsnapshot.org], which offers:
* Filesystem snapshot - for local or remote systems.
* Database backup - MySQL backup
* Secure - Traffic to and from the remote backup server is always encrypted using OpenSSH
* Full backup - plus incrementals
* Easy to restore - Files can be restored by the users who own them, without the root user getting involved.
* Automated backup - Runs in background via cron.
* Bandwidth friendly - rsync used to save bandwidth
You may also find CentOS [cyberciti.biz]
Bacula? (Score:5, Informative)
http://www.bacula.org/ [bacula.org]
Runs pretty tight (low bandwidth), supports channel encryption and datastore encryption, can even create Bare Metal Recovery disks. I have a server room with LTO3 tape drives that I use to back up my clients' incremental data changes nightly, including Linux, Mac and Windows clients and servers. I have VPNs out to each client, so don't use the built-in channel encryption, but I maintain a keypair for each client.
Backup only, but I /could/ present a maintained volume as a share over the VPN. Bacula supports disk and tape volumes as backup stores. I've personally had no need to do that to date.
We're not talking terabytes here - my ISP would pwn me if that was going on, but I do circa 20G of data changes every night from clients. Some of them are laptops that are not always on or connected. Most are friends' and family PCs, so it backs up when it can. I have to do almost no maintenance apart from changing a tape occasionally. The backup client is tiny and unobtrusive, even when running. On Windows it uses VSS, so it is reliable.
I have had a number of panic phone calls (esp from my kids at Uni) who have lost a thesis or the like and are utterly amazed when, after a few clicks over the phone they look at their webmail and yesterday's version is in their inbox. That's what it's all about! I am the god of lost data! Which, of course, works for me.
Re:Bacula? (Score:4, Funny)
Bacula - plural of baculum.
Do you know what a baculum is? It is the penis bone found in most male mammals with the exception of humans.
Great product naming!
Amazon S3 (Score:2)
There are a bunch of people offering this sort of service (or build your own) on Amazon's S3. It has the advantage of being accessible to everyone, has the security built in, and you only have to worry about the data, not server availability.
Backup not on the cloud just doesn't make much sense to me these days.
If I might plug a favorite project! (Score:3, Informative)
AFS? (Score:5, Interesting)
As well as all of the standard things you'd expect from a networked filesystem (ACLs, authentication, and so on).
If you set up an AFS cell with your volumes replicated across a few remote servers and get your clients to connect to this cell then it should be fine. Set a cron job to take regular snapshots, and dump them to some offline medium periodically.
Linux-based NAS with built-in applications (Score:2)
I found and tested the predecessor of the following device (which I can recommend on the basis of a year-long test of a sample with N=1): Bubba (see http://excito.com/bubba/about-bubba.html [excito.com] ). A Swedish NAS device. I have to note that it's certainly not "distributed" in the sense that it's easy to mirror data across multiple d
rsync and dyndns (Score:2)
First set each computer up with a dyndns account so that remote administration is easy.
Then set up folders on each computer for each member of the family. On each family member's main computer, make symbolic links to the other family members' picture folders, etc.
Set up a schedule to use rsync to copy the contents of the folders on a daily basis.
While you are at it, I suggest adding one more computer to the mix that will copy the home folders for all family members and keep them in a svn folder so they can call
BackupPC (Score:2)
BackupPC might do what you're after. From the blurb:
high-performance, enterprise-grade system for backing up PCs
BackupPC is disk based and not tape based. This particularity allows
features not found in any other backup solution:
* Clever pooling scheme minimizes disk storage and disk I/O.
Identical files across multiple backups of the same or different PC are
stored only once (using hard links), resulting in substantial savings
in disk st
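The hard-link pooling in that blurb can be sketched in a few lines of Python. This is only an illustration of the idea, not BackupPC's actual on-disk scheme; `backup_file`, the pool layout, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
import shutil
from pathlib import Path


def backup_file(pool: Path, src: Path, dest: Path) -> None:
    """Place src at dest as a hard link into a content-addressed pool."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    pooled = pool / digest
    if not pooled.exists():
        pool.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, pooled)  # first time this content is seen
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Every later copy costs a directory entry, not more disk blocks.
    os.link(pooled, dest)
```

Two identical photos from different PCs, or the same unchanged file on Monday and Tuesday, end up as extra names for one file in the pool. The pool and the backup trees have to live on the same filesystem for hard links to work.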
www.jungledisk.com (Score:2)
Our Stuff... a bit of a rambling post (Score:4, Insightful)
Wuala (Score:3, Interesting)
I'm surprised no one has mentioned Wuala - www.wua.la - which is a distributed online storage system. You agree to store (encrypted) bits of others' files in exchange for the ability to do so on others' machines across the wuala network. It's free and pretty damn cool. They can explain it better than I can: http://wua.la/en/learn/why [wua.la]
Great technology, not so great product (Score:4, Insightful)
I watched their CTO's Google Talks presentation and it was really interesting. I got all excited, joined their beta only to realize that they - IMO - misused the technology they had and designed a rather mediocre product. Wuala wants to be a backup tool, a sharing tool, a social networking medium as well as a few other things. In other words it lacks focus and wants to do everything - an approach that rarely works.
carbonite (Score:3, Informative)
Re:Mozy? Duplicity? (Score:5, Insightful)
No Linux client, AFAIK (though I do run it on my MBP). It's become rather impractical for me as a photographer though, as sometimes I'll shoot enough photos that my internet connection would be completely maxed out for days on end trying to sync up the new data - and I have a decent-for-cable 1Mbps upload rate.
rsync to Amazon S3 might be an option, if only for cross-platform capabilities. No versioning though, but outside of Apple's Time Machine (obviously useless for Windows and Linux), you're not going to get that without some major headache. Any remote system is going to be horribly slow for the first sync with any typical internet connection, and quite possibly problematically slow for photographers, media hoarders, and in general people with big hard drives.
Re: (Score:2, Informative)
cough ... JungleDisk [jungledisk.com] ... cough