Online Backup Solutions?
OmnipotentEntity asks: "I'm an IT Manager (and also a lifeguard, don't ask) for a small private club. Recently, part of our server's RAID went bad just as Hurricane Dennis hit, making life a living hell for me and everyone involved. So I figured that backing up information online might make events like this less incredibly painful. A quick browse of Google shows that there are a lot of businesses offering automatic, offsite, online backup solutions; it seems to be becoming a big thing. The largest problem is that they all look alike -- same implementation, similar websites -- it looks like someone came through this part of the Internet with a cookie cutter, and from the information available on their websites and their pricing (which may or may not be available without filling out 100 forms), I can't tell a good company from a bad one. I've never had any experience with any of these companies, and I wanted to know if any of you have, and if so, what were your experiences with them? What are the things to look for? What are the things to avoid? Am I barking up the wrong tree?"
Offsite Co-op? (Score:5, Interesting)
Call me a commie - but why not?
Use gmail. (Score:2, Interesting)
Apple's .Mac offering (Score:2, Interesting)
I have never used it, and its data storage limit (250MB??) is ridiculously small for the price ($99/yr?), given that free email accounts now offer storage upwards of 1GB. However, I was wondering what others' experiences were?
Cheers
Do it the old fashioned way (Score:3, Interesting)
Storage Size? (Score:2, Interesting)
Gmail (Score:2, Interesting)
Good Solution (Score:3, Interesting)
True story: We both run Citrix servers, and one time we had a data loss at my location. Within an hour, we restored our database and application to an extra server at the remote location and used Citrix to connect our users here to the main database. I could then work on restoring from tape, without the pressure of true downtime, just inconvenience time, which I and management can tolerate.
rsync+torrent=backup_cloud (Score:4, Interesting)
I was looking for a free application like that a few weeks ago and found this guy's nice write-up of desired features. [66.102.7.104]
Re:Offsite Co-op? (Score:5, Interesting)
Of course, that sort of mechanism doesn't help if your purpose is to use backups for historical data retention, but then again, if that's your goal, online backup doesn't make sense anyway.
What would be nice would be for this sort of mechanism to be sufficiently simple that an idiot can understand it. You specify the number of unique copies (n) of your data based on how much you care about it. In exchange, you agree to store 2n times as many gigs of information for other people on your drives. That space is reserved in advance at upload time, and freed when you tell the software that the backup of that data is no longer needed.
To prevent abuse, laptops would not be allowed to participate, as the availability of data backed up on someone's laptop is dubious at best. Machines participating must have either a static IP or dynamic DNS (or, ideally, the software could automatically register some sort of free dyndns type name for you).
During the first 72 hours after the backup, the machine must respond to at least 75% of hourly requests for confirmation from other machines that hold copies of its data. If it does not, it will be assumed to be a laptop, and the data stored will be disposed of after 72 hours as space is needed. This means you can still use it as a temporary backup mechanism if your machine is dying, since the data won't go away immediately, but at the same time it effectively prevents people from abusing the system to back up their laptops.
After 72 hours, the confirmation rate will decrease to once per day. A host that has been gone for more than two weeks will be assumed to have been abandoned. However, there should be a mechanism for making one machine double as a stand-in for a dead machine for an arbitrary period of time, so long as it provides enough storage to meet the original machine's obligations.
In addition to confirmation requests from the copyholder, the machine with the original data should attempt (daily) to contact each copyholder to verify that bidirectional connections are possible, thus ensuring that if the data needs to be recovered, it can be.
Obviously, since all data would be encrypted, the encryption key would be stored in a file on the system being backed up. This means that you MUST back up that file separately if you ever want to recover your data....
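The eviction rule described above (75% of hourly confirmations during a 72-hour probation window) can be sketched in a few lines. This is my own illustration of the parent's proposal; the class name, constants, and method names are all invented for the example:

```python
CONFIRM_INTERVAL = 3600          # hourly confirmation probes during probation
PROBATION = 72 * 3600            # first 72 hours after the backup is uploaded
MIN_RESPONSE_RATE = 0.75         # below this, the host is treated as a laptop

class CopyholderRecord:
    """Tracks how reliably a peer holding a copy of our data answers
    confirmation probes, and whether its stored data may be reclaimed."""

    def __init__(self, uploaded_at):
        self.uploaded_at = uploaded_at
        self.sent = 0
        self.answered = 0

    def record_probe(self, answered):
        self.sent += 1
        if answered:
            self.answered += 1

    def response_rate(self):
        # With no probes yet, assume the host is reachable.
        return self.answered / self.sent if self.sent else 1.0

    def eligible_for_eviction(self, now):
        """Data may be reclaimed only once the probation window has
        elapsed AND the host failed too many hourly confirmations."""
        in_probation = (now - self.uploaded_at) < PROBATION
        return not in_probation and self.response_rate() < MIN_RESPONSE_RATE
```

Note that eviction requires both conditions, which is what gives a dying machine its 72-hour grace period while still weeding out intermittently-connected laptops.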
Re:Online backup? - Capacity (Score:2, Interesting)
Re:Offsite Co-op? (Score:5, Interesting)
You store my data, I will store yours.
Error-corrected and replicated so that 50% of the cloud could disappear and you would still have 4 or 5 nines of reliability.
Per-file, content-dependent encryption (e.g. every file gets its own AES encryption key)
Free accounts have a 10:1 provided vs. consumed ratio (to cover replication and error-correction bloat, with the ratio expected to drop over time) and people who want to buy a better ratio or even not have to provide space can do so.
Access to data backed-up by any of your systems from any other system you have installed the software on. (No more need to fiddle with system-to-system sync to make sure you have access to all of your files.)
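"Content-dependent encryption" presumably means something like convergent encryption: the per-file key is derived from the file's own contents, so identical files encrypt to identical ciphertext and can be deduplicated without the storage cloud seeing plaintext. I don't know Allmydata's actual scheme; this is a sketch of the key-derivation idea only, with the AES step left out:

```python
import hashlib

def per_file_key(plaintext: bytes) -> bytes:
    """Derive a per-file key from the file's own contents (convergent
    encryption). Identical files yield identical keys, hence identical
    ciphertext, enabling cross-user deduplication. A real client would
    feed this 256-bit key to AES; the derivation is the
    content-dependent part."""
    return hashlib.sha256(plaintext).digest()
```

The trade-off is that anyone who already has a copy of a file can confirm you stored it too, which is why convergent schemes are usually reserved for non-secret data.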
Sound interesting? If so, head over to Allmydata [allmydata.com] and sign up for the beta test. [Windows only at the moment, but OS X and Linux versions will be available in a couple of months...]
UniTrends (Score:1, Interesting)
Re:Offsite Co-op? (Score:1, Interesting)
The first is a problem of availability. What happens when your computer crashes while I'm away for a week and my computer at home (which holds your backup) is turned off? For this idea to work, there would have to be redundancy. I don't know what the optimal number would be, but assuming 5 copies assures availability, it means that for every GB you back up, you have to store 5GB for others.
The second problem has already been pointed out
To further the example above (5 copies required to ensure that 1 is always available), assume that everyone's connection is the same and that your download speed is 10x your upload speed. To ensure that you can achieve 40% of your maximum download speed, there now need to be 20 copies of your data. Now, for every GB you back up, you have to store 20GB of other people's data (compressed to 7-10GB).
Keep in mind that this ensures a minimum of 40%. If more people happen to be online, you could achieve full speed. However, their online activity will also play a role in your recovery speed. I'm assuming that you get their full upload bandwidth for recovery!
Maybe my number of 5 is too high, or maybe it's too low. I don't know. Maybe this type of solution is only good for non-critical data
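The arithmetic behind the parent's "20 copies" figure is easy to check: hitting 40% of a download pipe that is 10x your upload speed needs 4 simultaneous uploaders, and guaranteeing 4 are online at 5x redundancy per uploader means 20 copies. A small calculator (function and parameter names are my own):

```python
import math

def copies_needed(availability_copies: int,
                  down_up_ratio: float,
                  target_fraction: float) -> int:
    """Copies required so that, in the worst case where only the
    guaranteed-available peers are online, their combined upload
    bandwidth reaches target_fraction of your maximum download speed.
    Each online peer contributes 1/down_up_ratio of your download pipe;
    availability_copies is the redundancy needed to guarantee one of
    them is online."""
    uploaders = math.ceil(target_fraction * down_up_ratio)
    return availability_copies * uploaders
```

Playing with the inputs shows how fast this blows up: demanding full-speed recovery under the same assumptions would require 50 copies.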
Re:Backups online (Score:2, Interesting)
I just have to ask. How is this post off topic? Sure, it isn't about online backups specifically, but it is a very reasonable alternative TO online backups.
Some people and their children.
md5 != bit-for-bit integrity (Score:2, Interesting)
Re:You confused backups with availability. (Score:3, Interesting)
312's LeanOnMe did this! Safe AND secure! (Score:2, Interesting)
Re:Offsite Co-op? (Score:3, Interesting)
The traffic should be minimized through the use of what I would call "data affinity": data from a given source should naturally tend to congregate on the same servers as other data from that source. This, coupled with internal data checksumming on each host, means that the number of messages should be relatively small even for large amounts of stored data.
It would amount to "verify integrity of all content for host foo.bar.org", followed by a response packet: "x bytes of data, checksum 0x483957483028abcd, no bad chunks detected". When something goes wrong with a chunk, it would have a chunk list attached, and the original server would send replacements for those chunks.
If, during the normal course of operations, a chunk of data didn't checksum correctly, the server would randomly request it from its neighbors and/or the source until it found somebody who was still out there. Each data server should be able to checksum itself fairly easily.
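The "one short reply unless something is broken" protocol above amounts to keeping per-chunk digests alongside a whole-host digest, so the common case is a single checksum comparison and only a failure needs a chunk list. A sketch under those assumptions (chunk size and function names invented for the example):

```python
import hashlib

CHUNK = 64 * 1024  # illustrative chunk size

def chunk_checksums(data: bytes):
    """Return (overall digest, per-chunk digests). The overall digest
    answers the routine "verify integrity of all content for host
    foo.bar.org" query; the per-chunk list is only consulted when
    something has gone wrong."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    per_chunk = [hashlib.sha256(c).hexdigest() for c in chunks]
    overall = hashlib.sha256(data).hexdigest()
    return overall, per_chunk

def verify(data: bytes, expected_per_chunk):
    """Return indices of corrupted chunks; an empty list means the host
    can reply with just its byte count and overall checksum."""
    _, actual = chunk_checksums(data)
    return [i for i, (a, e) in enumerate(zip(actual, expected_per_chunk))
            if a != e]
```

The source server can then resend only the chunks named in the failure reply rather than the whole host's data.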
IMHO, the only reasons for periodic queries are A. to make sure a host hasn't gone down permanently (and having several copies means that it should be safe for detection of this condition to take several hours rather than minutes) and B. to prevent laptop users from putting their data out on the network and then going away without contributing back to the community by providing shared storage for everyone else.
Now there is the problem of data integrity if a new copy of data is written out to the distributed filesystem while some copies are not online. Thus, each chunk should be versioned. If a new copy of a chunk is written while a copyserver is offline, the other copyservers should tag this fact, make a new clone of the new data, and periodically try to contact it over a period of time to inform it that the data is no longer needed. After a period of time (say, two weeks), they should give up and clone any additional data that was shared with that copyserver.
Similarly, if a copyserver is brought back online after a crash, it should try to contact the other copyservers and the masterserver and ask if any of its data is still relevant. It should do this periodically, with some eventual timeout (say, two weeks).
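The rejoin logic in the last two paragraphs boils down to comparing chunk version numbers against the cluster's view: anything stale must be refetched, and anything the cluster no longer knows about can be dropped. A minimal sketch (data shapes and names are my own, not from any real system):

```python
def stale_chunks(local: dict, cluster: dict) -> list:
    """A copyserver rejoining after a crash compares its chunk versions
    (chunk id -> version) against the cluster's current view. Returns
    chunk ids whose local copy is stale, or which the cluster no longer
    tracks (i.e. the data is no longer relevant); these must be
    refetched or discarded."""
    out = []
    for chunk_id, version in local.items():
        latest = cluster.get(chunk_id)
        if latest is None or version < latest:
            out.append(chunk_id)
    return out
```

Chunks at the cluster's current version are left alone, which is what makes a rejoin after a short outage cheap.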
Geographic Distribution of Backups (Score:1, Interesting)
http://www.hivecache.com/home.html [hivecache.com]
This allows you to use the excess capacity you already have (believe it or not, having 2+ gigabytes taken up by the same operating system/program files duplicated across all of your desktops falls under the category of "excess capacity"). The average corporate desktop has gigabytes upon gigabytes of unused disk space and oodles of unused cycles (that's what the grid computing fad, in full swing about 2 years ago, was all about). Still, it's good to see something genuinely positive and useful come out of the P2P arena.
It has encryption and allows users to self-service themselves with regard to restores.
Re:Backups online (Score:3, Interesting)
http://www.lingnu.com/backup.html [lingnu.com]
You hack one server, and one copy of the data gets corrupted. The second copy, however, is on a server that can only initiate outgoing connections; you cannot hack that one from the outside. When the data gets synced, the hash proves to be wrong, and we know we were hacked. Restore from the good backup, and we're done.
Shachar