Data Storage IT

Ask Slashdot: Temporary Backup Pouch? 153

Posted by timothy
from the don't-forget-your-spare-co-backup-pouch dept.
An anonymous reader writes "It looks simple. I've got a laptop and a USB HDD for backups. With rsync, I only move changes to the USB HDD for subsequent backups. I'd like to move these changes to a more portable USB stick when I'm away, then sync again to the USB HDD when I get home. I figured with the normality of the pieces and the situation, there'd be an app for that, but no luck yet. I'm guessing one could make a hardlink parallel-backup on the laptop at the same time as the USB HDD backup. Then use find to detect changes between it and the actual filesystem when it's time to back up to the USB stick. But there would need to be a way to preserve paths, and a way to communicate deletions. So how about it? I'm joe-user with Ubuntu. I even use grsync for rsync. After several evenings of trying to figure this out, all I've got is a much better understanding of what hardlinks are and are not. What do the smart kids do? Three common pieces of hardware, and a simple-looking task."
  • by Anonymous Coward

    What do the smart kids do?

    The smart kids don't use linux.

  • Hardlinks (Score:3, Informative)

    by funwithBSD (245349) on Monday May 21, 2012 @01:27AM (#40061565)

    Oh dear.

    Hardlinks don't span storage devices. They are directory entries that share the same inode on a single filesystem. Soft links do span devices, but they are just pointers to a path, so "backup" using softlinks and you have a bunch of pointers to data that is on the original system. NOT on the thumb drive!

    Use one of the backup packages out there, you are not at the point of rolling your own.

    Not even close.

    • by partofme (2643183)

      Hardlinks don't span storage devices.

      Except they do, on Windows at least. And you can even mount drive to a folder like c:/otherdrive/

      • by maevius (518697)

        mmmmm no.

        Mounting a drive to a folder has nothing to do with hardlinks (or inodes). This is on a much higher level. In order to span hardlinks on different drives, you should have 1 filesystem on 2 drives, which is not possible.

      • Not correct. You can make symlinks on Windows, but the data doesn't get stored in both locations. Ditto with hardlinks on Windows.

    • by Anonymous Coward on Monday May 21, 2012 @01:42AM (#40061639)

      Nobody pretended this to be the case. Not even the article's author. Please read again - and use a ghostwriter, you are not at the point of rolling your own comments. Not even close.

      • Don't normally respond to cowards, but the author did say he was trying to do backups using "hardlink parallel-backup", which is so much gobbledygook as to make it clear they don't understand UNIX past the "I am dangerous" level.

        Again, keep searching, find a package that can do 90% of what you want.

        Use that.

  • unison-gtk (Score:4, Informative)

    by niftydude (1745144) on Monday May 21, 2012 @01:29AM (#40061575)
    Since you are an ubuntu user, and it looks like you just need a nice rsync front-end to handle backup of the same data to two different drives, I'll suggest unison-gtk.

    Very nice, simple front-end, and will do what I think you need.
    • +1 for unison, it's an awesome little program. I use it on macosx as well, and I believe there's a windows client. I personally prefer the CLI version, but if the GUI version is anywhere near as good, I'll heartily recommend it. What you would do after coming home is to run sync against both external storages. Should work like a charm.
    • Re:unison-gtk (Score:4, Informative)

      by Anonymous Coward on Monday May 21, 2012 @04:50AM (#40062357)

      I think people (including you) don't understand what he needs. He has a complete backup at home. When he's on the move, he wants to backup only modifications that are not already backed up at home, so that the backup fits on a USB stick. To know what has and hasn't changed, he can't access the backup at home, like rsync would need to do. His idea was to have a space-saving complete copy of the backup on his laptop via hard links. You might think that file modification times could be used, but both solutions leave the problem of communicating file deletion. Suppose he needs to recover. He would copy his home backup to the new drive and then he would have to integrate the incremental backup. How would the incremental backup keep the information about deleted files without access to the base backup? I suppose one could keep a recursive directory listing with the incremental backup, but that's the question: Is there a ready-made solution for this?

      • by aaarrrgggh (9205)

        It would seem to make more sense to approach the problem from a different direction: limit your USB backups to the home directory (or a limited set of directories), and do incremental backups there. In the recovery mode, you first recover to the last known good state from the hard drive, and then you apply the changes from the USB stick to selected directories. It could be automated with a script that even I could write, or you just follow a simple step-by-step procedure.

        A fully packaged solution as the O
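
One way to sketch that two-step recovery in shell, assuming (hypothetically) that the stick holds a files/ tree of new and changed files plus a deleted.list of relative paths. Neither name comes from the thread, and apply_increment is an illustrative helper, not an existing tool:

```shell
#!/usr/bin/env bash

# apply_increment STICK_DIR RESTORED_DIR
# Assumed (hypothetical) stick layout:
#   STICK_DIR/files/        new and changed files, original paths preserved
#   STICK_DIR/deleted.list  one ./relative/path per line
apply_increment() {
    local stick=$1 target=$2 path

    # Overlay new and changed files, keeping modes and timestamps.
    cp -a "$stick/files/." "$target/" || return 1

    # Replay deletions recorded while travelling, if any were recorded.
    [ -f "$stick/deleted.list" ] || return 0
    while IFS= read -r path; do
        rm -f -- "$target/$path"
    done < "$stick/deleted.list"
}
```

After restoring the full backup from the HDD into, say, /home/you, running `apply_increment /media/usb /home/you` would layer the travel changes on top. The deletion list is trusted as-is, so it should only ever come from your own stick.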

      • Re:unison-gtk (Score:4, Informative)

        by Hatta (162192) on Monday May 21, 2012 @12:04PM (#40065607) Journal

        To know what has and hasn't changed, he can't access the backup at home, like rsync would need to do.

        If I understand correctly BackupPC [sourceforge.net] caches the checksums rsync generates to enable exactly that. It would be nice if that was possible with vanilla rsync.

    • by cfulton (543949)
      Somebody give the above a +1 (on topic) for actually trying to answer the question!!!
  • Unison? (Score:3, Informative)

    by Anonymous Coward on Monday May 21, 2012 @01:32AM (#40061599)

    I hesitate to offer this, because I've not experimented with it in the precise scenario you describe. However, being another Joe User with ubuntu, I took a look at rsync as a way to implement backups between my home PC and an Apple Time capsule that I was using as a secondary backup device.

    After some tinkering I settled on Unison, which is available in the ubuntu repositories. It's essentially a sophisticated rsync front end, with a few bells and whistles. You get 2-way directory replication between your 'local' and 'remote' file systems [though they could both be local or both remote if you choose] and you can essentially script multiple different backups into the single interface. For example, I have "Office" for documents, spreadsheets and the like, "Photos", for camera images, "Music", and so on.

    Like most tools, Unison is imperfect, but it's simple to use once set up. The key point with it, as with any product you put in this space, will be knowing and keeping track of your definitive data source. If you have a document that exists on both your local and backup systems, and you edit that file separately at each location, then run Unison, only the most chronologically recent copy will be preserved. To go beyond this level of functionality and get to something that can intelligently merge changes, I think you're going to need something more like a CVS tool... There are hugely expensive proprietary solutions (like Livelink), but I've not come across anyone using a good FOSS alternative. HTH...

    • Like the GP, I haven't used Unison in this context, but (a) Unison is easy to configure, and (b) there's plenty of configuration which can be done. I use it for keeping my machines in sync, which, here, would just mean replacing a remote path with a local path. I would definitely invest some time in seeing if this does the trick.

    • by Anonymous Coward

      Unison is NOT essentially an rsync frontend. It uses librsync, sure, but it is also a heavily researched, heavily modified way of backing up and synching files.

      Here is their homepage: http://www.cis.upenn.edu/~bcpierce/unison/index.html

      Unison can sync in both directions, picking up files you put on the backup drive that you want synced back into the main repository. This is optional, though. I suspect MANY of the file-backup services like Dropbox base their code on Unison.

      I have always been satisfied with

    • After some tinkering I settled on Unison, which is available in the ubuntu repositories. It's essentially a sophisticated rsync front end, with a few bells and whistles.

      It is, in fact, a bit more than that. rsync doesn't handle deletions, so your backup will keep growing in size even though you're not really making any additions (say, you're renaming a big file - now that file is copied twice). unison does, however. This is essential for this use case, especially if one of the backup devices has somewhat limited space on it.

      • Re:Unison? (Score:5, Informative)

        by DrVxD (184537) on Monday May 21, 2012 @03:50AM (#40062137) Homepage Journal

        rsync doesn't handle deletions

        rsync handles deletions just fine - that's why it has a --delete option...

        • Hey cool. But what if you bring your USB stick "backup" to some friends, and then add or modify files on the stick? Unison will let you sync those changes back to your desktop when you get home. Does rsync do that, too?
      • +1 for Unison. It will do everything you need it to, and is easy to use. You can set up your ~/.unison/*.prf files to have multiple roots on the same machine (one per removable drive in your case). Just pick the one you want to use when you sync. It does a better job of intelligently syncing and handling any resulting conflicts than anything else out there, bar none. It handles deletions fine (as does, btw, rsync). Here's a sample default.prf for your scenario:

        root = /home/yourusername
        root = /media/usb

  • by KingAlanI (1270538) on Monday May 21, 2012 @01:34AM (#40061605) Homepage Journal

    I use DirSyncPro to automate my backup tasks. Not sure how to set it up for your particular task, or whether you can, but it might be worth looking into. A lot of options while still being easy to use.

    • by Anonymous Coward

      Thank god it's got 'pro' in the name, there's just one thing stopping me getting it now, I'll wait for 'deluxe' to be added to the name, that'll do it.

  • Duplicity, perhaps (Score:4, Informative)

    by Wizarth (785742) on Monday May 21, 2012 @01:35AM (#40061613) Homepage

    Duplicity uses librsync to generate the changeset that rsync would use, then stores the change set. If you stored the change set to the USB drive, this could then be "restored" to the destination drive, perhaps? I don't know if there's any way to do this out of the box, or with a bit of scripting, or if this would need to be a whole new toolchain.

    • by Weezul (52464)

      I suspect duplicity and git-annex are the only correct answers in this thread because the underlying problem is that your rsync like tool must detect the changes using a listing of file hashes, not access to the files themselves. It's the same problem as doing incremental backups to a host not running specialized incremental backup software. Duplicity does this, but rsync does not afaik.

  • not rsync (Score:2, Interesting)

    by Anonymous Coward

    there are better solutions than rsync.
    rdiff-backup
    duplicity
    for example.

    i probably don't understand what you are trying to accomplish.

  • dump(8) (Score:2, Interesting)

    by Anonymous Coward

    this is what dump(8) does

    • by kiite (1700846)

      Unfortunately, dump requires a supported filesystem. But most people forget about the incremental backup features of regular old tar(1).

  • by colonel (4464) on Monday May 21, 2012 @02:06AM (#40061729) Homepage

    First, ignore the people who encourage you not to try, and who point you in other directions. Sure, there are much better ways of doing this, but who cares? The whole point is that you should be able to do whatever you want -- and actually doing this is going to leave you _so_ much smarter, trust me.

    Some douche criticized you for not knowing beforehand why hard links wouldn't work. . . . because, you know, you should have been born knowing everything about filesystems. To hell with him, sally forth on your journey of discovery, this can be hella fun and you'll get an awesome feeling of accomplishment.

    First off, you're going to have trouble using rsync with the flash drive, because I assume your constraint is that you can't fit everything on the flash drive, it's only big enough to hold the differences.

    Next, come to terms with the fact that you'll need to do some shell scripting. Maybe more than just some, maybe a lot, but you can do it.

    I'd recommend cutting your hard drive in two -- through partitions or whatever -- to make sure that "system" is fully segmented from "data." No sense wasting all your time and effort getting backups of /proc/ and /dev/, or, hell, even /bin/ and /usr/. Those things aren't supposed to change all that much, so get your backups of /home/ and /var/ and /etc/ working first. Running system updates on the road is rarely worth it, and will be the least of your concerns if you end up needing to recover.

    Next, remind yourself how rsync was originally intended to work at a high level. It takes checksums of chunks of files to see which chunks have changed, and only transfers the changed chunks over the wire in order to minimize network use. Only over time did it evolve to take on more tasks -- but you're not using it for its intended purpose to begin with, since you're not using any network here. So rsync might not have to be your solution while travelling unless you start rsyncing to a personal cloud or something -- but its first principles are definitely a help as you come up with your own design.

    The premise is that, while travelling, you need to know exactly what files have changed since your last full backup, and you need to store those changes on the flash drive so that you can apply the changes to a system restored from the full backup you left at home. You won't be able to do a full restore while in the field, and you won't be able to roll back mistakes made without going home, but I don't think either of those constraints would surprise you too much, you likely came to terms with them already.

    So, when doing the full backup at home, also store a full path/file listing with file timestamps and MD5 or CRC or TLA checksums either on your laptop or on the flash disk, preferably both.

    Then, when running a "backup" in the field, have your shell script generate that same report again, and compare it against the report you made with the last full backup. If the script detects a new file, it should copy that file to the flash disk. If the script detects a changed timestamp, or a changed checksum, it should also copy over the file. When storing files on the flash disk, the script should create directories as necessary to preserve paths of changed/new files.

    For bonus points, if the script detects a deleted file, it should add it to a list of files to be deleted. For extra bonus points, it should store file permissions and ownerships in its logfiles as replayable commands.

    The script would do a terrible job at being "efficient" for renamed files, but same is true for rsync, so whatevs.
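
A rough bash sketch of the manifest approach described above, under stated assumptions: GNU find, sort, xargs, md5sum, and cp are available (as on Ubuntu), file names contain no newlines or backslashes, and only checksums are compared (timestamps, permissions, and ownership are left out). The function names and the files/, changed.list, and deleted.list layout are made up for illustration.

```shell
#!/usr/bin/env bash

# make_manifest SRC_DIR
# Prints one "<md5>  ./relative/path" line per regular file, sorted by path.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

# field_backup SRC_DIR OLD_MANIFEST DEST_DIR
# Copies files that are new or changed since OLD_MANIFEST into DEST_DIR/files
# and records vanished paths in DEST_DIR/deleted.list.
field_backup() {
    local src=$1 old=$2 dest=$3 new path
    new=$(mktemp) || return 1
    mkdir -p "$dest/files" || return 1
    dest=$(cd "$dest" && pwd)            # absolute path, since we cd below

    make_manifest "$src" > "$new"

    # New or changed files: manifest lines that the old manifest lacks.
    # The path starts at column 35 (32 hex digits plus two spaces).
    grep -Fxv -f "$old" "$new" | cut -c35- > "$dest/changed.list"

    # Deleted files: paths in the old manifest that no longer exist.
    cut -c35- "$old" | grep -Fxv -f <(cut -c35- "$new") > "$dest/deleted.list"

    # Copy each changed file, recreating its directory path on the stick.
    ( cd "$src" && while IFS= read -r path; do
          cp --parents -p "$path" "$dest/files/"
      done < "$dest/changed.list" )

    rm -f "$new"
}
```

At home, right after the full backup, something like `make_manifest /home/you > ~/base.manifest` would record the baseline; on the road, `field_backup /home/you ~/base.manifest /media/usb` would fill the stick. Renamed files show up as one deletion plus one new file, as noted above.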

    I built a very similar set of scripts for managing VMWare master disk images and diff files about ten years ago, and it took me two 7hr days of scripting/testing/documenting -- this should be a similar effort for a 10-yr-younger me. I learned *so* much in doing that back then that I'm jealous of the fun that you'll have in doing this.

    Of course, document the hell out of your work. Post it on sourceforge or something, GPL it, put it on your resume.

    • by colonel (4464) on Monday May 21, 2012 @02:11AM (#40061753) Homepage

      Forgot to mention:

      To accomplish this, you'll need to read up on:
      - bash
      - find
      - grep
      - awk
      - sed
      - md5sum
      - chmod/chown
      - mkdir -p
      - diff/patch (for general reference, and also look up binary diffing tools)

      Extra extra extra bonus points if you compress the changed files when storing them on the flash drive.

      • by e70838 (976799)
        Or learn Perl. Perl can easily do the same things as bash, find, grep, awk, sed, ...

        Avoiding having to learn all the intricacies of these tools was one of the main purposes of Perl.
        • by Anonymous Coward

          Back in 1993 when I was new to *NIX, I asked a seasoned sysadmin which scripting language I should learn. He started listing all those same tools .... then said - "or you could just learn perl."

          I did and you sir are 100% correct. I know grep, find, ksh/sh, sed very well - my awk is extremely weak. Perl hasn't let me down all these years. It is still my go-to scripting language whether I'm hacking some crap code together OR performing a system design and pushing out a beautiful website with XML or json se

          • by colonel (4464)

            Or you can learn all the utilities if you want to learn right and be a better sysadmin and work with other sysadmins.

            Perl is for programmers that like to pretend they know how to be sysadmins and don't need to share their tools with other sysadmins.

      • by DrVxD (184537)

        Extra extra extra bonus points if you compress the changed files when storing them on the flash drive.

        Hint for the bonus question: gzip ;)

        • by chelberg (1712998)

          bzip2 is better than gzip if space is at a premium. There are even multi-core versions of bzip2 that are very efficient. You could also look at p7zip. If you want a really efficient compressor, try nanozip as well, although its page says it is still experimental, but it seems to be at the top of several compression benchmarks.

    • You can get started on fire.

    • Or he could save himself a ton of grief and just use rdiff-backup [nongnu.org], which happens to use librsync, produces incremental differential backups, stores said backups as files you can simply browse, works equally well on local and remote filesystems, and is dead simple to use. I've used it for years now on a ton of systems.
      • by hankwang (413283)

        Or he could save himself a ton of grief and just use rdiff-backup,

        Interesting, since I used rdiff-backup in the past and found it a pain. If files are stored as diffs of diffs of diffs of diffs of a full copy, it is rather easy to corrupt the backup. These days, I make backups using rsync, with

        rsync -aOi --delete --modify-window=1 --link-dest=/mnt/backup/home-20120519 /home /mnt/backup/home-20120520

        For the first backup, omit the --link-dest argument. Only modified files are stored. As long as you don

        • by hankwang (413283)

          Replying to myself: of course, I realize that the OP cannot use a hard-link backup if the usb drive cannot hold all his important data. It's too long ago that I used rdiff-backup; can you reliably split the master backup and the differential backups to different filesystems (say the drive at home and the usb stick)? Preferably without risking corrupted backups if it involves manually merging diff trees.

    • by hankwang (413283)
      If you go for a system where the files are stored in a Unix-like filesystem (case-sensitive filenames, permissions), what kind of filesystem would you need to use on the USB stick? I believe that the wear-leveling system on USB sticks and flash cards is optimized for FAT filesystems (with the file allocation table right at the beginning). I don't think that a journalling filesystem would be a good idea on a flash drive, which leaves you with ext2 (with noatime) and very long filesystem checks every time you acci
      • by colonel (4464)

        If you want to "copy" junk with its metajunk from an ext3 filesystem on to a FAT32 filesystem, remember that you can always create an 8GB file with dd from /dev/zero, run mkfs.ext3 against that file, and then mount that file as an ext3 filesystem thanks to the loopback adapter. You won't be able to read that junk from a Windows machine, but you probably won't care, and if you create an 8GB file on a 16GB FAT32 flash disk, you'll still have 8GB of space available for use in Windows -- and Windows will be ab

  • It seems like the poster confuses two tasks: Backup and version control.

    For the former, use archiving tools to perform full and incremental backup. How is it done? You could use find to list files with certain criteria, e.g. last modified timestamps. Pass that list to tar using the -T flag, where you also use -X to exclude files and directories like "*/.thumbnails" and "*/.[cC]ache*". Once the tar is done, use your favourite checksum tool (md5sum, shasum) to store a checksum of the archive in a separate file.
  • dar? (Score:3, Informative)

    by safetyinnumbers (1770570) on Monday May 21, 2012 @02:09AM (#40061741)
    If I understand your problem right, How about dar? It can make an empty archive of your main backup to act as a reference (just file info, no files). Then it makes archives relative to that, with just changed files. It can then apply the changes to the original dir, including deletions, if you need that.
    • by jchevali (171711)

      I agree, dar is definitely the way to go. You need to learn how it works but once you do it's incredible all the things you can do. What safetyinnumbers is referring to is called an isolated catalogue. See also: dar_manager.

  • They'd have the skinny on pouches for sure.

  • by Tim99 (984437) on Monday May 21, 2012 @02:53AM (#40061921)
    I'm probably being dim here, but why don't you just rsync from the USB HDD to the USB stick? You can filter by date using "find" something like:
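    cd /mnt/usbhdd && find . -type f -mtime -7 -print0 | rsync -av --from0 --files-from=- . /mnt/usbackup/
    This finds/filters files updated/created in the last 7 days and copies them with their directory paths intact, even if the names contain spaces.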

    To get the stuff back from the USB stick that you have created or modified while you are away to the USB HDD you would just do a normal rsync which will only overwrite/add files that you have modified or created since you ran rsync from the USB HDD.

    I am not familiar with Grsync, but it looks as though you can run it in simulation mode to see the output - You could then copy/paste the output and modify it with the 'find whatever -type f -mtime -DAYS' and run it from the prompt.

  • I don't know if I'm one of the "smart kids" or not, but I'm a standard non-technical user and have found LuckyBackup [wikipedia.org] or BackInTime [le-web.org] run along with an online sync/backup service like DropBox or SpiderOak the most handy options.

    Both LuckyBackup & BackInTime are GUI tools that set up rsync rules (even complicated ones) with an easy point-and-click interface, then schedule them in cron. They can do anything rsync can: synchronize the drives so the backup matches the current, or make a backup of everything p

  • From what I have parsed the OP wants to have a full back-up on a USB-HDD and the diffs on the USB-Flash, because the Flash is limited in size.
    Just write two rsync (or grsync) scenarios: one for HDD and the other for the Flash. On the HDD you will have a directory that is a mirror copy of your laptop. On the Flash you will keep the diffs for the time between syncs to the HDD.

    When at home
    1. rsync your laptop to the HDD (mirror).
    2. copy the incremental stuff from the Flash to a separate directory (e.g. diff-20

  • unison has already been suggested multiple times.

    I used unison. It's perfect to sync from A to B (it only syncs the diffs) then modify B and later sync B to A
    You also can modify A and B at the same time as long as it's not the same file, then sync and then A and B are identical.
    You can even sync in cycles: A->B->C->A with modifications on all three directory trees and it still works
    Unison also handles deletions on both sides fine.
    Hint: use the -group -owner -times flags

  • The first problem to consider is how you determine which files to backup. Filesystems like xfs, zfs, and btrfs have nice convenient ways to get a list of changed files (and for xfs and zfs, the contents of those files as well). For ext2/3/4 (and other older unixy filesystems) look at "dump". And of course, if you're working with a completely dumb filesystem, you can always use rsync (if your backup disk is remotely accessible) or some external/manual indexing to figure out what files to backup.

    If your
  • by gatzke (2977)

    For years I have used rsync scripts.

    My problem was syncing a desktop and a laptop. So I made upload_to and download_from scripts to sync as needed.

    I also try to keep a third master backup copy on a different server so all three are synced.

    One problem comes when trying to work on both desktop and laptop simultaneously. Just map a drive and modify files on one side.

  • by phorwich (909601)
    I think git has got what you seek. http://git-scm.com/ [git-scm.com]
  • In the grand internet tradition of answering a loosely related question which is no use at all to the asker, I will say that the "smart kids" might use something like ZFS, which almost handles this for you. (Take snapshots, save delta streams on your USB stick. Requires the backup to be a ZFS copy, not just the same files.)

    Useless right now at least. But I've been pretty happy with switching my storage to ZFS, even if the Linux version sucks. (I mostly don't use the Linux version.) I'd recommend it to anyon

    • by MightyYar (622222)

      That's a terrific idea, and it would be a much cleaner and more reliable solution than some kind of cobbled script. Maybe he could make his working directory on Linux a ZFS volume using FUSE. Snapshot it before you go on the road, export the snapshot to the external HDD, and then periodically export the snapshot delta to the USB as a backup. When you get home, re-snapshot to the external HDD and delete the delta on the USB. The snapshot creation/destruction and export could (and probably should) be automate

  • Easiest solution is probably to use tar's incremental backups. The -g argument creates a relatively small file listing the files already backed up, so future incremental runs can skip them. If you keep the incremental files on the laptop then you can put each day's actual tar backup on whatever devices you have handy.
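
A minimal sketch of that idea with GNU tar's -g (--listed-incremental) option, assuming absolute paths. full_backup and incremental_backup are hypothetical wrapper names. The snapshot file is copied before each incremental run so that every travel archive stays relative to the full backup made at home, which is one possible policy, not the only one.

```shell
#!/usr/bin/env bash

# full_backup SRC_DIR SNAPSHOT_FILE OUT_TAR
# Starts from a fresh snapshot file, which makes tar do a level-0 (full) dump.
full_backup() {
    rm -f "$2"
    tar -C "$1" -g "$2" -cf "$3" .
}

# incremental_backup SRC_DIR SNAPSHOT_FILE OUT_TAR
# Archives what changed since the full backup. tar updates the snapshot file
# it is given, so it gets a scratch copy and the original stays untouched.
incremental_backup() {
    local work status
    work=$(mktemp) || return 1
    cp "$2" "$work" && tar -C "$1" -g "$work" -cf "$3" .
    status=$?
    rm -f "$work"
    return $status
}
```

The snapshot file stays on the laptop while the incremental archive can go to the stick. According to the GNU tar manual, extracting such archives in order with --listed-incremental=/dev/null also removes files that were deleted between runs.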

  • and I know this is not what he asked for, but wouldn't the simplest solution be to purchase a second external drive (maybe an SSD for durability) and actually have a complete backup on the road... Or even just take his current external with him - he has it backed up in the cloud any way...

    I ask because he never stated why that external drive was stuck at home..

    If that won't do, another possible solution.

    1) I don't see a need to sync the USB stick when he returns - just perform your usual backup when you ret

  • Just do a clone with Git. You can track changes, deletions and it can resolve conflicts easily.
  • There are multiple "watch" apps out there in various languages that will run a script every time a directory changes.

    Google "watching files with ruby"

    Substitute ruby for python or perl or...

  • Someone had suggested using Git and I was going to suggest the same. If you are only backing up documents then it should be easy enough to create repos on the USB HDD, Laptop and USB drive. You can then commit/merge changes between repos to keep in sync, perhaps use some shell scripts to ease administration. Also, I use a product called Super Flexible File Synchronizer to sync a subfolder on my laptop's filesystem with a WebDAV server. It's got lots of features and supports Linux, Windows and Mac. http [superflexible.com]
  • Since you can't use dump(8) as others have pointed out, maybe you can do something with UnionFS. After you do your full backup to USB HDD and are about to leave on a trip, mount a unionfs over top of your critical filesystem(s)... Then every day, copy the union layer off to thumb-drive.

  • I know people hate it, but you could set up an auto-running batch script that does the backup upon plug-in. I've had no issues doing this from Windows to Linux.

  • find . -newer last_backup_timestamp | cpio -o > snapshot$(date +%Y%m%d) && touch last_backup_timestamp

  • Make full "level 0" dumps at home to the big disk. Make delta "level 1" or incrementing levels dumps to the flash drive. Each level will back up everything that changed since the previous level.

    Now, you want the backups at home to be a file copy - there's no reason you can't do that and then do level 1 dump backups on the road - ie, never actually make a level 0. Just do your rsync and then update /etc/dumpdates to reflect your rsync-level-0.

    This is exactly what dump was designed to do, and it's going to
