Data Storage Software

Recoverable File Archiving with Free Software?

Posted by Cliff
from the redundancy-for-your-files dept.
Viqsi asks: "Back in my Win32 days, I was a very frequent user of RAR archives. I've had them get hit by partial hardware failures and still be recoverable, so I've always liked them, but they're completely non-Free, and the mini-RMS in my brain tells me this could be a problem for long-term archival. The closest free equivalent I can find is .tar.bz2, and while bzip2 has some recovery ability, tar is (as far as I have ever been able to tell) incapable of recovering anything past the damaged point, which is unacceptable for my purposes. I've recently had to pick up a copy of RAR for Linux to dig into one of those old archives, so this question's come back up for me again, and I still haven't found anything. Does anyone know of a file archive type that can recover from this kind of damage?"
This discussion has been archived. No new comments can be posted.

  • where have you been? (Score:3, Informative)

    by Anonymous Coward on Wednesday February 25, 2004 @12:51AM (#8382938)
    ever heard of parity archives?
  • wow man (Score:3, Funny)

    by Anonymous Coward on Wednesday February 25, 2004 @12:52AM (#8382953)

    the mini-RMS in my brain

    You really ought to have that looked at..

    • Re:wow man (Score:5, Funny)

      by Viqsi (534904) <jrhunter@menagerie . t f> on Wednesday February 25, 2004 @02:42AM (#8383557)
      Y'know, I would've done that a long time ago, but my health care provider doesn't cover ideologuectomies. They claim that it doesn't threaten your physical life, just your social one. The bastards.

      :D
      • ...my health care provider doesn't cover ideologuectomies. They claim that it doesn't threaten your physical life, just your social one...

        Ah.... I've met RMS a number of times, and having once made the mistake of standing downwind, am familiar with this problem. Try Soap and Water, augmented with a long-handled stiff brush. Pardon the Pun, but Lather, Rinse, Repeat. Works wonders for your complexion, with immediate secondary positive effects on the `ol social life...

    • I suggest vrms [debian.org] as a healthier replacement.
  • Are you sure it's unacceptable that tar archives are breakable? The way I see it, you'll tar your files then bzip them and finally put them on a backup server/CD/DVD. The bzip layer will provide the auto-repairing features, I don't see how it could break between having the tar and bzipping it. Is this for a normal environment? If your harddrive breaks during or after creating the tar, then the bzip would fail, no? Please tell us more about your situation if not.
    • by wiswaud (22478) <esj&wwd,ca> on Wednesday February 25, 2004 @01:49AM (#8383316) Homepage
      if you make a big tar then bzip2-it, then store the file on a CD.
      then 2 years later you want the data back.
      there's a read-error at some point within the .tar.bz2, and it gives you some garbage data.
      bunzip2 will actually be able to recover all other 900kB chunks of the original tar file, except for this missing chunk or part of it.
      Tar will just choke at that point and you lost everything past the read error. bunzip2 was able to recover the data past the error, but tar can't use the data.
      It's quite frustrating.
      • Tar will just choke at that point and you lost everything past the read error. bunzip2 was able to recover the data past the error, but tar can't use the data.

        I've been there with .tar.gz and now .cpio.bz2 is my archiving technique of choice, with a block size of 100k you get (slightly) less compression but (slightly) more resilience.
      • I've had a backup of my hard-drive on another drive, in tar.gz form.
        Of course, when the big day came and my hard drive broke, it turned out the other drive had bad sectors!
        First, a comment: never ever ever ever use tar.gz to back up anything you'd like to have back.
        You can recover stuff easily from tar past the break point - files in tar are basically concatenated together. So you miss the rest of the current file, but you can find the next header+file easily.
        But gzip does not byte-align its data! That's, i
        • So what's the option? Use RAR for archiving your data? I would copy files straight to a burn if only the CD filesystem didn't have a piss-weak depth limit. :-/
          • That is actually exactly what I do - full, uncompressed backup - takes 5-6 CDs in my case. Then I do incremental backups of the changed files till that goes over 1 CD (level 1 & 2). Redundancy comes from having each file on several backups...
            But rar would be better if it was more widely distributed and free (speech) - then i'd save a copy of the decompressor and its source code on every CD...
            • How do you get around the directory depth issue? I discovered that I can't add anything to an ISO beyond 4-5(?) levels of directories.
              • You are right, it seems that iso9660 can not handle more than a depth of 7 (not including the root). I never encountered it till now - my home dir must not be very deep - I encountered problems with long file names instead.
                It seems that the rock-ridge extension deals with this by putting deeper directories into a dir called RR_MOVED, so I think if you use mkisofs with rock ridge all is fine (which could be another reason that I didn't encounter the depth limit)
                • Aha! Okay. I just heard that Rock Ridge couldn't be read on some systems, but if that's the only problem then I'll use it regardless. I have way too many Java source packages in CVS which are making the depth pretty big. :-)
  • Try apio (Score:5, Informative)

    by innosent (618233) <jmdority@gma[ ]com ['il.' in gap]> on Wednesday February 25, 2004 @01:18AM (#8383125)
    There used to be a cpio-like archiver called apio, that was designed for those types of situations. Of course, that might not be much help for non-unix systems (unless you plan on running in Cygwin), but I remember having great success with it for the old QIC tapes, which were in my experience the worst backup medium for important data ever (better to have no backup than think you have a good one, but have a dead tape)
    • Re:Try apio (Score:5, Informative)

      by innosent (618233) <jmdority@gma[ ]com ['il.' in gap]> on Wednesday February 25, 2004 @01:55AM (#8383347)
      Sorry, I believe it was afio [freshmeat.net]
      • Re:Try apio (Score:1, Funny)

        by Anonymous Coward
        hey no fair, double mod points for your typo!
      • --I experimented with afio a couple of years ago, but the problem I ran into was twofold:

        o Passing a list of files to afio, by default, is a PITA; I much prefer tar / zip / rar for their convenience here, you can pass a wildcard at the command line *or* send them a list.

        o I couldn't find a way to do a *partial* restore of a subset of files.

        --Anyone got some tips on this? I did write a set of scripts for handling volume changes and the like; if anyone's interested in them, email me.
    • Yup you do mean afio. Just like tar but it creates a separate compression record for each file, instead of the entire stream. I have had DDS tapes fail on me (well I screwed them up actually) and yes there were a bunch of unrecoverable files but at the next compression header things started streaming out of the drive again and I recovered quite a lot.
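
The per-file compression idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not afio's actual on-tape format: the helper names (`pack_per_file`, `unpack_surviving`) are made up for this example, and zlib stands in for whatever compressor the archiver would really call.

```python
import zlib

def pack_per_file(files):
    """Compress every file on its own, so each record decodes independently."""
    return {name: zlib.compress(data) for name, data in files.items()}

def unpack_surviving(records):
    """Decompress each record separately; a damaged record costs only that file."""
    recovered, lost = {}, []
    for name, blob in records.items():
        try:
            recovered[name] = zlib.decompress(blob)
        except zlib.error:
            lost.append(name)
    return recovered, lost

files = {
    "a.txt": b"alpha " * 200,
    "b.txt": b"bravo " * 200,
    "c.txt": b"charlie " * 200,
}
records = pack_per_file(files)

# Simulate media damage: overwrite a few bytes in the middle of one record.
damaged = bytearray(records["b.txt"])
mid = len(damaged) // 2
damaged[mid:mid + 4] = b"\xff\xff\xff\xff"
records["b.txt"] = bytes(damaged)

recovered, lost = unpack_surviving(records)
print("recovered:", sorted(recovered))
print("lost:", lost)
```

Had the three files been concatenated and compressed as a single stream, the same four damaged bytes could have made everything after them undecodable, which is the failure mode the .tar.gz posters describe.
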
  • Par2 works great (Score:5, Informative)

    by dozer (30790) on Wednesday February 25, 2004 @01:22AM (#8383158)
    Store the recovery information outside the archive. Par2 [sf.net] works really well. You can configure how much redundancy you want (2% should be fine for occasional bit errors, 30% if you burn it to a CD that might get mangled, etc.). It's a work in progress, but it's already really useful.
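
To make "recovery information outside the archive" concrete, here is a deliberately simplified Python sketch. It is not how Par2 works internally (Par2 uses Reed-Solomon coding and can rebuild several blocks); this is the simplest possible case, a single XOR parity block that can rebuild any one lost block. The function names and the 64-byte block size are assumptions chosen for the example.

```python
def split_blocks(data, block_size):
    """Cut data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def rebuild_block(blocks, parity, missing_index):
    """Rebuild the one block at missing_index from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])

archive = b"pretend this is the content of backup.tar.bz2 " * 10
BLOCK = 64
blocks = split_blocks(archive, BLOCK)
parity = xor_blocks(blocks)          # stored separately from the archive

# Simulate one unreadable block on the medium.
damaged = list(blocks)
damaged[3] = None
repaired = rebuild_block(damaged, parity, 3)
print(repaired == blocks[3])
```

Here one parity block protects eight data blocks, so the overhead is 12.5%, but only a single bad block can be repaired. Tolerating more damage for a chosen amount of redundancy is the part that real parity tools add.
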
    • Re:Par2 works great (Score:5, Informative)

      by Stubtify (610318) on Wednesday February 25, 2004 @03:50AM (#8383823)
      Allow me to second this. Par2 is everything the first PAR files were and more. No matter what has been wrong I've always been able to recover with a 10% parity set. (even this seems like a lot of overkill, except on USENET). Interestingly enough Par files have revolutionized USENET, I can't remember the last time I needed a fill.

      good overview here: PAR2 files [slyck.com]

      comparison between v1 and 2: here [quickpar.org.uk]

    • by Anonymous Coward
      There is a patent on a recovery scheme by M. Rabin (I don't have the number handy). The patent covers "n+k" recovery schemes, in which n blocks of data are protected using k recovery blocks. The patent is quite old.

      I wonder if rar, par and par2 infringe on this patent?
    • Excellent suggestion. I've always wondered why no one has integrated PAR2 into Info-ZIP, bzip2, etc.

      Errors would be detected and recovered automatically, with PAR2 files scanned for recovery info. Heck, why not stream recovery packets right into the compression stream -- just like Reed-Solomon coding on CD-ROMs.

    • A quick perusal of the QuickPar website [quickpar.org.uk] suggests that at least some Par2 clients can restore based on two damaged files and incomplete recovery files:

      At this point you can have QuickPar load additional PAR2 files (to provide more recovery blocks) or scan additional data files (to see if they contain some of the data from the original files).

      In the past, however, I've been dealing with getting remote files over a noisy connection where the remote server wasn't so thoughtful to create Par files or even

  • Yeah (Score:4, Insightful)

    by photon317 (208409) on Wednesday February 25, 2004 @01:31AM (#8383222)

    The format you're looking for is any format you like stored on reliable storage.

    Why bother with all the intricacies of a pseudo-fault-tolerant data structure? Ultimately the best archive format for recovery will be one that just duplicates the whole archive twice over, doubling space requirements and improving immunity to lost sectors on drives. At which point one asks, "Why don't I just stick to simple files and archives, and use reliable storage that handles this crap for me, for all my data, automagically?" Storage of any sort just keeps getting cheaper and bigger. If you have any interest in the longevity of your data these days, there's almost no excuse for not using the data-mirroring built into virtually every OS these days and doubling your storage cost and read performance while preventing yourself from worrying about drive failure.
    • Re:Yeah (Score:4, Insightful)

      by Viqsi (534904) <jrhunter@menagerie . t f> on Wednesday February 25, 2004 @02:39AM (#8383541)
      Why bother with all the intricacies of a pseudo-fault-tolerant data structure?

      I'm on a laptop. I like my laptop. It's a very nice laptop. However, it doesn't exactly support those kind of hardware upgrades, and I am still ultimately on a bit of a budget.

      I kind of put forth the question not only out of the hope that a Magical Solution To All My Archival Problems would Mystically Appear (puff of smoke optional but appreciated) but because I want to find something I also feel like I can unreservedly recommend to non-ideological friends who are looking for, say, something slightly more reliable than ZIP files. I could've mentioned that in the article post, but it was already getting long. :)
    • Re:Yeah (Score:2, Insightful)

      by QuantumG (50515)
      Ultimately the best archive format for recovery will be one that just duplicates the whole archive twice over, doubling space requirements and improving immunity to lost sectors on drives.

      Obviously you know nothing about error correction, so STFU.

    • What if the error produces another valid tar archive? How would the computer know which one was the correct one? You actually need *3* copies to reliably detect and correct a single error if you are doing it this stupid way.

      There are WAY better (by better I mean take up less space and can detect more errors) methods of error detection (and correction), which have filled volumes of research publications and books, so I will not try to get into them here, but a (maybe not so) simple software trick will definitely sav
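
The two-copies-versus-three point can be shown with a small Python sketch. It is only an illustration of plain replication, not of the better coding schemes the comment alludes to; `majority_vote` is a name invented for this example.

```python
from collections import Counter

def majority_vote(copies):
    """Return the byte-wise majority of equal-length copies,
    plus the positions where the copies disagreed."""
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must have equal length")
    out = bytearray()
    disagreements = []
    for pos, column in enumerate(zip(*copies)):
        value, count = Counter(column).most_common(1)[0]
        if count != len(copies):
            disagreements.append(pos)
        out.append(value)
    return bytes(out), disagreements

original = b"important archive bytes"
flipped = bytearray(original)
flipped[5] ^= 0x01                      # a single bit error in one copy
copy_a, copy_b, copy_c = original, bytes(flipped), original

# Two copies: the mismatch is visible, but nothing says which side is right.
two_copy_mismatches = [i for i, (x, y) in enumerate(zip(copy_a, copy_b)) if x != y]

# Three copies: the odd one out is outvoted.
repaired, suspect = majority_vote([copy_a, copy_b, copy_c])
print(two_copy_mismatches, suspect, repaired == original)
```

This costs 200% overhead to fix an error that a small amount of parity data could also repair, which is why the comment calls replication the stupid way. If all three copies disagree at the same position, the vote is a guess.
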
    • Mirroring is dumb, no not as in dumb to use it, dumb as in the way it works.

      Mirroring HD's only protect against fatal failures of a single HD. Motor stops spinning? Then the other HD takes over.

      It does NOT protect against failures on the disc. Errors while writing or reading or other fun stuff.

      For true backup you need the following.

      • An original which you know to be correct. Obvious, but I've seen failures start right here.
      • A reasonably secure storage medium whose life expectancy you know.
      • A checksum or
  • cpio (Score:5, Informative)

    by Kevin Burtch (13372) on Wednesday February 25, 2004 @01:42AM (#8383286)

    True, tar cannot handle a single error... all files past that error are lost.

    On the other hand, cpio (and clones) can handle missing/damaged data without losing the undamaged portions that follow (you only lose the archived file that contains the damage). It is the only common/free format I can think of (from the top of my head) that is capable of this.
    • Re:cpio (Score:2, Informative)

      by Anonymous Coward
      On the other hand, cpio (and clones) can handle missing/damaged data without losing the undamaged portions that follow (you only lose the archived file that contains the damage). It is the only common/free format I can think of (from the top of my head) that is capable of this.

      ZIP also supports this (the command is "zip -F" with Info-ZIP, the standard zip/unzip program on Linux).
    • A particular implementation of tar may not handle errors well, but that isn't a defect of the file format. The program should be able to skip over damaged sections of the tape and recover the rest of the files.

      • Nope, it's a design flaw - and a well known and documented one at that. See the O'Reilly backup book (written by former co-workers) for more details (though it was well documented long before that book was written).

        Just try to restore anything (past the error) using *any* version of tar, from a tar file (or tape) with an error in the middle. It will bomb out as soon as it hits the error.

        • Re:cpio (Score:3, Informative)

          by Detritus (11846)
          I know that I've recovered data from damaged tar archives in the past. I just ran some tests with intentionally damaged tar files, using GNU tar from FreeBSD 5.2.1. GNU tar successfully recovered the data from all of the damaged tar files. It just skips over the damaged bits and resynchronizes at the next valid file header.
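
The resynchronisation described above can be sketched in Python. This is not GNU tar's actual recovery code, just a minimal illustration under some assumptions: the archive is plain ustar with octal size fields, headers sit on 512-byte boundaries, and a block counts as a header if its stored checksum matches. The helper names (`header_ok`, `salvage`, `build_archive`) are invented for the example, and a data block could in principle pass the checksum test by accident.

```python
import io
import tarfile

BLOCK = 512

def header_ok(block):
    """A block is a plausible tar header if its stored checksum matches."""
    if block == bytes(BLOCK):
        return False                      # all-zero end-of-archive block
    try:
        stored = int(block[148:156].strip(b" \0") or b"0", 8)
    except ValueError:
        return False
    # The checksum is computed with its own field treated as eight spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed

def salvage(raw):
    """Walk the archive in 512-byte steps, skipping anything that is not a header."""
    found = {}
    pos = 0
    while pos + BLOCK <= len(raw):
        block = bytes(raw[pos:pos + BLOCK])
        if not header_ok(block):
            pos += BLOCK                  # damaged or stray block: resynchronise
            continue
        name = block[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size = int(block[124:136].strip(b" \0") or b"0", 8)
        found[name] = bytes(raw[pos + BLOCK:pos + BLOCK + size])
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        pos += BLOCK + padded             # header plus padded payload
    return found

def build_archive(files):
    """Create an in-memory ustar archive to experiment on."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = {
    "one.txt": b"first file\n" * 50,
    "two.txt": b"second file\n" * 50,
    "three.txt": b"third file\n" * 50,
}
raw = bytearray(build_archive(files))
raw[1536:1536 + 64] = b"\xff" * 64       # trash the header of two.txt

found = salvage(raw)
print(sorted(found))
```

Only the member whose header was destroyed is lost; the scan picks up again at the next block that checks out, which matches the behaviour reported in the comment above.
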
  • Tar options (Score:3, Insightful)

    by aster_ken (516808) <dustincook469@live.com> on Wednesday February 25, 2004 @02:15AM (#8383435)
    Wouldn't simply running tar with --ignore-failed-read achieve the desired results? It wouldn't simply stop once it hits an error. Instead, tar will proceed beyond the error and probably just write out junk data (if anything at all) for the corrupted part of the archive.

    DISCLAIMER: I haven't tried this, and I'm not entirely sure this is what you want.
    • Woohoo! Manually parsing binary data to put together pieces of damaged files!

      I think an important feature of the issue here would be that life is a lot easier if you never get the junk data in the first place.
  • by Kris_J (10111) * on Wednesday February 25, 2004 @02:31AM (#8383508) Journal
    RAR compression is free for decompression [rarsoft.com] with source available, heaps of precompiled binaries for decompression on your OS of choice and it's included in a whole heap of popular free archive programs. Just burn the latest source on every CD you make and you should be fine.
    • Non-free, true. Non-Free, though, is a different story. The license for UnRAR's source is pretty restrictive (basically, "you can use this as intended anywhere, but you can't sell it or modify it to create RAR files"). So, unfortunately it doesn't totally work out.

      The fact that that much exists does ease my mind about existing archives I've got (which is why I didn't mass-convert them ages ago). It's creating new, future archives that I'm worried about. :)
      • by Kris_J (10111) * on Wednesday February 25, 2004 @03:08AM (#8383678) Journal
        But if you purchase it, as I have, you get a product you can use from now until forever, so long as your OS supports it, plus you can get the decompression source so that you (or someone else) can always write a decompressor for a future platform. Surely you don't need to worry about replacing it until both the following are true: None of the versions you've purchased run on your current platform AND no version compatible with your current platform is available (at a reasonable price). At that point you stop creating RAR archives and simply keep the decompressor around (porting and recompiling as necessary).

        (Personally, I don't care about recovery records, I just keep two copies of everything, and I moved to 7-zip -- which can decompress RAR -- about six months ago.)

        • Both are useless. Of course, if you know where the faults are and they are in different checksum parts, then you can mesh them together to get a working file. But that is what recovery records are for.

          No, two files failing isn't likely to happen. We are, however, dealing here with disaster recovery. Disasters are always disastrous.

          Of course error recovery won't work with a total failure like, say, a fire. Then your second copy is the better solution.

          So two copies is a good idea. Error recovery is a good idea.

          Two

          • Okay, well at work the backup system involves a two week rotation, off-site backups, monthly snapshots archived to DVD and two-hourly backups of the main database (during work hours), for something in the order of twenty copies at any one time -- each slightly different versions in case that someone makes a mistake and it takes a while to spot (which only happens about once every six months now, as opposed to once or twice a week the last place I worked). That's where I care about "disaster recovery".

            At ho

  • by jhoger (519683) on Wednesday February 25, 2004 @02:44AM (#8383567) Homepage
    They are backing up data to a MiniDV camcorder adding forward error correction using a simple command line utility to allow holes in the tape the size of a pin without any data loss.

    -- John.
  • Yes... (Score:4, Funny)

    by caesar79 (579090) on Wednesday February 25, 2004 @03:25AM (#8383734)
    It's an amazing technology...only quite involved.
    Basically you concatenate all the files together (cat should do), print it out on good 32lb paper, get a professor's signature and file it in a college lib...heard those things stick around for centuries
  • by wotevah (620758) on Wednesday February 25, 2004 @03:35AM (#8383776) Journal
    A quick google search [google.com] turns up the link shown at the end of this post, from which I quote:

    The gzip Recovery Toolkit

    The gzip Recovery Toolkit has a program - gzrecover - that attempts to skip over bad data in a gzip archive and a patch to GNU tar that enables that program to skip over bad data and extract whatever files might be there. This saved me from exactly the above situation. Hopefully it will help you as well.
    [...]
    Here's an example:

    $ ls *.gz
    my-corrupted-backup.tar.gz
    $ gzrecover my-corrupted-backup.tar.gz
    $ ls *.recovered
    my-corrupted-backup.tar.recovered
    $ tar --recover -xvf my-corrupted-backup.tar.recovered > /tmp/tar.log 2>&1 &
    $ tail -f /tmp/tar.log

    http://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html
  • by vasqzr (619165) <{vasqzr} {at} {netscape.net}> on Wednesday February 25, 2004 @09:13AM (#8384946)

    Back in my Win32 days, I was a very frequent user of RAR archives.

    Babelfish translation: I was a huge warez kiddie.

    On a related note, were there any wide-spread, legitimate uses of .RAR? I only remember .ARJ and .ZIP
    • Ha ha. No, I don't touch warez. The first time I almost did (with the Win95 private beta) I was caught by my father (I was raised by geeks, so it's taken a lot to be able to adjust to human society ;D ) and got a nice long lesson on that one. Pirating software is bad, 'mkay?

      No, I got into RAR because a friend of mine (who was and still is into video game console emulation, especially music, which is where she discovered the format, I presume) used it to distribute her music compositions for a period, and I
    • Re:RAR Archives (Score:2, Insightful)

      by jonadab (583620)
      > were there any wide-spread, legitimate uses of .RAR?

      RAR was heavily used in Germany, among the gamer community. A lot of Descent
      players for example distributed their custom levels, missions, textures,
      hogfile utilities, savegame editors, and whatnot in RAR format. It was
      annoying; I had to go hunt down and download a RAR extractor just to install
      some of the stuff.

      The usual argument was that RAR was "better" than ZIP either because of the
      compression rates or because of the partial recoverability or wh
      • My opinion on the matter has always been that for distributing stuff over the internet, the most ubiquitous format is automatically the best, so ZIP is better than RAR irrespective of technical issues, due to compatibility concerns.

        You do have a point, but on the other hand, everything has to start somewhere. Things have to evolve, we must move on to better things sometimes.

        Just the fact that .rar is extremely popular in some circles is proof that it can work.

        Otherwise we'll be using the .zip format
      • Um... anyone using .tar.gz (or .tgz) is either using (USUALLY, I'M GENERALIZING) *nix, which comes with a bz2 util too, or WinZip, which will decompress both. (It won't make either type, but it'll decompress them.) So, use the most technologically advanced format.
  • Why bother with recoverability?

    Total loss of the file seems more likely than bit flipping by themselves.

    When your storage hardware/media starts flipping bits, it's probably going to die pretty soon.

    And more often than not, your storage hardware/media just dies before you experience any bit flips.

    You talk about your laptop computer and being on a budget. If you can't afford to make copies of your important files and store them elsewhere, then either your files aren't important, or successfully maintainin
    • Historical paranoia's the primary reason. I've had hard drives headcrash suddenly for no apparent reason, and typically I have to rush to get data off of them, and I'm usually lucky. And when I wasn't as lucky one time, RAR helped. I've never had a hard drive just suddenly Quit Working on me before; they've all been Slow, Painful Deaths.

      Also, I almost never find the time in the week to do backups. I work for a nonprofit, so My Time Is Semiwillingly Not My Own. :) (We *do* do backups over there, though, so
  • Tar alone can recover past a damaged point: it will 'read' past the erroneous data and recover the rest. I believe cpio exhibits the same behavior. It is when you compress the archive (with gzip or bzip2) that it may become unrecoverable. If you use tar alone, however, you will always be able to recover some of the data in a damaged archive.
  • tarfix (Score:3, Insightful)

    by morelife (213920) <f00fbug@postREM[ ... t ['OVE' in gap]> on Wednesday February 25, 2004 @10:04AM (#8385292)
    tarfix

    may help some of those archive issues.

    But, the archive format is not going to save you. Use multiple media. You need more than one physical archive for better safety, regardless of format. Hell, you'll probably die before some of today's media fails.
  • ... my Subversion archive is now more than can fit on a CD. Is there a tool I can use to split the big file across two CDs, hopefully something that doesn't need another piece of software to reassemble the big file?
    • just use RAR volume (split file) capabilities

      originally designed to let you fit large files
      onto floppies, but can be used for everything,
      splitting dvds onto cds, etc. from the manual:

      -v<size>[k|b|f|m|M]

      Create volumes with size=<size>*1000 [*1024 | *1].
      By default this switch uses <size> as thousands (1000) of bytes
      (not 1024 x bytes). You may also enter the size in kilobytes
      using the symbol 'k', in bytes using the symbol 'b',
      in megabytes - 'm', in millions of bytes - 'M' or select
      one of several predefined
    • What's wrong with the split command? I believe the file parts are reassembled by just cating them together.
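
For anyone without split and cat at hand, the same idea is easy to reproduce. The Python sketch below is a rough equivalent under assumed names (`split_file`, `join_files`, and the `.000`, `.001` suffix scheme are made up here, not what split itself produces): cut the file into fixed-size parts, then get the original back by concatenating the parts in name order.

```python
import tempfile
from pathlib import Path

def split_file(src, chunk_size):
    """Write src's contents to src.000, src.001, ... and return the part paths."""
    parts = []
    with open(src, "rb") as f:
        index = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            part = Path(f"{src}.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts

def join_files(parts, dest):
    """Concatenate the parts in name order, which is all cat would do."""
    with open(dest, "wb") as out:
        for part in sorted(parts):
            out.write(part.read_bytes())

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "repo-dump.bin"
    original.write_bytes(bytes(range(256)) * 40)      # 10,240 bytes of stand-in data
    parts = split_file(original, chunk_size=4096)
    restored = Path(tmp) / "restored.bin"
    join_files(parts, restored)
    part_count = len(parts)
    round_trip_ok = restored.read_bytes() == original.read_bytes()

print(part_count, round_trip_ok)
```

Because the parts are plain byte ranges of the original, no special tool is needed to put them back together, which is what the question asked for. The three-digit suffix keeps name order correct only up to 1,000 parts.
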

  • rar has one of the best recovery methods, as it has several of them.

    during compression:
    Recovery Record (-rr option)

    it has a Recovery Record: this is data appended to the actual
    rar file that lets you recover from errors within a file. The
    default RR takes 1% of the archive and lets you recover 0.6%. You
    can change this behaviour to get more recoverability by
    specifying -rr[N]p and giving it a larger percentage for recoverability.

    Recovery Volume (-rv option)

    further more, ra
    • --You know, if he would lower the registration price to ~$15 or so I bet he'd get a lot more incoming money. The economy SUCKS, and $29-30 is just way too much for what basically amounts to a zip-competitor, even though RAR is a good archiver.
      • I don't see what you are trying to say. Compare
        winzip to winrar:
        - same price
        - winrar comes with full command line, and gui interface
        - both support a variety of decompression schemes
        - zip provides worse compression ratios than rar
        - zip has virtually no recovery methods
        - zip has no multi archive support (unless you
        consider the current hack as a valid system)
        - zip uses pathetic encryption (password breakers
        exist for over a decade, rar still has not a
        single password breaker)

        So crappy Zip has same asking
        • --Having not had a job for a long time, I won't pay for Winzip either. Why should I, when "zip" and "unzip" comes free with Linux? The only problem with Zip currently is the pathetic 2GB filesize limit. This should have been fixed (at least on the Linux side) circa 1999 or so.

          --However, I *would like to* support Rar, as I perceive it to be a superior archiver. I could see my way to donating $10-$15 to help the guy make a living, but the current price is too much.

          (I hardly ever use Windoze for anything
  • by Anonymous Coward
    You definitely shouldn't use RAR for archival purposes, but for extracting existing archives, try unrarlib [unrarlib.org]. It includes a library for accessing the contents of RAR files, and an "unrar" utility based on this library. It is dual-licensed under the GPL and a more restrictive license.
  • DAR - Disk ARchiver & Parchive combined sounds like it would work wonders.

    From http://dar.linux.free.fr/ [linux.free.fr]:
    dar is a shell command, that makes backup of a directory tree and files. It is released under the GNU General Public License (GPL in the following) and actually has been tested under Linux, Windows and Solaris. Since version 2.0.0 an Application Interface (API) is available to open the way to external independent Graphical User Interfaces (GUIs). An extension of this API (in its version 2) is in th
  • { The poster is looking for alternatives to tar, because he has concerns about tarball content recovery. }

    It's been possible to do that for well over a decade, using various utilities such as tarx. I've successfully recovered files after a damaged point in a tarball many times. (Sigh, I used to use an old AT&T UNIX with a #$*@# broken tar, which occasionally created corrupt tarballs).

    See this post [sunmanagers.org] on the Sun Managers list circa 1993, and the venerable comp.sources.unix collection, volume 24, for t
