Ask Slashdot: What's a Good Tool To Detect Corrupted Files? 247
Volanin writes "Currently I use a triple boot system on my Macbook, including MacOS Lion, Windows 7, and Ubuntu Precise (on which I spend the great majority of my time). To share files between these systems, I have created a huge HFS+ home partition (the MacOS native format, which can also be read in Linux, and in Windows with Paragon HFS). But last week, while working on Ubuntu, my battery ran out and the computer suddenly powered off. When I powered it on again, the filesystem integrity was OK (after a scandisk by MacOS), but a lot of my files' contents were silently corrupted (and my last backup was from August...). Mostly, these files are JPGs, MP3s, and MPG/MOV videos, with a few PDFs scattered around. I want to get rid of the corrupted files, since they waste space uselessly, but the only way I have to check for corruption is opening them up one by one. Is there a good set of tools to verify the integrity by filetype, so I can detect (and delete) my bad files?"
AppleScript (Score:3, Interesting)
you seem to be surprisingly ok with the fact that your computer crashed and all your documents and media were corrupted, as was your backup. I would have been beside myself. Hulk smash! Please let us know what different set ups you're exploring to avoid this.
Re:AppleScript (Score:4, Insightful)
But the open usually won't fail. Unless the error is within the header bytes of a movie or image, the media will open, but will appear wrong. Worse, there is no way to detect this corruption because media file formats generally do not contain any sort of checksums. At best, you could write a script that looks for truncation (not enough bytes to complete a full macroblock), or write a tool that computes the difference between adjacent pixels across macroblock boundaries and flags any pictures in which there is an obvious high energy transition at the macroblock boundary, but even that cannot tell you whether the image is corrupt or simply compressed at a low quality setting with lots of blocking artifacts.
The short answer, however, is "no". Such corruption can't usually be detected programmatically.
Re: (Score:2)
I should clarify. If you are intimately familiar with the format, and if it is a multi-frame format, such as a compressed audio or video format, it is possible to programmatically detect that there are frames that reference illegal frames, frames whose structure is not valid, etc. in much the same way that you can detect a JPEG file whose header is invalid.
Again, though, none of this will be caught by merely opening the movie; the movie will generally play correctly up until the decoder encounters the erro
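A crude structural check in that vein, for JPEG, might look like this sketch. It only inspects the SOI/EOI markers, so it catches truncation but not mid-file bit rot; the function name is just illustrative.

```shell
#!/bin/sh
# jpeg_markers FILE: verify the file starts with the JPEG SOI marker
# (FF D8) and ends with the EOI marker (FF D9). Catches truncation,
# not damage in the middle of the file.
jpeg_markers() {
    head -c 2 "$1" | od -An -tx1 | grep -q 'ff d8' &&
    tail -c 2 "$1" | od -An -tx1 | grep -q 'ff d9'
}
```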
Re: (Score:3)
The TL;DR version: this scenario is why you configure your MythTV box to store MPEG-TS, which has embedded CRC error detection and recovery, instead of MPEG-PS, which is only insignificantly smaller, if you have the option.
Re: (Score:2)
Re:AppleScript (Score:4, Interesting)
Here's what I did when I realized my mp3 collection on my Mac was slowly dying:
find . -type f -print -exec cat {} \; > /dev/null
it takes a while, but files with ioerrors make cat print a warning (which names the file) on stderr. Put the output in a file and you can use grep (the '-B' option comes to mind) to get a list of the bad files.
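Wrapped up as a small function, the same cat-scan idea emits one clean list of unreadable paths (the scan root and output name below are just illustrative):

```shell
#!/bin/sh
# list_unreadable DIR: print files under DIR whose raw bytes cannot be
# read back (cat hits a read error).
list_unreadable() {
    find "$1" -type f | while IFS= read -r f; do
        cat "$f" > /dev/null 2>&1 || printf '%s\n' "$f"
    done
}
# e.g. list_unreadable "$HOME" > unreadable.txt
```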
The sad thing is that Time Machine didn't seem to notice that the files were bad, so now the files are gone forever. Disk Utility didn't help.
Shouldn't there be a way to find bad blocks on OS X? I looked around and all I could find were commercial products.
Re: (Score:2)
File corruption won't generate ioerrors, I don't think. Your system may be able to properly read data from the disks, data that it thinks is what you requested; it's just that the data is bad. A computer isn't generally going to be able to detect that without either knowledge of the file format or checksums.
BSOD? (Score:2)
"What's a Good Tool To Detect Corrupted Files?"
BSOD?
Re:BSOD? No, use open source "Tripwire" (Score:4, Informative)
Not the BSOD.
If the OP had used open source "tripwire" on known-good files in each filesystem on his Macbook, and saved the resultant data output to a USB thumbdrive formatted with FAT32, the OP would have had a good chance of determining all corrupted files. In this case, an ounce of prevention would have prevented several pounds of "cure".
Check out http://tripwire.org/ [tripwire.org]
The BEST method.. (Score:5, Funny)
is urgency. Corrupted files have the ability to detect urgency and your discovery of them will come in a form compatible with the laws of Murphy.
No easy answer (Score:2, Insightful)
1. Compare to backup, files that match are ok.
2. AppleScript option others mentioned may help reduce it further.
3. Backup regularly, and verify your backup procedure.
4. Anything else will cost you consulting rates.
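Point 1 can be scripted with cmp(1); this is a sketch that assumes the backup and live trees have parallel layouts (function and path names are illustrative):

```shell
#!/bin/sh
# diff_trees BACKUP LIVE: print relative paths of files present in both
# trees whose contents differ byte-for-byte. Differing files are
# corruption suspects; matching ones are presumed OK.
diff_trees() {
    backup=$1; live=$2
    ( cd "$backup" && find . -type f ) | while IFS= read -r f; do
        if [ -f "$live/$f" ] && ! cmp -s "$backup/$f" "$live/$f"; then
            printf '%s\n' "$f"
        fi
    done
}
# e.g. diff_trees /mnt/backup "$HOME" > suspects.txt
```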
For MP3s use amp3test.exe (Score:5, Informative)
2000-2001 MAF-Soft http://www.maf-soft.de/ [maf-soft.de]
The version I have is v1.0.3.102
It can scan single mp3s and entire folder structures for defects, and logs everything if you wish. It will give you a percentage of how good the file is.
Depending on the damage you may be able to fix headers and chop off corrupted tag info with something like MP3Pro Trim v1.80.
md5 and shell scripting (Score:2, Offtopic)
md5sum (Score:4, Interesting)
or sha1sum if you prefer. Automate in cron against a list of knowns.
eg:
$ md5sum /home/wilbur/Documents/* > /home/wilbur/Docs.md5
$ md5sum -c /home/wilbur/Docs.md5
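To automate it in cron as suggested, a crontab entry along these lines (schedule and paths are illustrative, reusing the Docs.md5 example) re-verifies nightly and logs anything that no longer matches:

```
0 3 * * * md5sum -c /home/wilbur/Docs.md5 2>&1 | grep -v ': OK$' >> /home/wilbur/md5-failures.log
```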
Re: (Score:3, Informative)
Par2 (Score:2)
That's a pretty good idea, if you only want to detect corrupted files (and yes, I know that's what the OP said he wanted), but I can't believe no one's suggested par2, yet. It will not only detect corrupted files, but repair them, too. If he had used par2, he wouldn't have to delete them.
For JPEGs (Score:5, Informative)
You can run jpeginfo -c. I have a script that runs against a directory and makes a list for when I do data recovery for all my friends who don't listen when I tell them their 10 year old laptop may be dying soon.
Re: (Score:2)
Author here:
This method detected a single corrupted picture.
Probably my pictures were the least affected of all my data.
Thanks for the great idea.
the answer is not "file" (Score:3)
unix "file" is not the answer. For some formats it does as little as look at a couple header bytes. Its a great tool to guess a format. Its a terrible verifying parser and does nothing to verify content.
An example of what I'm getting at, with some made-up details: unfortunately HTML is not like well-formed XML, and every viewer is different anyway, so the best way to figure out whether an HTML file is corrupt is, unfortunately, to pull it up in Firefox. Even that only detects corruption in the structure of the file. If the corruption is just a couple of bits, you end up with problems like tQis, where the only way to see that the 'h' got fouled up is to write more or less an IQ-100 artificial intelligence. All "file" is going to test is pretty much whether the file begins with or contains a regex something like less-than html greater-than (getting past the filters).
For content you could F around with, for example, piping an mp3 file through a decoder and then through an averaging spectrum analyzer, and seeing if there's anything overly unusual in the spectrum. Also some heuristics, like: if the file is only 1 second long, then it's F'ed up.
right filesystem (Score:3)
You need a good filesystem, with embedded data checksums and self-healing using redundant copies. For Linux, btrfs is fine. For Mac OS X & Linux: ZFS.
Re: (Score:2)
Re: (Score:2)
Author here:
The problem lies in finding a filesystem that can be accessed by all three OSes. I would go with NTFS as well, but last time I tried, MacOS could not write to it. Every guide out there recommends FAT32, but the 4GB file size limitation is a deal breaker for me.
Re: (Score:2)
I use RAR to split the >4GB files in half. To date I've only needed to do that once (a DVD rip).
Re: (Score:3)
10.5 and 10.6 (and I assume 10.7) have read/write support, but it's not enabled by default and is not officially supported.
http://hints.macworld.com/article.php?story=20090913140023382 [macworld.com]
Also you are using paragon HFS+ for windows... you should already be aware they have Paragon NTFS for Mac.
A bigger question is whether NTFS is the best filesystem to use, and that's a separate question entirely. And that's a question I don't know the answer to.
So, if the primary OS was windows... then I'd use NTFS.
But if you spen
Re: (Score:3)
NTFS-3G supports writing to NTFS. AFAIK, most Linux distributions use it instead of the kernel driver and there's a OSX port as well.
Re: (Score:3)
Finding a way to make the Mac read NTFS beats using MacDrive for HFS+ on the Windows side. NTFS just doesn't corrupt as easily with a power failure as HFS+, in my experience. Ideally, I would just use networked storage and access it from Mac OSX with afpd or NFS, from Windows with Samba, and linux with NFS.
Re: (Score:2)
The best filesystem to survive a crash is a filesystem designed for an operating system that is expected to crash: NTFS.
I don't know if I should laugh or ask what evidence that you have NTFS is the "best".
Re: (Score:2)
The problem with that rationale is that the set of developers that make systems that crash often is highly correlated with the set of developers that make FSs that corrupt data often.
Re:right filesystem (Score:5, Informative)
Two aspects to your problem:
1) Recovering from the current situation
If you didn't make ANY changes to the filesystem after it was corrupted, you still have a chance with software like DiskWarrior or Stellar Phoenix. Never work on the original corrupted filesystem unless you have copies of it. So grab a second drive, connect it over USB, and using hdiutil or dd copy it to the second drive. Once you do that, use DiskWarrior or Stellar Phoenix on either one of the copies, while keeping the other one intact. Always have an intact copy of the original FS. You might be successful trying multiple methods, so KEEP AN INTACT COPY.
2) Avoiding it in the future
NTFS is good at surviving a crash if and only if the crash occurs in Windows. Paragon NTFS for Mac/Linux and NTFS-3G don't use journaling to its full extent (for both metadata and data). So, if you get a crash while in Mac OS X or Linux, chances are that you get data corruption.
Same goes for HFS+. While Mac OS X uses journaling on HFS+, Linux doesn't. It's read-only in Linux if it has journaling. Furthermore, the journaling is metadata only in HFS+.
Now we get to the last journaled filesystem available to all 3 OSs: EXT3. It's the same crap as above.
Because of the three points above, I have a conclusion: what you're looking for (ZFS) isn't available on any of the OSes that you're using.
Thus, I have a simple recommendation:
Use ZFS in a VMware machine exported via CIFS/WebDAV/NFS/AFP to Linux, Windows or Mac OS X. A small FreeNAS VM with 256MB of RAM can run in VMWare Player and Workstation on Windows/Linux and Fusion on OS X.
ZFS uses checksumming on the filesystem blocks, which lets you know of the silent corruptions. Furthermore, by design, it will be able to roll-back any incomplete filesystem transactions. I've had my arse saved by ZFS more times than I care to remember. The most difficult thing for my home storage system is to find external disk arrays that give me direct access to all the disks (not their RAID crap). A proper home storage system is RAIDZ2 (basically RAID6) + Hot Spare.
Another way is to have a simple, TimeMachine-like backup solution on at least one of your operating systems. But even that doesn't catch silent data corruptions, let alone warn you. As such, we get back to: ZFS.
Tech Tool Pro, perhaps (Score:3, Informative)
Tech Tool Pro, over on the Mac side, has a "File Structures" check which looks at a lot of different structured file types to make sure that their internal format is valid.
Reed Solomon to the rescue (Score:2)
It's already too late, but I keep important files with par2 files. That way, when there's like 5% corruption, I can still fix the file.
I do this with flac files and some datafiles.
Also make sure you keep backups going. I guess this was your warning. Everyone needs one.
Re: (Score:2)
For archival purposes, I've started using WinRAR ( http://www.rarlabs.com/ [rarlabs.com] ) with the file authenticity and recovery op
Re: (Score:2)
There is a good link here:
http://ttsiodras.github.com/rsbep.html [github.com]
This is a good move for creating par files etc as part of your backups. He also has some other really good information up there in regards to protecting data. Especially creating backups under windows:
http://ttsiodras.github.com/win32backup.html [github.com]
Re: (Score:2)
Better use Crashplan (free). Backup to a remote computer, internet or your own disks in the background. Works for me (and lots of other people).
A lot of corrupt files? (Score:5, Interesting)
That seems very strange--the only files that should really be corrupted, unless something extremely rare and catastrophic happened, are the ones that were being written when power went out, or were cached. And even then, a flush usually flushes everything, or at least whole files at once, or areas of disk. Is the partition highly fragmented or something?
I know this doesn't do much for your question, but that kind of failure mode is almost exactly what filesystems do their damnedest to avoid. HFS+, being journaled, should be even more proof against, well, exactly what happened to you. Maybe the Linux driver is poor, but man, if you got silent data corruption on a multitude of files that weren't even being written, that's really bad and the driver should be classified "EXPERIMENTAL" at best, and certainly not compiled into distros' default kernels.
To answer your question, I don't have experience with any tools (I automate my backups, and any archival files go on a RAID volume that does a full integrity scan nightly), but once you find one, you should separate your files into two categories--"must be good", and "can be bad". The "must be good" files (serial #s, source code, etc.), you hand-check, so you know for certain that every one of them is good. It'll also motivate you to replace them now, instead of later when replacements will only get harder to come by. The "can be bad" files (music, pictures, etc.), you do the automated check on and then just delete as you run into ones that the check missed. This has the advantage of concentrating your effort into where it's useful. If you try to check all of your files, you'll just burn out before you finish. You may even want to do more advanced triaging, but you'll have to come up with the categories and criteria there. The main thing is, split this problem up.
Re:A lot of corrupt files? (Score:5, Informative)
Very few filesystems keep checksums - only btrfs and zfs come to mind.
With defective hardware (RAM issues in main memory and disk or controller caches are fun) you can have silent corruption that goes on for a long time. Also bits on disks rot but those should give you a CRC or ECC error.
Re: (Score:2)
Yeah, that's what I was saying--it's pretty unlikely that the power failure caused this, so the author should try to find the true root of the problem.
You are right to suspect the driver: (Score:2)
The Linux HFS+ driver can't even work in write mode unless the journal has been deleted, so the journal isn't working when using the HFS+ partition under Ubuntu and probably Windows as well (author take note). I would not use that filesystem under Linux or Windows on a daily basis. Also, since the journal has been deleted, you are probably missing the safety of journaling under the native OSX as well.
Author should also note that archival backups with md5 or sha256 checksums are probably the most straightfor
mplayer/mencoder (or ffmpeg) & imagemagick (Score:4, Informative)
Check why the files are corrupted (Score:5, Insightful)
I'd be asking myself why lots of files became corrupted from one dodgy file system event. Assuming HFS works like file systems I'm more familiar with, it will allocate sequential blocks for files wherever it can. This means that a random filesystem splat is really unlikely to corrupt loads and loads of files. You might expect a file system corruption to cause a load of files to go missing (if a directory entry is corrupted) or corrupt a few files, but not put random errors into loads of files.
I'd check to see whether files I was writing now get corrupted too. It might be dodgy disk or RAM in your computer.
The above might be complete paranoia, but I'm a paranoid person when it comes to my data, and silent corruption is the absolute worst form of corruption.
For next time, store MD5SUM files so you can see what gets corrupted and what doesn't (that is what I do for my digital picture and video archive).
Re: (Score:2)
The bit rot could have gone on for some time. How often do you check those videos or MP3s that you downloaded years ago?
Re: (Score:3)
A suggestion: Instead of triple booting... (Score:3)
zfs (Score:3)
zfs [wikipedia.org]! Works great. Included with FreeBSD 9 [freebsd.org], amongst other OSs.
You might also enjoy John Siracusa's exhaustive review of filesystems [5by5.tv] on one of my favorite podcasts.
Use JHOVE (Score:3)
The JSTOR/Harvard Object Validation Environment:
http://hul.harvard.edu/jhove/ [harvard.edu]
It's specifically designed to first probabilistically identify files, then attempt to verify their format.
Disclaimer: I haven't worked on it directly, but I did spend a number of years in the digital preservation space, so I probably know some of the people who have contributed to it.
D&D approach (Score:2)
Cast Detect Evil, Sense Motive, and Discern Lies on the potentially corrupted files.
Re: (Score:2)
Author here:
Sorry, but I can't stand the Paladin of the party insisting on replacing the HD with a tried and true Bag of Holding anymore.
Thanks for the tip anyway.
Get Rid Of Paragon! (Score:5, Interesting)
It is a truly shoddy piece of software that as of version 9.0 has a terrible bug that will cause it to destroy HFS+ filesystems. Google "paragon hfs corruption" and you will see many many horror stories from people who just plugged a Mac OS X disk into a Windows machine w/ Paragon HFS and then discovered the entire filesystem was hosed. In my dual-boot win/mac setup I replaced my copy of MacDrive with a trial version of Paragon HFS 9.0 from their website, and every single one of the six HFS+ disks I had connected internally was damaged. Disk Utility couldn't do a thing and I had to buy a program called DiskWarrior to even begin to recover data. I ended up losing two disks' worth of files anyway.
http://www.mac-help.com/t12137-opened-hfs-drive-win7-paragon-hfs-now-wont-boot.html [mac-help.com]
http://www.wilderssecurity.com/showthread.php?t=299306 [wilderssecurity.com]
http://hardforum.com/showthread.php?t=1677099 [hardforum.com]
http://www.avforums.com/forums/apple-mac/1509344-hfs-super-block-not-found.html [avforums.com]
whew! Anyway the pain I went through after that software very nearly ruined my life was so great, I don't want it to happen to anyone else. According to their own website [paragon-software.com] 9.0 has this awful bug but they fixed it in 9.0.1. Evidently the trial download on the main page is still for version 9.0 and still has the disk destroying bug! Any software company that releases a filesystem driver with this terrible a bug (not to mention the numerous reports of BSODs and other relatively minor problems) clearly has terrible quality assurance and simply can't be trusted.
Re: (Score:2)
Author here:
Just out of curiosity, I went to check the version of my Paragon installer and guess what... it was corrupted! Oh the irony!
Windows is the OS I least use, and I have not booted it for the last month or so. Unless Paragon silently corrupted something there previously and somehow "weakened" the filesystem integrity since. Anyway, thanks for the tip. What do you use currently to read HFS+ in Windows?
Re: (Score:3)
It's only been a couple weeks since the disaster so
Re:Get Rid Of Paragon! (Score:4, Interesting)
Having nothing at all to do with Paragon (not that I'm a fan of the company otherwise), I had a very similar disaster occur with an external eSATA 5TB RAID 5 enclosure. It's one that uses an internal hardware RAID 5 circuit and doesn't require port multiplication, so when connected it appears to the host as a single large volume. At the time I was swapping it between a Linux (Ubuntu) system and a Windows 7 system; it was of course configured as GPT. Eventually I connected it to the Windows 7 system and during boot Windows declared there were problems and initiated chkdsk. Chkdsk ran for more than 18 hours and when it was done, most of the files in the volume were hopelessly corrupted. Upon detailed inspection, I found that blocks of all the files were swapped and intermingled, as if something had made a jigsaw puzzle out of the MFT and couldn't reassemble Humpty Dumpty. Was it chkdsk itself that caused the damage? Was it the swapping between two machines and operating systems (both GPT compliant)? I suspect it was actually caused by chkdsk, but could never prove it.
It may be that simple (Score:2)
Just have your OSX do a repair - it could be that certain VTOC or directory tables were damaged, and a repair may fix it. The files themselves should be OK, but the pointers to them are fubared.
Also try something like http://www.cgsecurity.org/wiki/PhotoRec [cgsecurity.org] or similar to recover deleted files. There's one for OSX. Run it after the repair, and you should get most of your crap back.
backup strategy to prevent this (Score:2)
Bad news... and good (Score:3)
The bad news is I don't know of any (and I don't think you'll find any) easy, one-shot tool to run across the whole lot that gives you a simple "corrupted yes/no?" answer to lots of different filetypes.
The good news is it'd be reasonably easy to lash together something in bash, kick it off overnight and come back in the morning to a list of probably-corrupted files.
In pseudo-bash (because I haven't the time to write it out and check it works properly), something like this would be a good start:
function checkJpeg {
    jpeginfo -c "$1" > /dev/null || return 1
    return 0
}

function checkPdf {
    # do something to check a PDF is OK
    return 0
}

FILETYPE=$(file -b "$1")
case "$FILETYPE" in
    *JPEG* )
        checkJpeg "$1" || echo "$1"
        ;;
    *PDF* )
        checkPdf "$1" || echo "$1"
        ;;
esac
Then run it with the help of find /home -type f -print0 (paired with xargs -0) to check every file in /home. This would give you a list of potentially-corrupted files. Up to you how you deal with it - personally I wouldn't run rm against it, in case you find files that can be rescued or your checks aren't as perfect as you'd like.
For extra credit, determine the expected filetype based on file extension and then use file(1) as your first "is it corrupted?" test - that way you'll spot files that are too corrupted for file(1) to work reliably.
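That extra-credit check might look like the sketch below; the extension map is minimal and illustrative, and the function name is made up:

```shell
#!/bin/sh
# ext_mismatch FILE: stay silent when file(1)'s verdict agrees with the
# file's extension; print FILE when they disagree (possible corruption).
# Only two extensions are mapped here, as an illustration.
ext_mismatch() {
    f=$1
    kind=$(file -b "$f")
    case "$f" in
        *.jpg|*.jpeg) case "$kind" in *JPEG*) return 0 ;; esac ;;
        *.pdf)        case "$kind" in *PDF*)  return 0 ;; esac ;;
        *)            return 0 ;;   # unmapped extension: no opinion
    esac
    printf '%s\n' "$f"
}
```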
Re: (Score:2)
Actually there is a tool that does all of that already: JHOVE - JSTOR/Harvard Object Validation Environment.
http://hul.harvard.edu/jhove/ [harvard.edu]
It's used in the digital preservation field, for example in an archive to try to figure out what they've got and what state it's in.
Re: (Score:2)
All such a process can do is verify that the file header appears well-formed. That might flag a few bad apples, but the ones with good headers and corrupted contents will slip through the cracks.
There's a whole slew of them (Score:2)
md5sum is the one I know best, but that's because my computing is unix-centric.
Be philosophical about it (Score:2)
... yes, this is not what you want to hear at this point, but try to have a positive take on this.
Last year during a routine Windows 7 installation, my second hard drive, from which I dual boot my 90%-of-the-time-in-use Linux, was destroyed. Either a coincidence that it occurred during the Win7 installation or a nefarious plot, but the hard disk, a 1TB Seagate SATA, developed an unrecoverable click of death.
On that hard drive I had my short stories which I had written in college and the intervening years s
Says The Knack: You'll find out the hard way (Score:2)
Lacking not only a backup but also PAR(2) and MD5 files, manual inspection of each and every file is the ONLY way you can determine their integrity. There is no automagic after-the-fact integrity check. If you had MD5 sums for every file, you could at least check their integrity. Some PAR2 files would not only verify but possibly repair if the damage wasn't more extensive than the PAR recovery blocks. Of course if you're willing and able to do all that, you'd probably have had full and differential back
George (Score:3)
George is your best bet. He's not bright enough for most support tasks, but he can certainly handle this one.
A multi-tool approach may be necessary (Score:3)
1. You're going to want to be familiar with both file(1) and find(1). File(1) is pretty straightforward, but be aware that its heuristics for file type detection vary in accuracy. If you're not find-literate, then at least get used to this construct:
find DIR -type f -iname '*.jpg' | sort > jpg.list
which will recursively search directory DIR (substitute your own path) for jpg files, and
find DIR -type f | sort > all.list
which will search the same directory, but will return a list of all (plain) files, that is, things which are not directories, devices, sockets, etc., sorted and dumped into file all.list.
2. You now have (a) a list of all jpg files and (b) a list of all files. (I picked jpg arbitrarily to illustrate the process, by the way.) You can now generate a list of all files that are NOT jpg with this:
comm -13 jpg.list all.list > not-jpg.list
The point of this exercise is that you can now repeat steps 1-2 with the other extensions present; something like
sed -e "s/.*\.//" all.list | sort | uniq -c
will give you a rough idea of which those are.
3. Now then... you'll need some tools for dealing with each file type. The first tool I'd use is stat(1), to check sizes for plausibility. Then things like jpeginfo(1), mp3val(1), tidy(1) will be some help, but of course you'll need to distinguish between "error message emitted because file is corrupt" and "error message emitted because file has minor issues... that it had BEFORE this episode". You may need to check the Ubuntu repository for tools you don't have; you may need to do some searching on the web for "Linux tool to check PDF integrity" and similar.
4. If you have backups of any kind and can restore them, then you could try using sum(1) to compare checksums pre- and post-incident. This is a filetype-invariant method, which is good because it lets you skip the above... but bad because all it will tell you is "different", not "mildly damaged" or "horribly corrupted" or something in between.
5. I would recommend against deleting anything at this point. Instead, move it to secondary storage, like an external drive. I don't have a specific reason for advising this, other than "many years of experience doing partially-manual, partially-automated things like this and a recognition that sometimes errors in the methodology...or fatigue introduced by the tedium of executing it...lead to mistakes".
6. Good luck.
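The size-plausibility part of step 3 can be sketched with find(1) alone; the 4 KiB threshold and the extension list are arbitrary assumptions:

```shell
#!/bin/sh
# suspect_small DIR: list media files under DIR smaller than 4 KiB --
# almost certainly truncated. Threshold and extensions are illustrative.
suspect_small() {
    find "$1" -type f \( -iname '*.mp3' -o -iname '*.mov' -o -iname '*.mpg' \) -size -4k
}
```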
Re: (Score:2, Funny)
Have some respect, the man just lost his entire porn stash.
Re:Gamemaker sucks ass (Score:5, Funny)
Author here:
Ok, I could deal with the loss of some unique videos and pictures from travels... but now that you mention the porn... *weep*
Re: (Score:2)
Re: (Score:2, Insightful)
Consider the possibility that the backup already contains corrupted files. I once had defective RAM where only one bit flipped occasionally. The machine was quite stable, so the defect went undetected and over a couple of months it silently corrupted hundreds of files. Unless he finds out what caused the crash, he can't be sure that the backup is alright.
Re:compare them to an intact backup (Score:5, Insightful)
Well...
My first suspicion would be that the filesystem is messed up, not the actual files. Unless s/he had a lot of pending writes to all of these files, there is no reason that something should have actually overwritten or garbled them when the power shut down. Much more likely was an impending or in-progress write to the filesystem's tables, which has affected where it thinks all the files' pieces are stored. And if that is the case, date modified and size may be irrelevant because those are going to be reported by the filesystem.
Aside from trying to read back sector-by-sector data and assembling them, however, I don't know that there's a remedy.
Re:compare them to an intact backup (Score:5, Informative)
That is a good thought, and photorec does an excellent job of finding pictures and videos by searching through your sectors - definitely worth a try.
http://www.cgsecurity.org/wiki/PhotoRec_Step_By_Step [cgsecurity.org]
Re: (Score:3)
Seconding the photorec / testdisk suite, they are incredible. I would rate it up with ddrescue as the top 2 data recovery tools.
Photorec is great BUT (Score:4, Interesting)
Indeed, I used photorec/testdisk to recover mp4 files after they had (all) been accidentally deleted from an HFS+ partition.
But when I first started it in its default mode, it "found" only rubbish, breaking up the actual mp4s into a mess of .doc, .xml, .jpg, .whatever files, including totally broken .mp4s.
When I restarted it after configuring it to only look for .mov/.mp4, it did a fantastic job, and as far as I know, all files could be recovered. Of course, that was made easier by the fact that I knew that all the files which needed to be recovered were .mp4.
Re:Newbie question hour? (Score:5, Informative)
Author here:
> Last backup August.
Yes, that was silly of me.
> Thinks there is a way to detect generic file corruption
There is no way to detect generic file corruption. But there is a way to detect specific filetype corruption. For example, I already found mp3val, which is able to scan all my mp3s and check file integrity, and even fix a few kinds of corruption (such as mismatched bytes in the header and sound chunks). Maybe with the right set of tools, I might also detect (or even fix) my corrupted pictures, movies and books as well.
Re: (Score:2, Insightful)
Re: (Score:2)
Re: (Score:3)
Re: (Score:3)
Re:Newbie question hour? (Score:5, Interesting)
Re:Newbie question hour? (Score:5, Funny)
mplayer can detect corrupted movie and audio files:
find . -name '*.mov' -exec mplayer -msglevel all=6 -speed 100.0 -framedrop -nogui -nolirc -cache 8192 -tskeepbroken -ao null -vo null {} \; | grep Warning! > $1.txt
Change the *.mov as appropriate.
<infomercial>its JUST. THAT. EASY folks!</infomercial>
Re: (Score:2)
Look, you're really taking the wrong approach here. The way to deal with corruption is avoidance, backup, and corrective action.
1) Avoidance. This is the generally the role of the filesystem and the underlying hardware, each of which have methods for preventing and correcting data corruption without ever involving the user. The user has a small part to play by doing things like shutting down instead of turning off whenever possible, though journaling filesystems (i.e., all modern filesystems) will know w
Re: (Score:2)
That won't help detect corruption, only truncation of files. You would need an md5 or similar hash.
md5 is (relatively) slow. A simple CRC-32 will only fail you for 1 in 2 ** 32 corruptions, and I suspect the guy doesn't even have 2 ** 16 files, so the odds are CRC-32 is more than good enough and significantly faster.
Then again, he's probably going to be hard drive speed limited not CPU limited. Then again, no point wasting laptop battery on an overly complicated algorithm. CRC32 is gonna use at least 1/5th the CPU/wallclock time and/or battery of md5.
The tradeoff boils down to you can use md5 and burn
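In that spirit, POSIX cksum(1) gives you a CRC-32 baseline without md5's cost; this sketch records one line per file and re-running it later lets diff spot silent changes (names are illustrative):

```shell
#!/bin/sh
# crc_baseline DIR OUT: record a CRC-32 (via cksum) for every file under
# DIR into OUT, sorted by path so two runs are directly comparable.
crc_baseline() {
    find "$1" -type f -exec cksum {} + | sort -k3 > "$2"
}
# later: crc_baseline ~/media /tmp/new.crc && diff baseline.crc /tmp/new.crc
```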
Perception bias (Score:2)
I've certainly seen corruption with XP crashes, not a big deal because I do backup. About the same with the other file systems. In this case he was using Mac OS 10.7 Lion, which is a mess, and two others accessing the same partition. Not surprised.
Re: (Score:2)
An honest question :
I've had several crashes over the years with Windows XP, but the files, data, and system files were never corrupted. In Linux it seems that filesystems are not very resilient, and the least crash can corrupt your files. Is NTFS really that well designed compared to Linux filesystems?
Linux supports a wide array of filesystems. Which ones have you used? I have used ext3 and ext4 and have never run into file corruption problems. Both are journaling filesystems, which helps prevent corruption in the event of a power failure.
Besides the filesystem, one other possibility for corrupted files is a bad hard drive. I know someone who reinstalled Windows on his desktop on a regular basis because key files would go missing or get corrupted. I took a look at it and found out t
Re: (Score:2)
In Linux it seems that filesystems are not very resilient, and the least crash can corrupt your files.
Is NTFS really that well designed compared to Linux filesystems?
I've never had corrupt files after a Unix crash; be it SunOS, Solaris, HP-UX, Linux or any of the other Unix variants I've used.
I've never had corrupt files after an XP crash, but I've often had scandisk delete files, including a multi-gigabyte game installer that I'd just downloaded before it crashed. It regularly deleted Firefox bookmarks before they switched from storing them in big HTML files.
The NTFS approach appears to be 'I'll guarantee file system consistency but won't guarantee any of your files ar
Re:file(1) (Score:4, Informative)
Author here:
At first I thought this idea wouldn't work. As some people have already written here, the 'file' command sometimes just checks for a few bytes. But since it is so easy to implement, why not give it a try? And indeed, for videos it worked quite well. Some of the corrupted MOV files were detected simply as 'data file' or even 'MPEG sequence' and were promptly deleted! Thank you for the idea.
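That triage generalizes to a small script (a sketch; `file -b` prints just the description, and the QuickTime/ISO Media strings are what common versions of file(1) report for healthy MOV/MP4 headers, so treat the match list as an assumption to adjust):

```shell
# Flag .mov files whose header file(1) cannot identify as QuickTime/ISO media.
# Heuristic only: file(1) reads leading bytes, so mid-file damage slips through.
check_movs() {
    find "$1" -iname '*.mov' -print0 |
    while IFS= read -r -d '' f; do
        desc=$(file -b "$f")
        case "$desc" in
            *QuickTime*|*'ISO Media'*) : ;;     # header looks plausible
            *) echo "suspect: $f ($desc)" ;;
        esac
    done
}
# usage: check_movs ~/Movies
```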
Re: (Score:2)
P.S. Synology NAS > Time Capsule by an order of magnitude.
Re: (Score:2)
Agreed. My data loss happened through theft, and the backup was stolen as well. Now my backup drive sits hidden away, wirelessly capturing my backups. Time Capsule is a good solution, but there are others. I just bought a 2TB external drive for $160; combined with a wireless router that has a USB port, it could be a less expensive alternative. I'm actually thinking that the 2TB drives might not be a good backup solution, and am looking into building a NAS specifically for backup using 4 500GB drives in a Raid
Re: (Score:3)
These comments are full of 'helpful' suggestions to compare against the backups, or against MD5s generated from the backups.
That makes no sense.
If he has a good set of backups, JUST RESTORE THE BACKUPS to get known good files back. Why read every backup file and every current file, compare them, and make a list of the ones that don't match, just to restore those? Restore them all. Done.
Re: (Score:3)
Perhaps, but I agree with the first post. Going through and simply looking at all the JPEGs or MPEGs is probably the only way to tell if a file is corrupted (I wouldn't trust the CPU to do an accurate job). It also gives you a chance to erase a lot of stuff you really don't need anymore. I dumped 300 gig off my drive simply by going through everything... took a while, but it was worthwhile to get rid of old shows/movies I'll likely never watch.
Re: (Score:3)
My current setup is to have everything on my server box and simply copy over what I need to my laptop as I need it and NFS/SSHFS the rest of it on the fly when home.
Re: (Score:2)
Maybe you didn't mean it this way, but dang if I did not see all the PHBs come out from work with your comment. "I can get 1TB drives from Fry's for $80.00, why do you say it costs several hundred?"
Oh, you wanted redundant drives to be covered in the event of a failure? You wanted a drive with some performance, so it does not take 32 minutes to open your Word file? So much for that 1TB for 80 bucks thing...
The new one is "SSDs are only $150.00, and they are the same as what you get for SSDs without a
Re: (Score:3)
That seems not worth it. The thing is, both drive space and data volume tend to double every ~18 months or so. Files sit first on the main drive for a couple of years, then on a network drive, and once a decade has passed they go in the trash.
But a decade ago the cheapest storage was a 40GB drive costing $130 or thereabouts. Today 40GB worth of space is 1.5% of that shiny new 3TB-disk costing $150 or thereabouts.
There's essentially no benefit to deleting old data, because old data is *always* small data, and so copying it to
Re:Your eyes (Score:5, Informative)
Well, JPEG files have a structure that will generate detectable errors if a file is damaged. So simply opening them with something as simple as djpeg from the IJG and piping the output to /dev/null should give you a pretty good start on finding damaged images. Something like this, perhaps:
find . -name "*jpg" -o -name "*jpeg" -o -name "*JPG" -o -name "*JPEG" | while read filename; do if djpeg "$filename" > /dev/null 2> then :; else echo "$filename" is toast; fi; done
You could probably do something similar with mpg123 and mplayer for .mp3 and movies.
Re: (Score:3)
There ought to be an &1 after the 2>.
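With that fix folded in, the parent's loop would read as follows (a sketch; djpeg ships with the IJG/libjpeg tools, and a nonzero exit status signals a decode failure):

```shell
# Print JPEGs that djpeg cannot decode cleanly (decode output is discarded).
# Assumes the IJG libjpeg tools are installed.
scan_jpegs() {
    find "$1" \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0 |
    while IFS= read -r -d '' filename; do
        if djpeg "$filename" > /dev/null 2>&1; then
            : # decoded cleanly
        else
            echo "$filename is toast"
        fi
    done
}
# usage: scan_jpegs ~/Pictures
```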
Re:Your eyes (Score:5, Informative)
The identify program is a member of the ImageMagick(1) suite of tools. It describes the format and characteristics of one or more image files. It also reports if an image is incomplete or corrupt.
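A sketch of how that could be scripted (assuming ImageMagick is installed; -regard-warnings promotes warnings such as truncated data to hard failures):

```shell
# Flag images that ImageMagick's identify cannot parse cleanly.
# -regard-warnings makes truncated-but-parseable files count as failures too.
check_images() {
    find "$1" \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) -print0 |
    while IFS= read -r -d '' img; do
        identify -regard-warnings "$img" > /dev/null 2>&1 || echo "suspect: $img"
    done
}
# usage: check_images ~/Pictures
```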
Re: (Score:2)
So does zfs do checksumming of all files?
Yes. All filesystem blocks, I believe.
Re: (Score:2)
http://lmgtfy.com/?q=So+does+zfs+do+checksumming+of+all+files%3F+%20 [lmgtfy.com]
Re: (Score:2)
This reminds me of parity and ECC memory battles of decades past. OK, so it detects an error... Then what? Shut off the power? Not really sure what you'll be gaining. The sole example where it works is when you have the policy and budget to replace anything that takes an error. Useless for this situation.
Re: (Score:2)
Then what? Restore from last (good) backup, instead of propagating the corrupted file through the backup system until the good version is lost, surely?
Re: (Score:2)
OK, forgetting that ECC also corrects random errors that happen on functional hardware... WTF? Of course detecting problems is only useful if you have the 'policy' of correcting them somehow.
Re: (Score:2)
Most end users don't have that policy. Is it running right now? Then wait until it breaks completely and is no longer usable in any form.
Re: (Score:2)
Re: (Score:2)
That's a preemptive strategy, though. No help at all if you only think to use it after your kid brother decides it would be fun to slap his Magnet Balls all over your computer case.