Operating Systems Software

Is ext4 Stable For Production Systems? 289

dr_dracula writes "Earlier this year, the ext4 filesystem was accepted into the Linux kernel. Shortly thereafter, it was discovered that some applications, such as KDE, were at risk of losing files when used on top of ext4. This was diagnosed as a rift between the design of the ext4 filesystem and the design of applications running on top of ext4. The crux of the problem was that applications were relying on ext3-specific behavior for flushing data to disk, which ext4 was not following. Recent kernel releases include patches to address these issues. My questions to the early adopters of ext4 are about whether the patches have performed as expected. What is your overall feeling about ext4? Do you think it is solid enough for most users to trust it with their data? Did you find any significant performance improvements compared to ext3? Is there any incentive to move to ext4, other than sheer curiosity?"

  • by buttfscking ( 1515709 ) on Saturday May 30, 2009 @12:45PM (#28150193)
    I moved to ext4 as soon as it became available. I haven't had any problems thus far (no data loss, etc.), and the increased speed is noticeable. So, in the opinion of a very casual Linux user, I would say that yes, it's "okay." I'm not sure I'd trust it with anything super serious, though. I could be the only one without any problems, after all. As always, you should tip-toe around anything bleeding-edge.
  • Re:Wrong question (Score:5, Interesting)

    by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Saturday May 30, 2009 @12:54PM (#28150261)

    Face it: your side lost. "fsync everywhere" is an infeasible, untenable, and useless position to take.

    fsync-on-rename creates a much better environment for application developers and users alike. The Right Thing happens by default, and I maintain that nobody actually wants the unsafe rename behavior. Allowing an application "choice" in this respect is a red herring.

    The only improvement I'd make is to flush the file involved on every rename, not just renames that happen to overwrite an existing file. Under the current scheme, an application doing the write-close-rename to replace a file will still be put in a bind if the file to write doesn't exist yet. (i.e., you can still end up with a zero-length file where no such file ever existed on a running system)
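Editor's sketch of the write-close-rename pattern discussed above, in Python. The function name `replace_file` and its `sync` parameter are hypothetical, chosen for illustration only; this is not code from KDE or any application named in the thread. With `sync=False` it is the bare pattern that depends on the filesystem flushing data before the rename; with `sync=True` the application calls `fsync` itself before renaming.

```python
import os
import tempfile


def replace_file(path, data, sync=True):
    """Replace `path` with `data` (bytes) using write-close-rename.

    Hypothetical helper for illustration. The temporary file is created
    in the same directory as the target so the rename never crosses a
    filesystem boundary.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            if sync:
                # Push the new contents to disk before the rename makes
                # them visible under the final name.
                tmp.flush()
                os.fsync(tmp.fileno())
        # On POSIX systems, readers see either the old file or the new
        # one under `path`, never a half-written mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave the temporary file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Calling `replace_file("settings", b"...")` works whether or not `settings` already exists, which is the case the comment singles out. Note that `mkstemp` creates the file with owner-only permissions, so a real application would likely need to restore the original mode.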

  • ext4 is buggy (Score:4, Interesting)

    by hamanu ( 23005 ) on Saturday May 30, 2009 @12:58PM (#28150291) Homepage

    Well, the fsck times are really fast compared to ext3, and thank god, because EVERY time I reboot it requires an fsck, complaining about group descriptor checksums. Even if I unmount my ext4 filesystem and remount it without rebooting it gets all fscked up. I have a 3TB ext4 fs on LVM on RAID, that was NOT converted from ext3, but built on brand new drives. My similar ext3 filesystem has had no such problems.

    ext4 takes about 7 minutes to fsck, ext3 took hours. I hope they fix this soon.

  • by 3vi1 ( 544505 ) on Saturday May 30, 2009 @01:03PM (#28150325) Homepage Journal

    I was one of the people that spoke loudly when Ext4 caused 0-byte file corruption.

    While I don't entirely agree that it's just "an application issue", because apps that work fine on every other filesystem should not need to be re-written specifically for Ext4, I am pleased at the work the devs have done to work around the problems. The kernel patches have eradicated the issues I had with corruption, and the performance is still great.

    I never did official benchmarking to determine the extent, but my perception is that there's a noticeable performance increase when using Ext4 instead of Ext3.

    If I were building a production server, I may think twice and just go with Ext3... unless the app would *greatly* benefit from Ext4. However, for a desktop system, I think Ext4 is a very good choice and ready for primetime.

  • Re:Ye (Score:5, Interesting)

    by dov_0 ( 1438253 ) on Saturday May 30, 2009 @01:15PM (#28150409)
    I've been running ext4 for / , but left ext3 for /home where any KDE apps I run could fudge writes. No problems at all.
  • Re:Wrong question (Score:5, Interesting)

    by nwanua ( 70972 ) on Saturday May 30, 2009 @01:23PM (#28150457) Journal

    Wha....? Are you seriously suggesting that applications/utilities need to be patched to deal with faulty (yes, faulty) filesystem semantics? For _every_ single filesystem they might encounter? The whole point behind a filesystem layer is to present a unified view of files to the user layer regardless of physical media or driver quirks.

    The point is really that ext4 is/was broken, and IMO, any filesystem requiring patches to applications in order not to lose data is no filesystem at all. It's unbelievable (despite the technical benefits of ext4) that this would even be up for consideration.

  • Re:Wrong question (Score:5, Interesting)

    by icebike ( 68054 ) on Saturday May 30, 2009 @01:35PM (#28150531)

    Face it: your side lost. "fsync everywhere" is an infeasible, untenable, and useless position to take.

    And had it been enforced, as soon as all developers went thru and added the fsync calls everywhere it would have become necessary for file system maintainers to no-op fsync calls in order to regain any approximation of prior performance.

    Flushing "one file" is not always sufficient. Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. And perhaps the higher level directory as well.
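Editor's sketch of the point above: after syncing the file, also open the containing directory and `fsync` it. The helper names `fsync_directory` and `write_durably`, and the `parent_levels` parameter, are hypothetical. This assumes a POSIX system where a directory can be opened read-only and passed to `fsync`; it is not expected to work on Windows.

```python
import os


def fsync_directory(dir_path):
    """Flush a directory's entries to disk (POSIX only)."""
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_durably(path, data, parent_levels=1):
    """Write `data` (bytes) to `path`, then sync the file and its parents.

    `parent_levels` is how many directories above the file to sync:
    1 syncs only the containing directory, 2 also syncs its parent.
    """
    path = os.path.abspath(path)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # file contents and metadata
    directory = os.path.dirname(path)
    for _ in range(parent_levels):
        fsync_directory(directory)  # the directory entry itself
        parent = os.path.dirname(directory)
        if parent == directory:     # reached the filesystem root
            break
        directory = parent
```

Syncing the higher-level directory only matters when that directory was itself just created or renamed; for a file written into a long-existing directory, one level is usually the relevant one.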

  • Re:Wrong question (Score:5, Interesting)

    by RiotingPacifist ( 1228016 ) on Saturday May 30, 2009 @01:46PM (#28150627)

    Hmm, I think most of them are, but I'm still having problems with mv. Seriously, can we stop this bullshit? ext4 was clearly not working!
    At no point in renaming "settings-new" to "settings" should the file "settings" become unusable. If you can't rename a fucking file without risking total corruption of the file, what the fuck CAN kde4 do?

  • by Flammon ( 4726 ) on Saturday May 30, 2009 @02:25PM (#28150897) Journal

    ... because apps that work fine on every other filesystem should not need to be re-written specifically for Ext4

    Not quite. I believe XFS and JFS behave the same way as Ext4. Here's a good article and thread on the subject. http://lwn.net/Articles/322823/ [lwn.net]

  • by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Saturday May 30, 2009 @02:30PM (#28150929)

    Not quite. I believe XFS and JFS behave the same way as Ext4.

    When XFS was first released, there was quite a buzz surrounding it before people realized they'd lose data. XFS, not ext3, would have been the de facto Linux standard had the developers not stubbornly refused to fix its data-loss bugs. By the time they finally got around to it (for some cases), there'd already been irreparable damage to XFS's reputation.

  • We had this problem (Score:5, Interesting)

    by xiox ( 66483 ) on Saturday May 30, 2009 @03:08PM (#28151211)

    Our 8TB raid system would get trashed after copying data onto it (group descriptor checksums on fsck). It looks like it was an ext4 bug. They fixed it about a week or two ago, here [spinics.net]. Maybe it will get in your kernel soon. I'm not going to start ext4 on any production system for at least 6 months I think now.

  • Re:ext4 is buggy (Score:3, Interesting)

    by Junta ( 36770 ) on Saturday May 30, 2009 @03:25PM (#28151363)

    I too had a 2TB RAID volume with ext4. I suffered the same situation. I continue to complain myself even though I have reformatted as ext3 and solved my problems, so that others will hear my issue and learn.

    And before you claim my underlying IO must be flawed: a large part of my job is storage subsystem validation, and I'm quite used to isolating which layer is inducing problems, whether storage controller hardware, drivers, or higher OS layers. Everything I did, every test I ran, pointed to ext4 as the culprit in this case.

  • Disturbing (Score:3, Interesting)

    by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Saturday May 30, 2009 @03:31PM (#28151441)

    Disturbingly enough, rename under OS X's HFS+ filesystem doesn't appear to be atomic [weirdnet.nl] even on a running system. If they can't get rename right on a running system, I'd hate to see what kind of scrambled mess the filesystem is after a crash.

  • by Jurily ( 900488 ) <jurily&gmail,com> on Saturday May 30, 2009 @03:43PM (#28151565)

    If your Linux box is crashing that often and you have no backups, the only person you have to blame is yourself. If something is that mission critical you should be using a more stable branch for one and backups should alleviate the potential for data loss if it occurs (including an FS that is either tested with known good apps that aren't exposed to this, or by using a different OS that doesn't see this issue). Crashes should be very few and far between in any case.

    And there we have the problem with the Linux community, boys and girls. Ext4 is not behaving like the rest of the filesystems? It's your fault, dear user.

    The files in question are not mission-critical, like Firefox and KDE config files. But they are annoying when they go poof. The crashes I experience come from me applying the power button because the reboot process is waaay too slow for my liking. And I haven't had a single issue with that since Red Hat 7.3. And now you tell me it's my fault I've come to rely on a feature that was there for 10 fucking years? In fact, the very feature that converted me to Linux?

    Do you think I give a fuck what's in the specs? The illusion of safety is now gone, and there is nothing you can say to make up for it. Telling me it's my fault does not help, either.

    In terms of "data loss upon the unexpected", ext4 ranks right there with Windows 95. Now you can turn off your computer.

  • by Ed Avis ( 5917 ) <ed@membled.com> on Saturday May 30, 2009 @03:55PM (#28151685) Homepage

    The point is that you have expressed all sorts of fear about ext4 - oh no, I'm not letting it near my production boxes - but you have not applied the same standard to the applications that trashed their config files when run on ext4. Even though, strictly speaking, it is the applications that are buggy. You should be equally enthusiastic about getting rid of KDE and any other software that trashes configuration files; otherwise it looks like you are playing favourites and blaming ext4 in order to overlook the bugs in the apps you're attached to.

  • by CarpetShark ( 865376 ) on Saturday May 30, 2009 @04:18PM (#28151919)

    I tried ext4 as soon as it hit 2.6.28. I never ran into the KDE bugs, but I did notice it complaining that the filesystem was full despite many GB being free (and we're not talking about the relatively small amount reserved for root here).
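Editor's sketch of how one might rule out the root reserve (and inode exhaustion) when a filesystem reports "full", using Python's `os.statvfs`. The function name `space_report` and the dictionary keys are hypothetical. This does not diagnose the ext4 behavior described in the comment; it only separates blocks free in total from blocks available to unprivileged users.

```python
import os


def space_report(mount_point="/"):
    """Summarize free space on the filesystem containing `mount_point`."""
    st = os.statvfs(mount_point)
    block = st.f_frsize  # fragment size, the unit for the block counts
    free = st.f_bfree * block        # free blocks, including the root reserve
    available = st.f_bavail * block  # free blocks usable by ordinary users
    return {
        "total_bytes": st.f_blocks * block,
        "free_bytes": free,
        "available_bytes": available,
        "reserved_bytes": free - available,
        "free_inodes": st.f_ffree,
    }
```

If `available_bytes` is large and `free_inodes` is nonzero while writes still fail with "no space left", the cause lies somewhere other than the reserve or the inode table.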

    It certainly wasn't fit to be renamed from ext4dev at that stage.

  • Re:Wrong question (Score:4, Interesting)

    by Rich0 ( 548339 ) on Saturday May 30, 2009 @04:44PM (#28152157) Homepage

    Define bug.

    Here is the issue - application wants to make an atomic change to a file. The application doesn't care if the file ends up in the starting state, or the final state - only that the change is atomic.

    fsync doesn't do that. Fsync guarantees that the file ends up in the final state quickly (but not atomically). Fsync also degrades system performance.

    So, the proposed application change doesn't accomplish what the app writers actually want, and it slows down the system. It does reduce the risk of data loss.

    What we really need is transaction support for files - just like we have for databases. Now, I agree that this may not be needed for all file operations (though admins should be able to turn it on by default if they want), but this is really the "right way" of handling this sort of situation.

    If anything I find myself patching apps to remove fsyncs. MythTV forces frequent fsyncs of the video stream and it can kill performance and even lead to data loss (buffer overruns - the degraded disk performance can't keep up with recorded video demand). There is no reason a recording needs to be fsynced every 30 seconds. If power goes out I'm going to lose 5 minutes of my recorded show anyway while the system comes back up - losing the previous 30 seconds of unflushed video isn't the end of the world. I'd rather have that than have dropped frames and glitches all over the place from lost video packets.

    What we need is for apps to tell the OS what they actually need, and for the OS to figure out how to deliver it. App writers shouldn't care what filesystem you're writing to and what the approved way of modifying files on that filesystem is. They certainly shouldn't care about how the write cache works. Sure, there should be an fsync option, but it should be used to sync disk writes to operations that take place in other media or over the network (such as in a transactional database). There should also be other options like atomic file operations (make the following changes to the following files atomically). Let the app figure out what its requirements are, and let the OS figure out how to deliver it.
