Data Storage OS X Hardware Linux

Best Format For OS X and Linux HDD?

dogmatixpsych writes "I work in a neuroimaging laboratory. We mainly use OS X but we have computers running Linux and we have colleagues using Linux. Some of the work we do with Magnetic Resonance Images produces files that are upwards of 80GB. Due to HIPAA constraints, IT differences between departments, and the size of files we create, storage on local and portable media is the best option for transporting images between laboratories. What disk file system do Slashdot readers recommend for our external HDDs so that we can readily read and write to them using OS X and Linux? My default is to use HFS+ without journaling but I'm looking to see if there are better suggestions that are reliable, fast, and allow read/write access in OS X and Linux."
This discussion has been archived. No new comments can be posted.

  • UFS. (Score:2, Informative)

    by necroplasm ( 1804790 )

    UFS would be the best option. Linux has supported it read/write since kernel 2.6.30 (AFAIK), and OS X mounts UFS natively.

    • Re:UFS. (Score:5, Informative)

      by clang_jangle ( 975789 ) on Thursday July 01, 2010 @05:09PM (#32763920) Journal

      UFS would be the best option.

      Unless you're using Tiger or earlier, UFS is not an option; the last two versions of OS X do not support UFS at all. However, HFS+ support in Linux is pretty good. Otherwise you're looking at MacFUSE for ext2/3, which IME is pretty slow and buggy. I think Jobs has gone out of his way to make OS X incompatible with OSes other than Windows. Maybe he's afraid of what will happen if everyone becomes aware they have other choices.

    • 4GB per file limit (Score:5, Insightful)

      by Ilgaz ( 86384 ) on Thursday July 01, 2010 @05:32PM (#32764298) Homepage

      OS X's UFS has a very unfortunate limit: it doesn't support files over 4 GB. Were it not for that, I would format everything (especially USB drives) as UFS.

      The lack of commercial-quality disk tools like DiskWarrior is a problem too if a true catastrophe happens. Of course, fsck can do good things, but after a truly catastrophic filesystem issue DiskWarrior is a must. That was one of the things the professional Mac community had a hard time explaining to the ZFS community.

      Since Apple was wise enough to document HFS+ completely, to the point that you can even write a full-featured defragmenter (iDefrag), HFS+ without journaling seems to be the best option. I am in the video business and I have seen it handle files way beyond 80GB without any issues. In fact, lots of OS X users who image their drives see that every day too.

      I don't know why journaling isn't implemented in the Linux driver; it is open and documented too. Even if it adds a bit of hassle, it would be worth it, since he is dealing with external drives, which are exactly what journaling is meant for.

  • Followup question... (Score:3, Informative)

    by serviscope_minor ( 664417 ) on Thursday July 01, 2010 @04:56PM (#32763660) Journal

    I have a similar problem, albeit on a smaller scale. I use unjournalled HFS+.

    However, the problem is that HFS+, being a proper Unix filesystem, remembers UIDs and GIDs, which are usually inappropriate once the disk is moved to another machine.

    Is there any good way to get Linux to mount the filesystem and give every file the same UID and GID, as it does for non-Unix filesystems?
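
    A hedged sketch of one way to do that on Linux (UID/GID values, device, and mount points are hypothetical): the hfsplus driver accepts uid= and gid= mount options, though per the kernel docs they may only apply to entries without stored ownership, so a bindfs remap is the surer way to force a uniform owner.

      # Hint an owner at mount time (applies where no ownership is stored):
      mount -t hfsplus -o uid=1000,gid=1000 /dev/sdb2 /mnt/hfs
      # Force every file to appear owned by uid/gid 1000 via a FUSE remap:
      bindfs -u 1000 -g 1000 /mnt/hfs /mnt/hfs-local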

  • HIPAA Constraints? (Score:5, Interesting)

    by fm6 ( 162816 ) on Thursday July 01, 2010 @04:58PM (#32763702) Homepage Journal

    By "HIPAA Constraints" I assume you mean the privacy rule. I would think that this rule would prevent you from using sneakernet to transmit files. Unless you're encrypting your portable disks, and somehow it doesn't sound like you are.

    Fun reading:

    http://www.computerworld.com/s/article/9141172/Health_Net_says_1.5M_medical_records_lost_in_data_breach [computerworld.com]

    • That was my first thought as well. And as much as I hate to say it, FAT32 might be the best option. Either that or UFS.

      • Most of the files they produce involve an actual patient, sometimes in critical condition, who has to lie in that grave-like scanner bore for an hour at a time.

        If one of the filesystem issues typical of that archaic junk (which should never have been released) happens, it will be a nightmare to restore the data, while it is easy on journaled HFS+ or even NTFS.

        I own a Symbian phone, and trust me on this: if there were a $50 utility just to get rid of the FAT32(!) junk risking my data on the memory card, I would happily buy it.

      • FAT32 has a 4GB file size limit. We have 80GB+ files (and could, if we wanted, have 250GB files but RAM becomes a limiting factor).
    • by eschasi ( 252157 ) on Thursday July 01, 2010 @05:37PM (#32764384)
      HIPAA mandates who can and should have access to the files. The method of storage (disk, tape, SSD, paper, whatever) is largely irrelevant. As long as all those who have access to the files are HIPAA-trained and follow the appropriate procedures, everything is fine. Similarly, transport is relevant only in that there must be no data disclosure to unauthorized persons. As such, if a person with appropriate clearance does the transport, all is cool.

      HIPAA data is often encrypted when placed on tape or transported between systems, but that's because such activities may involve the data being visible to unauthorized people. As examples of each:

      • If two physically separate sites exchange HIPAA data across the open Internet, the data must be encrypted during transport. This might be done by VPN, sftp, whatever. As long as the bits on the wire can't be read by the ISPs managing the connection, it's OK.
      • For tapes that you archive off-site, you don't want your external storage facility to be able to read the tapes, nor have the data readable if the tape is misplaced in transport.

      IMHO, wise use of sensitive data on laptops requires encryption at the filesystem level. It's neither difficult nor time-consuming, but given how much sensitive data has been exposed via folks losing or misusing laptops, it ought to be a no-brainer. Sadly, too few places bother.
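
      On the Linux side, a minimal sketch of filesystem-level encryption with LUKS (device name hypothetical; this is Linux-only, so a cross-platform setup would need something like TrueCrypt, and OS X laptops would use FileVault instead):

        # One-time setup: encrypt the partition, then create a filesystem
        cryptsetup luksFormat /dev/sdb1
        cryptsetup luksOpen /dev/sdb1 secure
        mkfs.ext4 /dev/mapper/secure
        # Thereafter: luksOpen to unlock, then mount like any other disk
        mount /dev/mapper/secure /mnt/secure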

      • by fm6 ( 162816 )

        HIPAA mandates who can and should have access to the files. The method of storage (disk, tape, SSD, paper, whatever) is largely irrelevant.

        Say what? You've never heard of data breaches from lost or stolen portable hardware? See the link in the post you replied to.

    • Re: (Score:3, Interesting)

      by rwa2 ( 4391 ) *

      Maybe instead of using a portable disk, they could whip up a nettop running Linux and transfer files over gigabit Ethernet...
      Then they could do transfers via Samba or rsync+ssh, and the nettop could transparently take care of encrypting the underlying FS, whatever that may be.
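
      As a rough sketch, the rsync+ssh transfer itself might look like this (hostname and paths hypothetical):

        # Push a dataset to the nettop over ssh; -P resumes partial transfers
        rsync -avP /data/scans/subject42/ lab@nettop.local:/incoming/subject42/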

      Performance wouldn't be great... maybe 20MB/s instead of 60MB/s for an eSATA drive, and they'd have to work out a consistent network port / IP across all the sites it travels to. But it might confer some advantages.

      Along similar

      • We would if it was easy but alas we have to deal with multiple IT departments in order to do that plus a lot of other red tape. The biggest issue is the file sizes (we're on gigabit but our colleagues are not), otherwise we would not use portable drives (well, money is a factor too; government grants only provide a limited amount of money).
        • Comment removed based on user account deletion
        • I completely understand the red tape.

          Our scientists have been having similar problems. I believe that the real solution here is to stop these guys from working on their local machines with the full sized datasets. We've provided a centralised HPC system that is connected via infiniband (and others) to multiple architectures of storage.

          There is the standard /home which is DMF'ed with the top tier being 50T of total 650MB/s write (not sure of the read stat - I'm the software guy not the hardware guy). This

          • We would love to have that infrastructure at our institution; we're working that way but we don't have it yet. If our lab could afford it we'd do something like that but our funding isn't that high. The other issue is that we use Macs because they are the best for neuroimaging work (Linux is great but things work better in OS X) and our IT department is reluctant to support them so we're doing most of the support for them ourselves. So for now we fly by the seat of our pants a bit.
            • In case you need help convincing the hierarchy and you need a little ammunition to get a decent, scalable, centralised solution, you will find allies in:
              Engineering - find out those that teach and apply for grants doing any kind of FEA work, the robot people,
              Physics: The medical imagers, users of geant4 and beam, biomechanical,
              Comp Sci: talk to anyone related to the document searching/indexing areas, machine learning, etc
              Chemistry: Search your local paper repository for those that have someone from your math

        • Re: (Score:3, Interesting)

          by rwa2 ( 4391 ) *

          Yeah, then it sounds like you're pretty much doing the best you can under the circumstances... I was just trying to think out of the box a bit and turn your filesystem compatibility problem into a file server compatibility problem, since cross-platform compatibility is a much bigger deal in the latter scenario.

          One last consideration you might want to try benchmarking is storing your data in an image file, like a zip or tgz or more likely a dmg archive... that way you could probably do transparent compressio

    • By "HIPAA Constraints" I assume you mean the privacy rule. I would think that this rule would prevent you from using sneakernet to transmit files. Unless you're encrypting your portable disks, and somehow it doesn't sound like you are.

      Fun reading:

      http://www.computerworld.com/s/article/9141172/Health_Net_says_1.5M_medical_records_lost_in_data_breach [computerworld.com]

      You would be surprised at how outdated parts of HIPAA are (from the day they were written), and at what things they fail to cover. Heck, there are sections that indicate the requirement for data encryption for certain uses/storage/etc., but that's about the extent of it. ANY encryption will do to pass muster; a simple substitution key would pass the required criteria. Then there are sections that are very specific in specifying methods that are useless... while others at least seem to have been thought out. There are

    • Re: (Score:3, Interesting)

      All of our scans are natively anonymized: we make up the birthdate and we never include the research participants' names. Our images are high enough quality that you can do nice 3D reconstructions of people's heads (and faces) but there is virtually no chance that anyone would recognize the faces unless they knew a person really well (even then it is hard). We have checked with the privacy office and our drives do not have to be encrypted because the images that will be on these drives are de-identified. We
      • by fm6 ( 162816 )

        The biggest issue comes in dealing with multiple IT departments and setting up network access to our materials. Plus our images are so large that for these processed files (not the originals) we are opting for local storage instead of storage managed by our IT staff (who are wonderful but not cheap; we just purchased 4TB of local storage for 1/4 the cost of 1TB from IT).

        Dude, there's a reason network storage is more expensive than local storage: it comes with the infrastructure that allows lots of people to access it. If you try to serve up these large files from your local network, you'll slashdot the thing, and wackiness will ensue.

        Getting back to the privacy issue: I hope your privacy officer did due diligence, and isn't some overworked functionary who just said, "The data is anonymized? Well, that's OK then." You wouldn't be the first people to distribute data they tho

        • There is no way for names to be connected with the data without someone hacking our managed filesystem (it's possible, but that is IT's responsibility, not ours) or having direct access to our lab space (but if we are not there they would have to get through multiple locked doors and break into a locked cabinet).

          We opted to go with local, portable storage because only 4 people need or have access to these particular image files on three computers (we have 2 more collaborators that might need access but we
    • by drfreak ( 303147 )

      I use TrueCrypt to transport patient data to/from doctor's offices.

  • NTFS (Score:4, Interesting)

    by Trevelyan ( 535381 ) on Thursday July 01, 2010 @05:07PM (#32763886)
    NTFS or any other FUSE (MacFUSE [google.com]) file system. However, in a heterogeneous environment NTFS has the bonus of native Windows support.

    There is NTFS-3G for Linux and Mac OS X [sourceforge.net]

    There is also an EXT2 Fuse FS (for Mac OS), and probably many other options.

    Having said that, I have never had a problem with Linux's HFS+ write support.
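
    For what it's worth, the Linux side of the NTFS-3G route is a one-liner (device and mount point hypothetical):

      # Mount an NTFS partition read/write through the NTFS-3G FUSE driver
      ntfs-3g /dev/sdb1 /mnt/external
      # ...copy files, then unmount cleanly before unplugging
      umount /mnt/external
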
    • Re:NTFS (Score:4, Funny)

      by X0563511 ( 793323 ) on Thursday July 01, 2010 @05:24PM (#32764156) Homepage Journal

      Windows doesn't play in here, it's OSX and Linux. Tossing NTFS into that would just be... wrong somehow.

      • Re:NTFS (Score:4, Informative)

        by RobertM1968 ( 951074 ) on Thursday July 01, 2010 @07:23PM (#32766140) Homepage Journal

        Windows doesn't play in here, it's OSX and Linux. Tossing NTFS into that would just be... wrong somehow.

        Flamebait mod or not, there is a valid point. Though various NTFS drivers do allow read/write, the success isn't graven in stone. There are better alternatives in the Linux/OSX world. Keep in mind that losing this data becomes either costly (as in time=money, let's go make another set of copies to run to whatever office) or very bad (as in someone moved the files to the external instead of copying them) or both.

        So, as good as the NTFS R/W drivers are getting, it's safer to use a file system that is known to be more stable and less error-prone, such as HFS+ or UFS or one of the other suggestions. "Really good" shouldn't be an option in the medical world when "even better than 'really good'" is available, compatible, and easy to install on all systems involved.

  • Mac OS and Linux both have support for NTFS through NTFS-3G [tuxera.com]. Mac OS has support for ext2 through fuse-ext2 [sf.net].

    • by EXrider ( 756168 )
      Write performance through FUSE on Mac OS X is pretty disappointing, several orders of magnitude slower than direct filesystem access in my experience. Transferring an 80GB file to a FUSE mounted filesystem would be painful.
    • by MachineShedFred ( 621896 ) on Thursday July 01, 2010 @06:04PM (#32764890) Journal

      If it's Mac OS X 10.6.x, you don't even need NTFS-3G, as the native NTFS driver has read/write capability. You just need to change the /etc/fstab entry for the volume to rw, and remount.
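
      The fstab entry people usually quote looks something like this (volume label hypothetical; "nobrowse" keeps the volume out of Finder, and the corruption warning in the reply below applies):

        # /etc/fstab on the Mac, edited with vifs
        LABEL=SCANDISK none ntfs rw,auto,nobrowse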

      • by Anonymous Coward on Thursday July 01, 2010 @07:09PM (#32765934)

        This is dangerous advice. There are numerous reports of instability and NTFS volume corruption when forcing 10.6 to mount NTFS volumes read/write. Apple seems to have turned NTFS write off by default for a good reason: it's not done yet.

  • I have a similar scenario, and I think unjournaled HFS+ is best for your case. FAT32 is even worse. You are fortunate not to have to support Windows. Ideally I would use NFS and file sharing instead of external disks, but shipping a disk is always better than transferring large amounts of data over the net.

    Another option is to install MacFUSE [google.com] and then mount other file systems. This is what I do when NTFS is required. For my Linux systems I love ext4; if you need an older file system use XFS; ext3 is stabl

  • It sucks, but NTFS might just be the best option. OS X and Linux have both had stable enough support for years. The main pluses over FAT32 are journaling and support for files over 4GB. Using UFS is dangerous (or at least has been until very recently) because there are so many different variants of it (Solaris, BSD, OS X, etc.) that Linux support is notoriously troublesome. An extra plus of NTFS is that you can use it easily on Windows machines as well.

  • Reiser? (Score:5, Funny)

    by Wowsers ( 1151731 ) on Thursday July 01, 2010 @05:11PM (#32763954) Journal

    I would have recommended ReiserFS, but the data might get buried somewhere and the system would not remember where it was....

  • No Filesystem (Score:5, Informative)

    by Rantastic ( 583764 ) on Thursday July 01, 2010 @05:14PM (#32763988) Journal
    If you are only moving files from one system to another, and do not need to edit them on the portable drives, skip the filesystem and just use tar. Tar will happily write to and read from raw block devices... In fact, that is exactly what it was designed to do. A side benefit of this approach is that you won't lose any drive capacity to filesystem overhead.
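
    A minimal sketch of that approach (device name hypothetical; triple-check it, since writing to the wrong block device clobbers it):

      # Write the dataset straight onto the raw device, no filesystem at all
      tar -cvf /dev/sdb /data/scans/subject42
      # On the receiving machine, read it back into the current directory
      tar -xvf /dev/sdb
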
  • Rubbish (Score:5, Informative)

    by Improv ( 2467 ) <pgunn01@gmail.com> on Thursday July 01, 2010 @05:17PM (#32764042) Homepage Journal

    You're storing it in the wrong format - there are all sorts of tools to convert to Analyze or DICOM format, which give you a manageable frame-by-frame set of images rather than one huge file. Most tools that manipulate MRI data expect DICOM or Analyze anyhow (BrainVoyager, NISTools, etc.).

    If you really want to keep it all safe, use tarfiles to hold structured data, although if you do that you've made it big again.

    Removable media are a daft long-term storage choice - use ad-hoc removable media (or, more ideally, scp) only to move the data.

    • Our large files are not DICOM or ANALYZE or NIfTI files. All of those are small; if that's all we dealt with we would not have this issue. Our large files are fiber tracking files that are a particular format for a particular visualization and analysis program. In any case, the only files going on these drives are easily re-creatable should the drives explode or something like that (it would take time but we only want to put end-result files and keep all the other files that are used to create them on ou
    • by sn00ker ( 172521 )
      Oi. You've been around long enough to know the rules. Knowledgeable, informed posts are contrary to the T&C. Go back to your porch. Sheesh. It's geezers like you that give whippersnappers like me a bad name. I'll be getting off your lawn now.
  • No, seriously, who cares? This is a process designed to save files that are then transferred through sneakernet. While moderately large at 80GB, they're not huge by modern standards. If you have a current solution that works, stick with it.

    If, however, there are other constraints that are affecting you - transfer speed, decades-long retention on local media, security, etc. - then by all means let us know. Until then, to use the obligatory car analogy, it's as if you've said:

    Due to the distance between my house and work, I currently use an automobile to go between the two locations and to perform various other services. Currently I use a Honda Accord. What would you suggest?

  • NFS over SSH (Score:3, Interesting)

    by HockeyPuck ( 141947 ) on Thursday July 01, 2010 @05:26PM (#32764186)

    Just tunnel NFS over SSH. I can't imagine sneakernetting files around the office ever being particularly secure. If you need to encrypt the data at rest, then either encrypt on the client or leverage an encrypted filesystem or a Decru-type appliance.
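
    A rough sketch of the tunnel, assuming NFSv4 so only port 2049 is needed (hostnames and export paths hypothetical):

      # Forward a local port to the NFS server's 2049 over ssh
      ssh -f -N -L 3049:localhost:2049 user@fileserver
      # Mount the export through the tunnel
      mount -t nfs4 -o port=3049 localhost:/export/scans /mnt/scans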

  • Network? (Score:5, Informative)

    by guruevi ( 827432 ) on Thursday July 01, 2010 @05:42PM (#32764486)

    Really, you need a gigabit network so you can transfer files over it using AFP and/or NFS and/or SMB. First of all, HIPAA requires you to encrypt your hard drives, which most researchers won't do (it's too difficult). Then you also have the problem of what happens if a researcher (or somebody else) leaves with the data.

    Solaris, and by extension Nexenta, has really good solutions for this. You can DIY a 40TB RAIDZ2 system for well under $18,000. If you use desktop SATA drives for your data (which I wouldn't recommend, but ZFS keeps it safe), you can press that cost down to $10k or $12k.
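
    For illustration, creating such a pool is only a couple of commands (pool name and device names hypothetical):

      # Double-parity raidz2 pool across six disks
      zpool create scratch raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
      # A dataset with compression turned on for the imaging files
      zfs create -o compression=on scratch/neuroimaging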

    I work in the same environment as you (neuroimaging, large datasets), feel free to contact me privately for more info.

    • We've talked with our privacy office and our files do not have to be encrypted because they are deidentified from the start. The privacy office of course still prefers that they are encrypted (which we will do) but in our case with our scans (the ones going on these drives are impossible to identify someone with and do not strictly count as personal health information). Everything is kept secure and we will encrypt just to be super safe.
  • A RAID array and NFS, or Lustre, etc., depending on need - but a network share! ...and if you need more encryption, so that even admins can't have access to the data, have your users store TrueCrypt volumes on the network share. Sneakernet is, in the end, far more insecure!
  • UDF (Score:2, Informative)

    I'm using a USB disk formatted under Linux with UDF (yep, it's not limited to DVDs; there is a profile for hard disks). It can be used without problems under OS X (even Snow Leopard).
     

    • by marquise2000 ( 235932 ) on Thursday July 01, 2010 @07:07PM (#32765902) Homepage

      Ok everybody's occupied with surreal suggestions, but anyway:
      *UDF* is quite awesome as an on-disk format for Linux/OS X data exchange, because it has a file size limit around 128TB and supports all the POSIX permissions, hard and soft links, and whatnot. There is a nice whitepaper summing it all up:
      http://www.13thmonkey.org/documentation/UDF/UDF_whitepaper.pdf

      If you want to use UDF on a hard disk, prepare it under Linux:
      1) Install udftools
      2) Wipe the first few blocks of the hard disk, i.e. dd if=/dev/zero of=/dev/sdb bs=1k count=100
      3) Create the file system: mkudffs --media-type=hd --utf8 /dev/sdb (that's right, UDF takes the whole disk, no partitions)

      If you plug this into OSX, the drive will show up as "LinuxUDF". I have been using this setup for years to move data between Linux and OSX machines.
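
      If a particular Linux box doesn't auto-mount it, mounting by hand works too, and mkudffs can set a friendlier label than "LinuxUDF" (mount point and label hypothetical):

        # Manual mount on Linux
        mount -t udf /dev/sdb /mnt/udf
        # Optional: name the volume at creation time
        mkudffs --media-type=hd --utf8 --lvid=MRI_TRANSPORT /dev/sdb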

      • by fnj ( 64210 )

        Give the man a cigar. I was struggling through all the other suggestions, every single one of them involving unacceptably horrible tradeoffs, and finally got to this post, the only idea that is not just mind-numbingly brain-dead. I don't even use OSX any more (finally cured that brain disease), and I'm gonna check this out.

      • Re: (Score:3, Insightful)

        by mlts ( 1038732 ) *

        That is an excellent solution, and arguably the best answer to the OP's problem posted here. UDF works on Windows, OS X, and Linux. Even AIX is happy with it and can write to it. So an external drive with this on it should definitely solve the problem.

  • NAS device (Score:3, Insightful)

    by linebackn ( 131821 ) on Thursday July 01, 2010 @06:46PM (#32765590)

    A simple NAS enclosure or NAS device might be what you are looking for. You can get a single drive NAS enclosure, and add a drive, that you can carry around just like a regular portable drive. You can move it between networks and use any connection method the NAS device happens to implement (SMB, FTP, NFS, etc). Some even let you optionally connect it directly via USB or eSATA to access the file system directly, and some may have encryption or other security features as well.

    Of course, check to make sure you have permission and that connecting things to your network does not violate any policies. If connecting a network device directly to your network is not permitted, then perhaps you can add a second, dedicated network card to the computers.

  • Treat the disk as if it were a tape, and use the GNU version of cpio.

    You can install GNU cpio via macports on your Macs, and people with Linux should find it either already installed or available in their distribution's package system.

    You need to use the GNU cpio instead of the BSD cpio that ships with OS X because there are incompatibilities between the two, and I was unable to find a set of settings that would make them compatible. (There are settings that should, but they did not work, so there's a bug i
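
    A minimal sketch of the cpio-as-tape idea (device name hypothetical; the newc archive format is my assumption, not the poster's):

      # Write the file list straight to the raw device with GNU cpio
      find /data/scans/subject42 -depth -print | cpio -o -H newc > /dev/sdb
      # Read it back on the other machine, creating directories as needed
      cpio -i -d -H newc < /dev/sdb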

  • I do my fair share of transferring large neuroimaging datasets around from time to time, although I don't do it regularly. If you want to use hard drives that aren't connected to anything in transit, then I have to agree with whoever suggested doing it without a filesystem. I've always found that to be the easiest way to get around filesystem (and sometimes operating system) idiosyncrasies, whether you're writing to a DVD or a hard drive or whatever. If you can (de)serialize your data easily (using tar),

  • I have a similar problem with backups in my paperless medical practice - I always need a working system off-site for emergency replacement, and here in rural Australia doing it via the Internet is impossible due to the lack of networking infrastructure and ridiculous bandwidth costs.
    I use a QNAP NAS (TS659). They also come as tiny handy cubes with 2.5" disks instead of the 3.5" ones.
    That makes the question of the file system irrelevant, since it communicates with just about any operating system through standard protocols.

  • The best cross-platform (Linux+MacOS) filesystem is NFS, wh-- stop hitting me, I DID read the whole question. Ok? So, as I was about to say, use NFS. When the techno-ignorant HIPAA people watch what you're doing, just send 80 gigs of /dev/random (bonus: it looks encrypted, the HIPAA guys will love that) to the removable drive, and when you're copying off that drive, send to /dev/null. Meanwhile, as the drive's contents are going to your lame software-emulated null device, also be reading the file off the e

  • Just curious. Is the 80GB after applying lossless compression to the image set? If not, there's no good reason to store it uncompressed.

    As for your question, I agree with those that say to skip the filesystem. Just use tar and a block device.
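
    For instance, compressing on the fly while writing to the raw device is just a flag away (device hypothetical; the payoff depends on how compressible the image data really is):

      # gzip-compressed tar stream straight onto the raw device
      tar -czf /dev/sdb /data/scans/subject42
      # and back again on the other end
      tar -xzf /dev/sdb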

    -Randall

  • by DynaSoar ( 714234 ) on Friday July 02, 2010 @01:01AM (#32768980) Journal

    We had almost exactly the same problem. Our fMRI work was done at the University of Virginia on a Linux machine. Naturally you don't want to tie up a $1500/hour data collection machine doing analysis, so our data was transferred immediately to a multiboot machine at the Neurological Institute. No patient data was included at this point, so no HIPAA problems. The receiving box ran Linux initially, since the analysis programs from NIH (primarily AFNI) were Linux based. Patient data got added here, so HIPAA became an issue.

    The machine had multiple hard drive bays, all of which were removable, plug-and-play drives made from a kit that provided slide-in rails and a locking mechanism; otherwise they were common, commercial drives. Externals would have been easier, but the guy who devised this had a rilly rilly good reason. I remember it was good, but not what it was. Anyway, the machine could boot other OSes, prep the drives, go back to the native Linux HFS+, and transfer/translate the data to whatever the destination needed; once it was transferred, the drive was removed, packaged, and FedEx'd to the other analysis sites at Virginia Tech, NIH, and U.Va Wise. We were strictly experimental, no direct medical treatment, and so time was not an issue.

    With OS X being *nix, there's not a lot of reason to go with one over the other except for convenience when it comes to what your data collection and analysis are running under. Unless yours run fine under OS X, I'd say stick with HFS+, and of course moderate that according to whether you have to share out the data and what those people are running. I wouldn't bother with supporting Windows, as they continually find new problems to have with large files. One comparison test showed no difference in analysis results, but they did have problems with Windows choking on the data files, and their test files were only 1.5 GB (ref: J Med Dent Sci. 2004 Sep;51(3):147-54. Comparison of fMRI data analysis by SPM99 on different operating systems. PMID: 15597820). My experience agreed with their results. As I said, we had little call for Macs, so we didn't run enough of that to give a good test of whether they had the same kind of problems.

    Bottom line: we used what we needed according to where the data was going and what they needed it to be, but for our own use it made no sense to transfer it out of the filesystem that collection and analysis used, HFS. The system met with the approval of the biophysicist we worked with at U.Va, and he had been a grad student under Peter Fox when the latter developed SPM.

    OH YEAH: the good reason. If anyone else wanted to work with us, they didn't have to dig too deeply into techie stuff, either hardware or software. We could send them a removable-drive kit to install, and send them a drive with bootable Linux, AFNI, and data, all plug and play. If that might be useful to you (using externals instead of removables doesn't matter here), that's probably another vote for HFS.
