Software SSD Cache Implementation For Linux?

Annirak writes "With the bottom dropping out of the magnetic disk market and SSD prices still over $3/GB, I want to know if there is a way to get the best of both worlds. Ideally, a caching algorithm would store frequently used sectors, or sectors used during boot or application launches (hot sectors), to the SSD. Adaptec has a firmware implementation of this concept, called MaxIQ, but this is only for use on their RAID controllers and only works with their special, even more expensive, SSD. Silverstone recently released a device which does this for a single disk, but it is limited: it caches the first part of the magnetic disk, up to the size of the SSD, rather than caching frequently used sectors. The FS-Cache implementation in recent Linux kernels seems to be primarily intended for use in NFS and AFS, without much provision for speeding up local filesystems. Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"
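
To make the difference between the two policies in the question concrete, here is a minimal Python sketch. It compares a frequency-based hot-sector cache with one that simply holds the first sectors of the disk (the behavior described for the Silverstone device). The access trace, the sector numbers, and the helper names `pick_hot_sectors` and `hit_rate` are all made up for illustration; a real implementation would sit in the block layer and update its counts incrementally.

```python
from collections import Counter


def pick_hot_sectors(access_log, cache_slots):
    """Return the `cache_slots` most frequently accessed sector numbers."""
    counts = Counter(access_log)
    return {sector for sector, _ in counts.most_common(cache_slots)}


def hit_rate(access_log, cached):
    """Fraction of accesses in the log that would be served from the cache."""
    hits = sum(1 for sector in access_log if sector in cached)
    return hits / len(access_log)


# Hypothetical trace: three sectors far into the disk are read repeatedly,
# while the first ten sectors are each touched once.
trace = [9000, 9001, 9002] * 10 + list(range(10))

frequency_based = pick_hot_sectors(trace, cache_slots=3)
first_n = set(range(3))  # "cache the first part of the disk" policy

print("frequency-based hit rate:", hit_rate(trace, frequency_based))
print("first-N hit rate:", hit_rate(trace, first_n))
```

On a trace like this the frequency-based policy serves most reads from the cache, while the first-N policy only helps if the hot data happens to live at the start of the disk.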

  • Linux caches data from any disks all the same, SSD or not.

    • Linux caches disk data in memory. The author wants to cache disk data in an SSD.

      • by Korin43 ( 881732 ) *
        Would using an SSD as a swap device have the effect they want?
        • Re: (Score:2, Informative)

          by Unit3 ( 10444 )

          No. Swap is not a cache. Swap holds things that don't fit in RAM. I/O cache will never hit swap, it limits itself to physical RAM.

        • Re: (Score:2, Offtopic)

          by Penguinisto ( 415985 )

          ...only if you want to blow out the SSD wear-limits.

          What the author wants (I believe) is to have Linux figure down which sectors are read most frequently, and have those mapped/linked/whatever to the SSD for speed reasons.

          If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.

          • Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?
            • Probably not. That would be bad if for example you wanted to burn a DVD and the burner program put a lot of stuff in /tmp. I'm not a linux pro or anything so I don't know how different distros do it but I don't think that's the default.

              • by raynet ( 51803 )

                Well, if you use tmpfs and not a ramdisk for /tmp, then pages will be swapped to disk if needed, so you can burn your DVD as long as you have enough swap available. Daemons like swapd or swapspace let you keep a reasonably sized swap partition and will create swapfiles on demand.

            • by rwa2 ( 4391 ) *

              Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?

              Depending on the distribution, but sometimes.

              On servers /tmp can get pretty big with random crap, though, so generally you want to be able to put it on a disk or allow it to swap out and use your RAM for something more useful.

              But on thin clients, netbooks, etc. without too much going on it might be better to put it on tmpfs to reduce SSD wear.

          • by dissy ( 172727 )

            If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.

            Unfortunately, it usually works out that some of the most volatile places on disk (/home /tmp swap) are the very places one would see the best result in speeding up.

            Also unfortunately, those are currently the worst uses for an SSD.

            Then again, for anyone who really wants to speed up things like swap and /tmp, the best way is to simply quadruple your ram and get rid of swap, and use tmpfs in ram for /tmp.

            The usual reason for not doing that is ram is expensive, and on top of that motherboards to handle a ton of r

            • Correct me if I'm wrong, but from the sounds of it he wants Readyboost for Linux, which I can't say as I blame him as I have Windows 7 and Readyboost is nice. Now I don't know how well it works, since I am not a Linux guy, but Lifehacker has a DIY Readyboost for Linux. And for those still on XP there is a Readyboost for XP, but it costs $40.

              Anyway from TFA it sounds like he wants Linux Readyboost. If he tries it he should probably come back here and give us a little review of how it went. After all if you

          • Because all of /bin is hardly going to be your most used stuff, and there's probably a ton of stuff frequently used that isn't in /bin, /usr.

            Sure, you can try and mount your most used stuff on SSD, but that's (a) a pain in the neck to fiddle around with (b) something ideally better left to an algorithm. (c) doesn't actually work that well, since you have to divide all your most used stuff into separate file systems.

        • Re: (Score:2, Informative)

          by Jezza ( 39441 )

          Assuming the SSD was faster at both read and write - it should speed things up. Hell just moving the swap onto a different physical disk helps. But don't. SSD have a limited life, in a different sense to spinning disks. SSD wear with writing, so if you constantly write to the same "sectors" they will fail. If you think about what's happening when the system is swapping - that's exactly what's going on. So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM,

          • Re: (Score:2, Informative)

            by Anonymous Coward

            SSD wear with writing, so if you constantly write to the same "sectors" they will fail.

            2006 called, they want their FUD back. While it's true that erase blocks in flash memory wear out with use, the whole battle between SSD manufacturers for the last couple years has been in mapping algorithms that ensure you don't hit the same erase block very often. By now, SSDs have longer lifetimes than HDDs. Of course that applies to real SSDs, not makeshift IDE-to-CompactFlash adapters.

          • Re: (Score:2, Informative)

            by EvanED ( 569694 )

            So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM, so the system won't need to swap (with enough RAM you don't need swap at all).

            A RAM buffer cache and SSD cache address far different issues. The buffer cache is far faster when it hits, but the SSD cache is far larger. It's pretty easy to find workloads where getting enough RAM so that your working set will fit into your buffer cache (alongside the memory use of whatever you're doing) would be more ex

            • by Jezza ( 39441 )

              Sure, adding RAM isn't a panacea, but running "sans swap" can really speed (some) things greatly. The question I was addressing was all about swap - nothing else.

              I do think running swap in SSD is **probably** a bad idea, especially if you can put enough RAM in to not need swap. But sure that is a pretty glib statement...

          • by gfody ( 514448 )
            Why are people hung up on SSD lifespan? Unless you're talking about USB flash thumbdrives, any SSD you buy is not going to "wear out" any time soon - 5 years is the absolute worst case scenario, assuming you write constantly as much as the drive will take. Tracking against Intel's media wearout indicator suggests even heavily used drives will last around 15 years. How much computer equipment do you have around that's even 5 years old?
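
            For anyone who wants to plug in their own numbers, a back-of-envelope estimate can be sketched in Python as below. It assumes perfect wear leveling, and every figure in the example (capacity, program/erase cycles, daily write volume, write amplification) is a hypothetical placeholder rather than a spec from any real drive or from this thread.

```python
def ssd_lifetime_years(capacity_gb, pe_cycles, gb_written_per_day,
                       write_amplification=1.0):
    """Back-of-envelope endurance estimate, assuming perfect wear leveling."""
    # Total host data the flash can absorb before every erase block has
    # used up its rated program/erase cycles.
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / (gb_written_per_day * 365)


# Hypothetical drive and workload (none of these figures come from the thread).
years = ssd_lifetime_years(capacity_gb=80, pe_cycles=5000,
                           gb_written_per_day=20, write_amplification=2.0)
print(f"estimated lifetime: {years:.1f} years")
```

            The model ignores everything except write volume, so treat the result as an order-of-magnitude figure only.
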
            • by Jezza ( 39441 )

              I'm not going to answer that - but to give you an idea I do have a NeXT Dimension ...

              But I take your point.

          • by AHuxley ( 892839 )
            "SSD have a limited life, in a different sense to spinning disks. SSD wear with writing, so if you constantly write to the same "sectors" they will fail."
            The early versions did, but now you have real developers with real support entering consumer space.
            e.g. 10,000,000 Mean Time Before Failure (MTBF) and a 5-year warranty.
    • This is about doing double caching: cache to fast but limited RAM (L1) first, and then have a much larger but slower cache, that being the SSD (L2). The difference from other caching systems is that the SSD of course holds its state when the power is down (so often-used sectors may never be written to disk).

      • Re:I don't get it (Score:5, Insightful)

        by Colin Smith ( 2679 ) on Thursday April 22, 2010 @05:58PM (#31946506)


        CPU L1
        CPU L2
        CPU L3

        I estimate SSDs would be closer to Level 5 cache.


        • by Znork ( 31774 )

          I'd argue it's better to implement as HSM (Hierarchical Storage Management), with least recently used things getting delegated to more archival storage. It would be nice with a device-mapper-hsm layer that would let you simply stack one device upon the other and obtain the best distribution of desirable characteristics they could offer.

          IIRC, there was an intern at IBM who did a project like that some years ago, but I don't think much became of it.

        • by MobyDisk ( 75490 )

          If you want to play Mr. Pedantic, you skipped registers. And CPU cache may not necessarily cache disk data, so those don't count for the same reason registers don't count. Networks don't cache the internet. And don't forget newspapers - the Internet caches those. And newspapers cache events, which cache time. :-)

          The point is that most of the layers you listed are implementation dependent or not relevant to the discussion. For this purpose, the CPU is a black box - it could have different

        • OK, but CPU L1/2/3 is a data cache. Of course it will help but it's just not configured as a disk cache. Multi-processor systems for instance would not benefit from the CPU caches.

          Furthermore, I don't know about you, but my disk is certainly not used for network or internet cache. For network resources there is no such thing as configuring life-time and the browser disk cache is the first thing I disable (using a spinning system disk for internet cache is stupidity IMHO).

          Besides all that, it's just an examp

      • by Jezza ( 39441 )

        Yeah, especially if those sectors don't change much - an SSD isn't suitable for data that's rapidly changing.

    • Re:I don't get it (Score:5, Informative)

      by Anonymous Coward on Thursday April 22, 2010 @05:55PM (#31946466)

      The idea is to use the SSD as a second-level disk cache. So instead of simply discarding cached data under memory pressure, it's written to the SSD. It's still way slower than RAM, but it's got much better random-access performance characteristics than spinning rust and it's large compared to RAM.

      As for how to do it in Linux, I'm not aware of a way. If you are open to the possibility of using other operating systems, this functionality is part of OpenSolaris (google for "zfs l2arc" for more information).

      Cache Devices
                Devices can be added to a storage pool as "cache devices."
                These devices provide an additional layer of caching between
                main memory and disk. For read-heavy workloads, where the
                working set size is much larger than what can be cached in
                main memory, using cache devices allow much more of this
                working set to be served from low latency media. Using cache
                devices provides the greatest performance improvement for
                random read-workloads of mostly static content.

                To create a pool with cache devices, specify a "cache" vdev
                with any number of devices. For example:

                    # zpool create pool c0d0 c1d0 cache c2d0 c3d0

                The content of the cache devices is considered volatile, as
                is the case with other system caches.

      You can also use it as an intent log, which can dramatically improve write performance:

      Intent Log
                The ZFS Intent Log (ZIL) satisfies POSIX requirements for
                synchronous transactions. For instance, databases often
                require their transactions to be on stable storage devices
                when returning from a system call. NFS and other
                applications can also use fsync() to ensure data stability.
                By default, the intent log is allocated from blocks within
                the main pool. However, it might be possible to get better
                performance using separate intent log devices such as NVRAM
                or a dedicated disk. For example:

                    # zpool create pool c0d0 c1d0 log c2d0

                Multiple log devices can also be specified, and they can be
                mirrored. See the EXAMPLES section for an example of
                mirroring multiple log devices.

                Log devices can be added, replaced, attached, detached, and
                imported and exported as part of the larger pool. Mirrored
                log devices can be removed by specifying the top-level
                mirror for the log.
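
      The second-level idea from the top of this comment (data evicted from RAM is written to the SSD rather than discarded) can be sketched as a toy model in Python. This is not how ZFS implements L2ARC; the class name `TwoLevelCache`, the slot counts, and the plain LRU policy on both tiers are assumptions chosen to keep the example short.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: a RAM LRU cache whose evictions spill into an 'SSD' LRU cache."""

    def __init__(self, ram_slots, ssd_slots, read_from_disk):
        self.ram = OrderedDict()   # stands in for the in-memory page cache
        self.ssd = OrderedDict()   # stands in for the SSD cache device
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.read_from_disk = read_from_disk
        self.stats = {"ram": 0, "ssd": 0, "disk": 0}

    def read(self, block):
        if block in self.ram:
            self.ram.move_to_end(block)       # mark as most recently used
            self.stats["ram"] += 1
            return self.ram[block]
        if block in self.ssd:
            data = self.ssd.pop(block)        # promote back into RAM
            self.stats["ssd"] += 1
        else:
            data = self.read_from_disk(block)
            self.stats["disk"] += 1
        self._insert_ram(block, data)
        return data

    def _insert_ram(self, block, data):
        self.ram[block] = data
        if len(self.ram) > self.ram_slots:
            # Under memory pressure, demote the LRU block to the SSD
            # instead of simply discarding it.
            victim, victim_data = self.ram.popitem(last=False)
            self.ssd[victim] = victim_data
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)


cache = TwoLevelCache(ram_slots=2, ssd_slots=4,
                      read_from_disk=lambda b: f"data-{b}")
for block in [1, 2, 3, 1, 1]:
    cache.read(block)
print(cache.stats)
```

      In this run, block 1 is pushed out of RAM by block 3, then comes back from the SSD tier instead of from the disk.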

    • Yes, Linux caches data from disks in RAM. But what we're talking about here is not caching in RAM, but using a fast disk (SSD) as cache for a slow disk.
      • What about this: the SSD Ram Disk (SSDRD). It's exactly like a normal RAM disk, but it simulates an SSD. It would be supremely faster to write to an imaginary SSD rather than an imaginary HD.


    • Re:I don't get it (Score:5, Informative)

      by TheRaven64 ( 641858 ) on Thursday April 22, 2010 @06:30PM (#31946988) Journal
      The submitter wants something like ZFS's L2ARC, which uses the flash as an intermediate cache between the RAM cache and the disk. This works very well for a lot of workloads. Since Linux users appear to be allowed to say 'switch to Linux' as an answer to questions about Windows, it only seems fair that 'switch to Solaris or FreeBSD' would be a valid solution to this problem.
      • by h4rr4r ( 612664 )

        Indeed, this is the 1 killer feature of ZFS that btrfs seems not to have yet.

  • by owlstead ( 636356 ) on Thursday April 22, 2010 @05:43PM (#31946264)

    Is there really a need for this? An Intel 40 GB SSD still has a read speed of 170 MB/s and costs about 100 euro here in NL. Why have some kind of experimental configuration while prices are like that? OK, 35 MB/s write speed is not that high, but with the high IOPS and low seek times you still have most of the benefits.

    I can see why you would want something like this, but I doubt the benefits are that large over a normal SSD + HDD configuration.

    • by Unit3 ( 10444 ) on Thursday April 22, 2010 @05:57PM (#31946502) Homepage

      They are huge for larger applications. Database servers, for instance, can see transactions per second increase on the order of 10-20x when using a scheme like this for datasets that are too large to fit in RAM.

      • Re: (Score:2, Insightful)

        by kgo ( 1741558 )

        Yeah, but if you've got some 'enterprise-level database' with those sort of transaction requirements, you can probably justify the purchase of SSDs. It's not exactly like you're building that system from craigslist parts...

        • by Amouth ( 879122 )

          Speak for yourself.. some companies do not want to spend the money required to do it right, but would rather have you spend more time than the equipment costs putting something crazy together to make it work.

    • by MobyDisk ( 75490 )

      I doubt the benefits are that large over a normal SSD + HDD configuration.

      Which doesn't work for laptops. :-(

      Most laptops can only fit a single drive. I would love to have an SSD for faster build times, but a 40GB SSD is useless in my laptop since the second drive would have to be an external. But a 300GB drive with 16GB of integrated flash might give me a single drive with the performance boost that I am looking for.

    • by mickwd ( 196449 )
      I'd agree with this. Get that Intel SSD and stick /usr on it, together with any other read-mainly filesystems (maybe the root filesystem too, if you have stuff like /var on separate partitions).

      As well as faster reads, the biggest gains are in seek times, so it'd be helpful to have your home directory and all its "dot" config files on there too (especially when starting up something like Gnome or KDE). However, if you're gonna fill your home directory with tons of stuff, then stick your home directory itse
  • ZFS (Score:5, Informative)

    by Anonymous Coward on Thursday April 22, 2010 @05:45PM (#31946288)
    ZFS can do this (but I don't know about zfs-fuse).
  • I hate to sound dumb, but isn't what you're describing basically file system buffering that OS's have been doing for many decades now?

    • Re: (Score:3, Informative)

      by MobyDisk ( 75490 )


      You would buffer on an SSD differently than you would in memory. Memory is volatile, so you write back to disk as fast as possible. And whenever you cache something, you trade valuable physical memory for cache memory. With an SSD, you could cache 10 times as much data (Flash is much cheaper than DRAM), you would not have to write it back immediately (since it is not volatile), and the cache would survive a reboot so it could also speed the boot time.
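
      The "not have to write it back immediately" part can be illustrated with a small write-back cache sketch in Python. The two dicts stand in for the SSD and the magnetic disk, and the class name `WriteBackCache` and its methods are invented for this example, so read it as a toy model rather than a description of any existing driver.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in the cache and reach the disk on flush()."""

    def __init__(self, backing):
        self.backing = backing   # dict standing in for the magnetic disk
        self.cache = {}          # dict standing in for the non-volatile SSD
        self.dirty = set()       # blocks newer in the cache than on disk

    def write(self, block, data):
        # The write completes as soon as the SSD copy is updated.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.backing[block]
        return self.cache[block]

    def flush(self):
        """Copy dirty blocks to the disk; returns how many disk writes it took."""
        for block in sorted(self.dirty):
            self.backing[block] = self.cache[block]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


disk = {0: "old"}
cache = WriteBackCache(disk)
cache.write(0, "new")
cache.write(0, "newer")          # overwrites the cached copy, disk untouched
disk_before_flush = disk[0]
disk_writes = cache.flush()      # two logical writes become one disk write
```

      Because the cache medium keeps its contents across a power cycle, deferring the flush is far less risky than it would be with a RAM-only cache, which is the point being made above.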

    • by ras ( 84108 )

      Not really. There are a couple of applications for SSD's. One is to speed up boot times. Obviously a RAM cache is useless in that application.

      Another is if you want to speed up a transaction server (one that is writing as much as it is reading), then the answer is again no. Think of the battery backed up RAM cache RAID arrays have. Those caches are there for a reason. RAM can do read caching, but it can't do write caching and still be secure across power failure.

      My interest is in the second applicatio

    • Re:Buffers? (Score:4, Informative)

      by m.dillon ( 147925 ) on Thursday April 22, 2010 @09:06PM (#31949102) Homepage

      The single largest problem addressed by e.g. DragonFly's swapcache is meta-data caching, which makes scans and other operations on large filesystems with potentially millions or tens of millions of files fast. Secondarily, with something like DragonFly's HAMMER filesystem, which can store a virtually unlimited number of live-accessible snapshots of the filesystem, you can wind up with not just tens of millions of inodes, but hundreds of millions of inodes. Being able to efficiently operate on such large filesystems requires very low latency access to meta-data. Swapcache does a very good job providing the low latency necessary.

      System main memory just isn't big enough to cache all those inodes in a cost-effective manner. 14 million inodes takes around 6G of storage to cache. Well, you can do the math. Do you spend tens of thousands of dollars on a big whopping server with 60G of ram or do you spend a mere $200 on a 80G SSD?
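
      Spelling out "the math" as a quick Python calculation: the only inputs taken from the comment are the 14 million inodes and the roughly 6G figure. Treating "G" as GiB and scaling linearly to 140 million inodes are assumptions made here.

```python
GIB = 1024 ** 3

# From the comment: about 6G of storage to cache 14 million inodes.
bytes_per_inode = 6 * GIB / 14_000_000


def cache_size_gib(inodes):
    """Storage needed to cache `inodes` inodes, assuming linear scaling."""
    return inodes * bytes_per_inode / GIB


print(f"about {bytes_per_inode:.0f} bytes per cached inode")
print(f"140 million inodes: about {cache_size_gib(140_000_000):.0f} GiB")
```

      At that rate, a filesystem with ten times as many inodes lands at about the 60G figure mentioned above, which is where an SSD starts to look much cheaper than RAM.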


  • ZFS L2ARC (Score:5, Informative)

    by jdong ( 1378773 ) on Thursday April 22, 2010 @05:46PM (#31946320)
    Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC.
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC.

      Swapcache on DragonFly BSD 2.6.x was implemented for this very reason IIRC.

  • by jameson ( 54982 )

    The OSDI deadline is in August; plenty of time to implement this, write it up, and get a publication at a top research conference out of it!

  • bcache (Score:5, Informative)

    by Wesley Felter ( 138342 ) on Thursday April 22, 2010 @05:52PM (#31946420) Homepage

    I'm a little surprised at the lack of response on linux-kernel.

    Solaris and DragonFly have already implemented this feature; I'm surprised that Linux is so far behind.

    • Re:bcache (Score:5, Informative)

      by Kento ( 36001 ) on Thursday April 22, 2010 @06:02PM (#31946558)

      Hey, at least someone noticed :)

      That version was pretty raw. The current one is a lot farther along than that, but it's still got a ways to go - I'm hoping to have it ready for inclusion in a few months, if I can keep working on it full time. Anyone want to fund me? :D

      • If it were a block device wrapper along the lines of md, I'd be interested.

        Have a project at the moment where I'd *love* to be able to specify tiers of storage (say md volumes), and have writes go to the highest priority, and blocks trickle down to the lowest based on usage.

        Sort of like a specialized CoW.

      • by rayvd ( 155635 )

        Any plans to add in write cache support? I'm thinking along the lines of putting ZFS's ZIL on SSD's. Really makes NFS in sync mode much quicker.

      • Summer of Code, dude. This sounds like something Google would get behind.
    • by pydev ( 1683904 )

      You shouldn't be surprised that "Linux is so far behind"; we like it that way. If we thought that what the Solaris or DragonFly engineers are doing was important, we'd be using their systems instead.

    • I'm surprised that Linux is so far behind

      Obviously you are either unfamiliar with Linux, or unfamiliar with all non-Linux operating systems except perhaps Windows and maybe Darwin.

  • Waste of time (Score:5, Informative)

    by onefriedrice ( 1171917 ) on Thursday April 22, 2010 @05:54PM (#31946442)
    What a waste of time. Just put /home on a magnetic disk and everything else on the SSD. This way, you can get away with a small (very affordable) SSD for your binaries, libraries, config files, and app data, and use tried and true magnetic for your important files. Your own personal files don't need to be on a super fast disk anyway because they don't get as much access as you would think, but your binaries and config files get accessed a lot (unless you have a lot of RAM to cache that, which I also recommend). I've been doing this for over a year and enjoying 10 second boots, and instant program access coldstarts (including openoffice and firefox).

    I personally fit all my partitions except /home in only 12.7GB (the SSD is 30GB). Seriously, best upgrade ever. I will never put my root partition on a magnetic drive ever again.
    • by MobyDisk ( 75490 )

      Just put /home on a magnetic disk and everything else on the SSD

      Try jamming two hard drives into a laptop. :-(

      • by Logic ( 4864 )
        My primary Linux laptop is an Inspiron 1721, with two mirrored drives.
      • Actually, my S300 Thinkpad does have a second SATA connector - it's used by the rather useless DVD writer. I've looked everywhere but I cannot find anybody that sells a simple slim-DVD drive bracket. I cannot even find a cable that is suited. There is a site that build their own cable, but the cable that they build it out of is hard to get and I'm not that great at soldering electronics either.

        The problem is of course that there are a few different connectors out there (3 to 4 is my current estimation). Fur

      • Replace the optical drive. I've been keeping a log of how many times I've ever used my DVD drive while away from home. So far I'm at 1; I ripped a CD I got for Christmas before I brought it back home with me. It could have waited.

        Yes, I know it doesn't work for everyone, but I think it works for most people, assuming you get a USB powered optical drive or enclosure.

      • Try jamming two hard drives into a laptop.

        Re-read the problem as stated:

        "Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"

        Ya think maybe it's assumed that there are two drives? Just Maybe?

  • Hardware implemented caching is the way to go. There are 'hybrid drives' available now, which automatically cache disk access to SSD. These are very specific for the task and way more efficient than any software implementation.
    • I see that you haven't actually used hybrid hard drives, because they're nothing like what you describe. AFAIK only Samsung ever made them, and they've now been discontinued. The HHD itself didn't perform caching; it relied on Vista to manage the flash, which didn't really work out when no one bought Vista. HHDs also included laughably small (256MB) and slow flash that would get totally owned by the smallest slowest SSD today.

      • Mmmm, interesting. I guess my understanding of how these 'hybrid drives' work is wrong. That makes me wonder why a completely hardware-implemented SSD cache isn't available. Is there a technical reason why this is not possible? Wouldn't this be faster than a software-implemented one?
        • Adaptec MaxIQ is a hardware SSD cache; just don't ask what it costs. To do SSD caching you need some DRAM to hold the metadata and a CPU to manage the cache; would you prefer to buy additional CPU/RAM or just use what's already in your computer?
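
          To get a feel for how much DRAM the metadata might need, here is a rough Python estimate. The 4096-byte cache block and the 16 bytes of index per block are hypothetical values picked for illustration, not figures from MaxIQ or any other product, and the function name is made up.

```python
def cache_metadata_bytes(cache_bytes, block_size, bytes_per_entry):
    """Return (number of cache blocks, bytes of index needed to track them)."""
    entries = cache_bytes // block_size
    return entries, entries * bytes_per_entry


# Hypothetical: an 80 GB SSD cache, 4096-byte blocks, 16 bytes of index per block.
entries, metadata = cache_metadata_bytes(80 * 10**9, 4096, 16)
print(f"{entries:,} cache blocks need about {metadata / 2**20:.0f} MiB for the index")
```

          Larger cache blocks shrink the index proportionally, at the cost of caching data that may never be read.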

          • Modern harddisks already have DRAM for buffering (also referred to as disk cache) and a dedicated "chip"/embedded processor exclusively for cache management should be cheap.
      • Somebody forgot that you cannot create speed by just using a slow flash chip or two. You need speedy chips, and a lot of them, to create a fast SSD. Besides a good controller and software for that controller of course. Besides that, 256 MB is so low that I wonder if RAM would not already perform most of the caching, even for writes.

  • I've actually been working on this off-and-on for a while, I'm hoping we can release some beta code soon. Currently developing it on Linux, but planning to release OSX and Windows versions, too. We're caching reads and writes, and only the blocks that are most frequently used, plus various other SSD-relevant optimisations. The block allocation logic is pretty complex (and I'm too busy with work), which is why it's been taking so long.

  • The oldest and simplest solution is to mount partitions from a small fast disk where you want fast read/write speeds, and partitions from slower disks everywhere else. Works quite well, too.

  • by guruevi ( 827432 ) <evi&smokingcube,be> on Thursday April 22, 2010 @06:12PM (#31946740) Homepage

    First of all, you can do this with ZFS, which is newer tech and works quite well but is not (and is never going to be) implemented in the Linux kernel.

    For lower tech, you can do it the same way we used to do back when hard drives were small. In order to prevent people from filling up the whole hard drive we used to have partitions (now we just pop in more/larger drives in the array). /boot and /var would be in the first parts of the hard drive where the drive was fastest. /home could even be on another drive.

    You could do the same: put /boot and /usr on your SSD (or whatever you want to be fastest). If you have an X25-E or another fast-writing SSD, you could put /var on there (for log, tmp, etc., if you have a server), or, if you are short on RAM, make it a swap drive. If you have small home folders, you could even put /home on there and leave your mp3's in /opt or so.

  • dm-cache (Score:3, Informative)

    by Gyver_lb ( 455546 ) on Thursday April 22, 2010 @06:15PM (#31946782)

    google dm-cache. Not updated since 2.6.29 though.

  • Windows 7 (and I think XP) has ReadyBoost. I haven't been able to find anything similar for Linux. It is also not clear how much difference ReadyBoost makes. The only benchmark I was able to find uses a crappy USB flash drive. I was wondering how much difference something like the 80GB X25-M would make. There is clearly potential for huge gains as MaxIQ benchmarks show.

    This would be an awesome speedup if it was supported: just add a 40-80GB SSD for swap & file cache, and gain a massive performance boost

    • by MobyDisk ( 75490 )

      It is also not clear how much difference ReadyBoost makes

      Keep searching. There's lots of other benchmarks, and they all say the same thing. It helps if you have an old machine with insufficient memory.

      ReadyBoost doesn't really do what the author wants. Windows treats ReadyBoost as a write-through cache like it treats memory. It assumes you might unplug the drive at any moment. It won't speed up boot time, and it won't speed up writes. It won't place the swap file on there either. I'm not sure if you could tell Windows to use a regular SATA drive for Ready

      • Not quite correct. The ReadyBoost cache is pre-filled with usable data by the SuperFetch caching system, so it's not just write-back. It will also (theoretically) "learn" your loading patterns, and if you, say, start up the same application at 9am every day, it will start putting that application into the ReadyBoost cache just before. Also, the article linked in the GP specified a max cache of 4GB - that was true in Vista, but 7 and 2008 can use larger flash drives provided they're formatted NTFS or exFAT.
    • What you really want to be talking about is a different Windows 7 feature called ReadyDrive, which actually does what the author is talking about. Basically, the system heuristically determines what files are used most often/during boot and the BIOS read- and write-caches them to the flash in the ReadyDrive. I bought a Thinkpad with a "4GB Intel TurboBoost Memory" chip and it made a noticeable difference in boot time when I enabled it as a ReadyDrive.

      I also found that when Windows (rarely) crashed, it w

  • by Goaway ( 82658 ) on Thursday April 22, 2010 @06:25PM (#31946904) Homepage

    "If Linux doesn't already do it, you don't need it anyway!"

    • Re: (Score:2, Insightful)

      by EvanED ( 569694 )

      No kidding. It's threads like this (where I think the question is entirely reasonable and a good thing to support) that really sour my opinion of Linux. There are a few other things -- better file-system-supported metadata, transactional filesystems, etc. -- that have come up in the past too where it seems I just flat out disagree with most hardcore Linux users.

      (Don't worry, I hate Windows too, but for mostly different reasons. I don't use OS X very often and don't have an opinion on it, but I'd probably ha

  • Caching is only worthwhile if the data can benefit from higher bandwidth. I don't want, for example, my porn or SETI@home data using valuable cache space regardless of how frequently it's accessed, because it can't be processed at anything approaching the bandwidth of magnetic storage, let alone a good SSD. I'd much prefer to have my app/games stored on the SSD, because regardless of how infrequently I use any one of them, the performance gains would be far more dramatic.

  • by Jah-Wren Ryel ( 80510 ) on Thursday April 22, 2010 @08:02PM (#31948370)

    I have a similar problem and I tried the FSCache approach:

    I've got two raids.
    One is optimized for big ass files read contiguously and has raid6 redundancy.
    The other is a much smaller JBOD that I can reconfigure via mdraid to anything that linux supports in software.

    The problem is that 5% of the big ass files need read-only random access and that kills throughput for anything else going on. It takes me down from ~400MB/s to 15MB/s.

    So, I thought I'd use the FSCache approach and use the JBOD as the cache.
    I did an NFS mount over loopback and pointed the fscache to the JBOD.
    It worked great: I got practically full throughput for contiguous access for about 10 hours, and then it crashed the system.

    Apparently NFS over loopback is well known to be broken in linux and has been since, essentially, forever.
    I was stunned; it had never even occurred to me that NFS over loopback would be broken. It's freaking 2010 - that something I had been using on SunOS 3 a bazillion years ago didn't work on linux today had not even entered my mind.

    I've also tried replicating the files from the raid6 to the JBOD, but that quickly turned into a hassle keeping everything synchronized between the files on disk, the applications that create the files on the raid6, and the apps that use the files on the JBOD. Plus, it doesn't scale out past the size of the JBOD, which I also ran into.

    So now, I'm looking at putting the apps that need random access reads to the data in a VM and NFS mounting it with cache to the VM, hoping to avoid the NFS-broken-over-loopback problem. I haven't had time to implement it yet, and personally am leery of doing so since I have to wonder what new "known-broken" problems will bite me in the ass.

    So, if there is a better way, I am dying to hear it, unfortunately solaris/freebsd is not an option...
