Software SSD Cache Implementation For Linux?

Follow Slashdot blog updates by subscribing to our blog RSS feed

Software SSD Cache Implementation For Linux? 297

Posted by timothy on Thursday April 22, 2010 @05:37PM from the only-obscure-if-you-think-it-is dept.

Annirak writes "With the bottom dropping out of the magnetic disk market and SSD prices still over $3/GB, I want to know if there is a way to to get the best of both worlds. Ideally, a caching algorithm would store frequently used sectors, or sectors used during boot or application launches (hot sectors), to the SSD. Adaptec has a firmware implementation of this concept, called MaxIQ, but this is only for use on their RAID controllers and only works with their special, even more expensive, SSD. Silverstone recently released a device which does this for a single disk, but it is limited: it caches the first part of the magnetic disk, up to the size of the SSD, rather than caching frequently used sectors. The FS-Cache implementation in recent Linux kernels seems to be primarily intended for use in NFS and AFS, without much provision for speeding up local filesystems. Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"

This discussion has been archived. No new comments can be posted.

Software SSD Cache Implementation For Linux?

Load All Comments

Search 297 Comments Log In/Create an Account

Comments Filter:

I don't get it (Score:2)

by microbee ( 682094 ) writes:

Linux caches data from any disks all the same, SSD or not.
- Re: (Score:2)
  
  by Wesley Felter ( 138342 ) writes:
  
  Linux caches disk data in memory. The author wants to cache disk data in an SSD.
  - Re: (Score:2)
    
    by Korin43 ( 881732 ) * writes:
    
    Would using an SSD as a swap device have the effect they want?
    - Re: (Score:2, Informative)
      
      by Unit3 ( 10444 ) writes:
      
      No. Swap is not a cache. Swap holds things that don't fit in RAM. I/O cache will never hit swap, it limits itself to physical RAM.
      - Re:Wrong. Swap often acts as a cache. (Score:5, Informative)
        
        by m.dillon ( 147925 ) writes: on Thursday April 22, 2010 @09:20PM (#31949272) Homepage
        
        The way DragonFly's swapcache works is that VM pages (cached in ram) go from the active queue to the inactive queue to the cache (almost free) queue to the free queue. VM pages sitting in the inactive queue are subject to being written out to the swapcache. VM pages in the active queue (or cache or free queues) are not considered.
        In otherwords, simply accessing cacheable data or meta-data from the hard drive does not itself trigger writing to the SSD swapcache. It's only when the cached VM pages are pushed out of the active queue due to memory pressure and are clearly heading out the door when DragonFly decides to write them to the SSD.
        This prevents SSD write activity from interfering with the operation of the production system and also tends to do a good job selecting what data to write to the SSD when and what data not to. A file which is in constant use by the system just stays in ram, there's no point writing it out to the SSD.
        With respect to deciding what data to cache and what data not to, with meta-data its simple. You cache as much meta-data as you can because every piece of meta-data gives you a multiplicative performance improvement. With file data it is harder since you don't want to try to cycle e.g. a terrabyte of data through a 40G swapcache. The production system's working data set at any given moment needs to either fit in the swapcache or you need to carefully select which directory topologies you want to cache.
        -Matt
        
        Parent Share
        twitter facebook
        
        Re:Wrong. Swap often acts as a cache. (Score:4, Informative)
        
        by m.dillon ( 147925 ) writes: on Thursday April 22, 2010 @11:28PM (#31950476) Homepage
        
        OS's have traditionally discarded clean cache data when memory pressure forces the pages out. Swap traditionally applied only to dirty anonymous memory (The OS needs to write dirty data somewhere, after all, and if it isn't backed by a file then that is what swap is for).
        However in the last decade traditional paging to swap has fallen by the wayside as memory capacities have increased. Most of the data in ram on systems today is clean data, not dirty data, and most of the dirty data is backed by a file (e.g. write()s to a database or something like that). On most systems today if you look at swap space use you find it near zero.
        But the concept of swap can trivially be expanded to cover more areas of interest. tmpfs (tmpfs, md, mfs, etc) is a good example. For that matter anonymous memory for VMs can be backed by swap. It is very desireable to back the memory for a VM with either a tmpfs-based file or just straight anonymous memory instead of a file in a normal filesystem. That is a good use for swap too.
        It isn't that big a leap to expand swap coverage to also cache clean data. It took about two weeks to implement the basics on DragonFly. Those operating systems which don't have this capability will probably get it as time goes on simply because it is an extremely useful mechanic for interfacing a SSD-based cache into a system. It is also probably the cleanest and simplest way to implement this sort of cache, and it pairs up well with the strengths of the SSD storage mechanic. Since you can reallocate swap space when something is rewritten there are virtually no write amplification effects and the storage on the SSD is cycled very nicely. You get much better wear leveling than you would if you tried to map a normal filesystem (or mirror the blocks associated with a normal filesystem) on top of the SSD.
        -Matt
        
        Parent Share
        twitter facebook
        
        Re:Wrong. Swap often acts as a cache. (Score:5, Informative)
        
        by Score Whore ( 32328 ) writes: on Friday April 23, 2010 @12:34AM (#31951018)
        
        Solaris certainly doesn't. What developer would ever code this kind of behavior? Non-dirty filesystem data in the cache is already on disk, what would be the rational to write it out to another part of the disk? That's just stupid. Non-dirty pages are thrown away when RAM is in demand. Dirty filesystem data is just written to disk. Then the pages become non-dirty and can be freed at any time. Possibly immediately if there is demand.
        Scenario A:
        1. File is read and data is copied into system memory where is it buffered. Time passes.
        2. Memory usage skyrockets.
        3. Kernel writes data to swap space and frees the memory for use by other processes.
        4. Later an application wants that data. Kernel reads data from swap space.
        Scenario B:
        1. File is read and data is copied into system memory where is it buffered. Time passes.
        2. Memory usage skyrockets.
        3. Kernel locates non-dirty cached data and frees that page for use by other processes.
        4. Later an application wants that data. Kernel reads data from original file on disk.
        Differences between scenario A & B:
        Scenario A has two disk IOs (steps 3&4) during memory pressure. Scenario B has one (step 4).
        Scenario A uses limited swap space to store duplicate data. Scenario B doesn't.
        And no, Solaris doesn't cache slow devices (tape, dvd-rom, etc.) either. If you choose to access those types of devices, that is your choice. The OS isn't going to save your ass. If you want it cached, make your application do the caching.
        Also, I'm not considering special purpose systems such as ZFS's l2arc or other similar/more generalized systems that utilize SSD as a midway point between RAM and HDD. We're talking generic swap space and filesystem caches.
        
        Parent Share
        twitter facebook
    - Re: (Score:2, Offtopic)
      
      by Penguinisto ( 415985 ) writes:
      
      ...only if you want to blow out the SSD wear-limits.
      What the author wants (I believe) is to have Linux figure down which sectors are read most frequently, and have those mapped/linked/whatever to the SSD for speed reasons.
      If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.
      - Re: (Score:2)
        
        by pak9rabid ( 1011935 ) writes:
        
        Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?
        
        Re: (Score:2)
        
        by the_one(2) ( 1117139 ) writes:
        
        Probably not. That would be bad if for example you wanted to burn a DVD and the burner program put a lot of stuff in /tmp. I'm not a linux pro or anything so I don't know how different distros do it but I don't think that's the default.
        
        Re: (Score:2)
        
        by raynet ( 51803 ) writes:
        
        Well, if you use tmpfs and not ramdisk for /tmp, then pages will be swapped to disk if needed, thus you can burn you DVD as long as you have enough swap available and damons like swapd or swapspace allow you to have reasonable size swap partition and then will create swapfiles by demand.
        
        Re: (Score:2)
        
        by rwa2 ( 4391 ) * writes:
        
        Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?
        Depending on the distribution, but sometimes.
        On servers /tmp can get pretty big with random crap, though, so generally you want to be able to put it on a disk or allow it to swap out and use your RAM for something more useful.
        But on thin clients, netbooks, etc. without too much going on it might be better to put it on tmpfs to reduce SSD wear.
      - Re: (Score:2)
        
        by dissy ( 172727 ) writes:
        
        If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.
        Unfortunately, it usually works out that some of the most volatile places on disk (/home /tmp swap) are the very places one would see the best result in speeding up.
        Also unfortunately those are the worst uses currently for a SSD
        Then again, for anyone who really wants to speed up things like swap and /tmp, the best way is to simply quadruple your ram and get rid of swap, and use tmpfs in ram for /tmp.
        The usual reason for not doing that is ram is expensive, and on top of that motherboards to handle a ton of r
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
      - Re: (Score:2)
        
        by countach ( 534280 ) writes:
        
        Because all of /bin is hardly going to be your most used stuff, and there's probably a ton of stuff frequently used that isn't in /bin, /usr.
        Sure, you can try and mount your most used stuff on SSD, but that's (a) a pain in the neck to fiddle around with (b) something ideally better left to an algorithm. (c) doesn't actually work that well, since you have to divide all your most used stuff into separate file systems.
      - Re: (Score:2)
        
        by Plekto ( 1018050 ) writes:
        
        The best solution then is a physical ramdisk/ramdrive. The capacity isn't huge(8-16GB as a rule), but the speed is easily equal to any SSD and you can beat on it forever without worrying about it running out of write cycles.
        
        Re: (Score:2)
        
        by h4rr4r ( 612664 ) writes:
        
        Just make a tmpfs, ram is cheap as hell.
    - Re: (Score:2, Informative)
      
      by Jezza ( 39441 ) writes:
      
      Assuming the SSD was faster at both read and write - it should speed things up. Hell just moving the swap onto a different physical disk helps. But don't. SSD have a limited life, in a different sense to spinning disks. SSD wear with writing, so if you constantly write to the same "sectors" they will fail. If you think about what's happening when the system is swapping - that's exactly what's going on. So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM,
      - Re: (Score:2, Informative)
        
        by Anonymous Coward writes:
        
        SSD wear with writing, so if you constantly write to the same "sectors" they will fail.
        2006 called, they want their FUD back. While it's true that erase blocks in flash memory wear out with use, the whole battle between SSD manufacturers for the last couple years has been in mapping algorithms that ensure you don't hit the same erase block very often. By now, SSDs have longer lifetimes than HDDs. Of course that applies to real SSDs, not makeshift IDE-to-CompactFlash adapters.
      - Re: (Score:2, Informative)
        
        by EvanED ( 569694 ) writes:
        
        So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM, so the system won't need to swap (with enough RAM you don't need swap at all).
        A RAM buffer cache and SSD cache address far different issues. The buffer cache is far faster when it hits, but the SSD cache is far larger. It's pretty easy to find workloads where getting enough RAM so that your working set will fit into your buffer cache (alongside the memory use of whatever you're doing) would be more ex
        
        Re: (Score:2)
        
        by Jezza ( 39441 ) writes:
        
        Sure, adding RAM isn't a panacea, but running "san swap" can really speed (some) things greatly. The question I was addressing was all about swap - nothing else.
        I do think running swap in SSD is **probably** a bad idea, especially if you can put enough RAM in to not need swap. But sure that is a pretty glib statement...
      - Re: (Score:2)
        
        by gfody ( 514448 ) writes:
        
        Why are people hung up on SSD lifespan? Unless you're talking about USB flash thumbdrives any SSD you buy is not going to "wear out" any time soon - 5 years is the absolute worst case scenario assuming you write constantly as much as the drive will take [anandtech.com]. Tracking against intel's media wearout indicator suggests even heavily used drives will last around 15 years. How much computer equipment do you have around that's even 5 years old?
        
        Re: (Score:2)
        
        by Jezza ( 39441 ) writes:
        
        I'm not going to answer that - but to give you an idea I do have a NeXT Dimension ...
        But I take your point.
      - Re: (Score:2)
        
        by AHuxley ( 892839 ) writes:
        
        "SSD have a limited life, in a different sense to spinning disks. SSD wear with writing, so if you constantly write to the same "sectors" they will fail."
        The early versions did, but now you have real developers with real support entering consumer space.
        eg. http://eshop.macsales.com/shop/internal_storage/Mercury_Extreme_SSD_Sandforce [macsales.com]
        10,000,000 Mean Time Before Failure (MTBF) and a 5 year warranty.
- Re: (Score:2)
  
  by owlstead ( 636356 ) writes:
  
  This is about doing double caching: cache to fast but limited RAM (L1) first and then have a much larger but slower cache, that being the SSD (L2). Difference being with other caching systems that the SSD of course holds state if power is down (so often use sectors may never be written do disk).
  - Re:I don't get it (Score:5, Insightful)
    
    by Colin Smith ( 2679 ) writes: on Thursday April 22, 2010 @05:58PM (#31946506)
    
    so
    CPU L1
    CPU L2
    CPU L3
    RAM
    SSD
    DISK
    NETWORK
    Internet
    I estimate SSDs would be closer to Level 5 cache.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by Znork ( 31774 ) writes:
      
      I'd argue it's better to implement as HSM (Hierarchial Storage Management), with least recently used things getting delegated to more archival storage. It would be nice with a device-mapper-hsm layer that would let you simply stack one device upon the other and obtain the best distribution of desireable characteristics they could offer.
      IIRC, there was an intern at IBM who did a project like that some years ago, but I don't think much became of it.
    - Re: (Score:2)
      
      by MobyDisk ( 75490 ) writes:
      
      <facepalm
      If you want to play Mr. Pedantic, you skipped registers. And CPU cache may not necessarily cache disk data, so those don't count for the same reason registers don't count. Networks don't cache the internet. And don't forget newspapers - the Internet caches those. And newspapers cache events, which cache time. :-)
      The point is that most of the layers you listed are implementation dependent or not relevant to the discussion. For this purpose, the CPU is a black box - it could have different
    - Re: (Score:2)
      
      by owlstead ( 636356 ) writes:
      
      OK, but CPU L1/2/3 is a data cache. Of course it will help but it's just not configured as a disk cache. Multi-processor systems for instance would not benefit from the CPU caches.
      Furthermore, I don't know about you, but my disk is certainly not used for network or internet cache. For network resources there is no such thing as configuring life-time and the browser disk cache is the first thing I disable (using a spinning system disk for internet cache is stupidity IMHO).
      Besides all that, it's just an examp
    - - Re:I don't get it (Score:4, Interesting)
        
        by owlstead ( 636356 ) writes: on Thursday April 22, 2010 @07:26PM (#31947880)
        
        A fast SSD is not 1 Gb/s under ideal conditions. A fast SSD is up to 2 Gb/s (about 250 MB/s) under real life conditions (while reading). Anyway, it still makes sense to cache network content to disk if the other side of the connection is slow or not reliable.
        
        Parent Share
        twitter facebook
  - Re: (Score:2)
    
    by Jezza ( 39441 ) writes:
    
    Yeah, especially if those sectors don't change much - an SSD isn't suitable for data that's rapidly changing.
- Re:I don't get it (Score:5, Informative)
  
  by Anonymous Coward writes: on Thursday April 22, 2010 @05:55PM (#31946466)
  
  The idea is to use the SSD as a second-level disk cache. So instead of simply discarding cached data under memory pressure, it's written to the SSD. It's still way slower than RAM, but it's got much better random-access performance characteristics than spinning rust and it's large compared to RAM.
  As for how to do it in Linux, I'm not aware of a way. If you are open to the possibility of using other operating systems, this functionality is part of OpenSolaris (google for "zfs l2arc" for more information).
  Cache Devices
  Devices can be added to a storage pool as "cache devices."
  These devices provide an additional layer of caching between
  main memory and disk. For read-heavy workloads, where the
  working set size is much larger than what can be cached in
  main memory, using cache devices allow much more of this
  working set to be served from low latency media. Using cache
  devices provides the greatest performance improvement for
  random read-workloads of mostly static content.
  To create a pool with cache devices, specify a "cache" vdev
  with any number of devices. For example:
  # zpool create pool c0d0 c1d0 cache c2d0 c3d0
  The content of the cache devices is considered volatile, as
  is the case with other system caches.
  You can also use it as an intent log, which can dramatically improve write performance:
  Intent Log
  The ZFS Intent Log (ZIL) satisfies POSIX requirements for
  synchronous transactions. For instance, databases often
  require their transactions to be on stable storage devices
  when returning from a system call. NFS and other applica-
  tions can also use fsync() to ensure data stability. By
  default, the intent log is allocated from blocks within the
  main pool. However, it might be possible to get better per-
  formance using separate intent log devices such as NVRAM or
  a dedicated disk. For example:
  # zpool create pool c0d0 c1d0 log c2d0
  Multiple log devices can also be specified, and they can be
  mirrored. See the EXAMPLES section for an example of mirror-
  ing multiple log devices.
  Log devices can be added, replaced, attached, detached, and
  imported and exported as part of the larger pool. Mirrored
  log devices can be removed by specifying the top-level mir-
  ror for the log.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by jibjibjib ( 889679 ) writes:
  
  Yes, Linux caches data from disks in RAM. But what we're talking about here is not caching in RAM, but using a fast disk (SSD) as cache for a slow disk.
  - Re: (Score:2)
    
    by InlawBiker ( 1124825 ) writes:
    
    What about this: the SSD Ram Disk (SSDRD). It's exactly like a normal RAM disk, but it simulates an SSD. It would be supremely faster to write to an imaginary SSD rather than an imaginary HD.
    Patent!
- Re:I don't get it (Score:5, Informative)
  
  by TheRaven64 ( 641858 ) writes: on Thursday April 22, 2010 @06:30PM (#31946988) Journal
  
  The submitter wants something like ZFS's L2ARC, which uses the flash as an intermediate cache between the RAM cache and the disk. This works very well [sun.com] for a lot of workloads. Since Linux users appear to be allowed to say 'switch to Linux' as an answer to questions about Windows, it only seems fair that 'switch to Solaris of FreeBSD' would be a valid solution to this problem.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by h4rr4r ( 612664 ) writes:
    
    Indeed, this is the 1 killer feature of ZFS that btrfs seems not to have yet.
  - - Re: (Score:3, Informative)
      
      by Just Some Guy ( 3352 ) writes:
      
      Trick with FBSD - it doesn't believe in removing L2ARC devices yet.
      You're wrong:
      $ sudo zpool add tank cache ada1 $ sudo zpool status [...] cache ada1 ONLINE 0 0 0 $ sudo zpool remove tank ada1 $ sudo zpool status [nothing about a cache device]
      You're probably thinking of ZIL devices. You can't remove them in FreeBSD, but the version of ZFS in Solaris (that's being ported to FreeBSD right now) supports removing them.
- - Re: (Score:2)
    
    by rwa2 ( 4391 ) * writes:
    
    Submitter's probably looking at this backwards; just put the entire system on the SSD, and create symlinks to large directories hosted on conventional storage instead.
    Even very small 32GB SSDs are large enough to fit your entire OS on; then you can use the hard disk for large file storage. So I'd say it's probably not worth the effort to try to collect detailed traces using SystemTap or whatever to figure out which files should go on SSD and which should be relegated to the spindle drive; just put it al
- - - Re: (Score:2)
      
      by h4rr4r ( 612664 ) writes:
      
      RAM is cheap, the OS already caches disk to that. Adding ram is more useful and easier.
isn't 40 GB enough for applications? (Score:4, Interesting)

by owlstead ( 636356 ) writes: on Thursday April 22, 2010 @05:43PM (#31946264)

Is there really a need for this? Intel 40 GB SSD still has a read speed of 170 MB/s and costs about 100 euro here in NL. Why have some kind of experimental configuration while prices are like that? OK, 35 MB/s write speed is not that high, but with the high IOPS and seek times you still have most of the benefits.
I can see why you would want something like this, but I doubt the benefits are that large over a normal SSD + HDD configuration.

Share
twitter facebook
- Re:isn't 40 GB enough for applications? (Score:5, Informative)
  
  by Unit3 ( 10444 ) writes: on Thursday April 22, 2010 @05:57PM (#31946502) Homepage
  
  They are huge for larger applications. Database servers, for instance, can see performance increases in the magnitude of 10-20x the number of transactions per second when using a scheme like this for datasets that are too large to fit in RAM.
  
  Parent Share
  twitter facebook
  - Re: (Score:2, Insightful)
    
    by kgo ( 1741558 ) writes:
    
    Yeah, but if you've got some 'enterprise-level database' with those sort of transaction requirements, you can probably justify the purchase of SSDs. It's not exactly like you're building that system from craigslist parts...
    - Re: (Score:2)
      
      by Amouth ( 879122 ) writes:
      
      Speak for your self.. some companies do not want to spend the money required to do it right.. but rather would have you spend more time than the equipment cost putting something crazy together to make it work.
      - Re: (Score:2)
        
        by h4rr4r ( 612664 ) writes:
        
        Then just use ZFS on solaris or bsd.
        You can add the SSDs as cache devices.
- Re: (Score:2)
  
  by MobyDisk ( 75490 ) writes:
  
  t I doubt the benefits are that large over a normal SSD + HDD configuration.
  Which doesn't work for laptops. :-(
  Most laptops can only fit a single drive. I would love to have an SSD for faster build times, but a 40GB SSD is useless in my laptop since the second drive would have to be an external. But a 300GB drive with 16GB of integrated flash might give me a single drive with the performance boost that I am looking for.
- Re: (Score:2)
  
  by mickwd ( 196449 ) writes:
  
  I'd agree with this. Get that Intel SSD and stick /usr on it, together with any other read-mainly filesystems (maybe the root filesystem too, if you have stuff like /var on separate partitions). As well as faster reads, the biggest gains are in seek times, so it'd be helpful to have your home directory and all it's "dot" config files on there too (especially when starting up something like Gnome or KDE). However, if you're gonna fill your home directory with tons of stuff, then stick your home directory itse
ZFS (Score:5, Informative)

by Anonymous Coward writes: on Thursday April 22, 2010 @05:45PM (#31946288)

ZFS can do this (http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Cache_Devices) but I don't know about zfs-fuse

Share
twitter facebook
Buffers? (Score:2)

by DoofusOfDeath ( 636671 ) writes:

I hate to sound dumb, but isn't what you're describing basically file system buffering that OS's have been doing for many decades now?
- Re: (Score:3, Informative)
  
  by MobyDisk ( 75490 ) writes:
  
  No.
  You would buffer on an SSD differently than your would do it in memory. Memory is volatile, so you write-back to disk as fast as possible. And whenever you cache something, you trade valuable physical memory for cache memory. With an SSD, you could cache 10 times as much data (Flash is much cheaper than DRAM), you would not have to write it back immediately (since it is not volatile), and the cache would survive a reboot so it could also speed the boot time.
- Re: (Score:2)
  
  by ras ( 84108 ) writes:
  
  Not really. There are a couple of applications for SSD's. One is to speed up boot times. Obviously a RAM cache is useless in that application.
  Another is if you want to speed up a transaction server (one that is writing as much as it is reading), then the answer is again no. Think of the battery backed up RAM cache RAID arrays have. Those caches are there for a reason. RAM can do read caching, but it can't do write caching and still be secure across power failure.
  My interest is in the second applicatio
- Re:Buffers? (Score:4, Informative)
  
  by m.dillon ( 147925 ) writes: on Thursday April 22, 2010 @09:06PM (#31949102) Homepage
  
  The single largest problem addressed by e.g. DragonFly's swapcache is meta-data caching to make scans and other operations on large filesystems with potentially millions or tens of millions of files a fast operation. Secondarily for something like DragonFly's HAMMER filesystem which can store a virtually unlimited number of live-accessable snapshots of the filesystem you can wind up with not just tens of millions of inodes, but hundreds of millions of inodes. Being able to efficiently operate on such large filesystems requires very low latency access to meta-data. Swapcache does a very good job providing the low latency necessary.
  System main memory just isn't big enough to cache all those inodes in a cost-effective manner. 14 million inodes takes around 6G of storage to cache. Well, you can do the math. Do you spend tens of thousands of dollars on a big whopping server with 60G of ram or do you spend a mere $200 on a 80G SSD?
  -Matt
  
  Parent Share
  twitter facebook
ZFS L2ARC (Score:5, Informative)

by jdong ( 1378773 ) writes: on Thursday April 22, 2010 @05:46PM (#31946320)

Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC: http://blogs.sun.com/brendan/entry/test [sun.com]

Share
twitter facebook
- Re: (Score:2, Interesting)
  
  by Anonymous Coward writes:
  
  Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC: http://blogs.sun.com/brendan/entry/test [sun.com]
  
  Swapcache on DragonFly BSD 2.6.x was implemented for this very reason IIRC.
  http://leaf.dragonflybsd.org/cgi/web-man?command=swapcache&section=ANY
OSDI (Score:2)

by jameson ( 54982 ) writes:

The OSDI deadline is in August; plenty of time to implement this, write it up, and get a publication at a top research conference out of it!
bcache (Score:5, Informative)

by Wesley Felter ( 138342 ) writes: <wesley@felter.org> on Thursday April 22, 2010 @05:52PM (#31946420) Homepage

http://lkml.org/lkml/2010/4/5/41 [lkml.org]
I'm a little surprised at the lack of response on linux-kernel.
Solaris and DragonFly have already implemented this feature; I'm surprised that Linux is so far behind.

Share
twitter facebook
- Re:bcache (Score:5, Informative)
  
  by Kento ( 36001 ) writes: <kent.overstreet@gmail.com> on Thursday April 22, 2010 @06:02PM (#31946558)
  
  Hey, at least someone noticed :)
  That version was pretty raw. The current one is a lot farther along than that, but it's still got a ways to go - I'm hoping to have it ready for inclusion in a few months, if I can keep working on it full time. Anyone want to fund me? :D
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by CyprusBlue113 ( 1294000 ) writes:
    
    If it were a block device wrapper along the lines of md, I'd be interested.
    Have a project at the moment where I'd *love* to be able to specify tiers of storage (say md volumes), and have writes go to the highest priority, and blocks trickle down to the lowest based on usage.
    Sort of like a specialized CoW.
    - - Re: (Score:2)
        
        by CyprusBlue113 ( 1294000 ) writes:
        
        I'm not looking for a limited use pool of cache only disks so much as hierarchical storage management at the block level.
        For example you could have a set of 7.2k sata drives, a set of 15k SAS drives, a set of FC drives, and a set of SSDs. Each group would be a tier, which all together would act as a storage pool (each tier probably raided together using MD, as there is no hybrid md/lvm at the moment to sub-devide them gracefully at the lv level instead of pv level), with writes all going to the highest spee
  - Re: (Score:2)
    
    by rayvd ( 155635 ) writes:
    
    Any plans to add in write cache support? I'm thinking along the lines of putting ZFS's ZIL on SSD's. Really makes NFS in sync mode much quicker.
  - Re: (Score:2)
    
    by GuruBuckaroo ( 833982 ) writes:
    
    Summer of Code, dude. This sounds like something Google would get behind.
- Re: (Score:2)
  
  by pydev ( 1683904 ) writes:
  
  You shouldn't be surprised that "Linux is so far behind"; we like it that way. If we thought that what the Solaris or DragonFly engineers are doing was important, we'd be using their systems instead.
- Re: (Score:2)
  
  by TheRaven64 ( 641858 ) writes:
  
  I'm surprised that Linux is so far behind
  Obviously you are either unfamiliar with Linux, or unfamiliar with all non-Linux operating systems except perhaps Windows and maybe Darwin.
Waste of time (Score:5, Informative)

by onefriedrice ( 1171917 ) writes: on Thursday April 22, 2010 @05:54PM (#31946442)

What a waste of time. Just put /home on a magnetic disk and everything else on the SSD. This way, you can get away with a small (very affordable) SSD for your binaries, libraries, config files, and app data, and use tried and true magnetic for your important files. Your own personal files don't need to be on a super fast disk anyway because they don't get as much access as you would think, but your binaries and config files get accessed a lot (unless you have a lot of RAM to cache that, which I also recommend). I've been doing this for over a year and enjoying 10 second boots, and instant program access coldstarts (including openoffice and firefox).

I personally fit all my partitions except /home in only 12.7GB (the SSD is 30GB). Seriously, best upgrade ever. I will never put my root partition on a magnetic drive ever again.

Share
twitter facebook
- Re: (Score:2)
  
  by MobyDisk ( 75490 ) writes:
  
  Just put /home on a magnetic disk and everything else on the SSD
  Try jamming two hard drives into a laptop. :-(
  - Re: (Score:2)
    
    by Logic ( 4864 ) writes:
    
    My primary Linux laptop is an Inspiron 1721, with two mirrored drives.
    - - Re: (Score:2)
        
        by MobyDisk ( 75490 ) writes:
        
        How much does it weight? I almost bought an Alienware a few years ago, but they didn't list weight in the specs.
  - Re: (Score:2)
    
    by owlstead ( 636356 ) writes:
    
    Actually, my S300 Thinkpad does have a second SATA connector - it's used by the rather useless DVD writer. I've looked everywhere but I cannot find anybody that sells a simple slim-DVD drive bracket. I cannot even find a cable that is suited. There is a site that build their own cable, but the cable that they build it out of is hard to get and I'm not that great at soldering electronics either.
    The problem is of course that there are a few different connectors out there (3 to 4 is my current estimation). Fur
  - Replace the optical drive. (Score:2)
    
    by kf6auf ( 719514 ) writes:
    
    Replace the optical drive. I've been keeping a log of how many times I've ever used my DVD drive while away from home. So far I'm at 1; I ripped a CD I got for Christmas before I brought it back home with me. It could have waited.
    Yes, I know it doesn't work for everyone, but I think it works for most people, assuming you get a USB powered optical drive or enclosure.
  - Duh (Score:2)
    
    by FranTaylor ( 164577 ) writes:
    
    Try jamming two hard drives into a laptop.
    Re-read the problem as stated:
    "Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"
    Ya think maybe it's assumed that there are two drives? Just Maybe?
Go for Hardware implemented Caching (Score:2, Informative)

by iammani ( 1392285 ) writes:

Hardware implemented caching is the way to go. There are 'hybrid drives' available now, which automatically cache disk access to SSD. These are very specific for the task and way more efficient than any software implementation.
- Re: (Score:2)
  
  by Wesley Felter ( 138342 ) writes:
  
  I see that you haven't actually used hybrid hard drives, because they're nothing like what you describe. AFAIK only Samsung ever made them, and they've now been discontinued. The HHD itself didn't perform caching; it relied on Vista to manage the flash, which didn't really work out when no one bought Vista. HHDs also included laughably small (256MB) and slow flash that would get totally owned by the smallest slowest SSD today.
  - Re: (Score:2)
    
    by iammani ( 1392285 ) writes:
    
    Mmmm, interesting. I guess I my understanding about how these 'hybrid drives' work is wrong. That makes me wonder why isnt a completely hardware implemented ssd cache available? Is there a technical reason why this is not possible? Wouldnt this be faster than a software implemented one?
    - Re: (Score:2)
      
      by Wesley Felter ( 138342 ) writes:
      
      Adaptec MaxIQ is a hardware SSD cache; just don't ask what it costs. To do SSD caching you need some DRAM to hold the metadata and a CPU to manage the cache; would you prefer to buy additional CPU/RAM or just use what's already in your computer?
      - Re: (Score:2)
        
        by iammani ( 1392285 ) writes:
        
        Modern harddisks already have DRAM for buffering (also referred to as disk cache) and a dedicated "chip"/embedded processor exclusively for cache management should be cheap.
  - Re: (Score:2)
    
    by owlstead ( 636356 ) writes:
    
    Somebody forgot that you cannot create speed by just using a slow flash chip or two. You need speedy chips, and a lot of them, to create a fast SSD. Besides a good controller and software for that controller of course. Besides that, 256 MB is so low that I wonder if RAM would not already perform most of the caching, even for writes.
Working on this (Score:2)

by pmjordan ( 745016 ) writes:

I've actually been working on this off-and-on for a while, I'm hoping we can release some beta code soon. Currently developing it on Linux, but planning to release OSX and Windows versions, too. We're caching reads and writes, and only the blocks that are most frequently used, plus various other SSD-relevant optimisations. The block allocation logic is pretty complex (and I'm too busy with work), which is why it's been taking so long.
But why? (Score:2)

by Eudial ( 590661 ) writes:

The oldest and simplest solution is to mount partitions from a small fast disk where you want fast read/write speeds, and partitions from slower disks everywhere else. Works quite well, too.
People forgot the low-level Linux stuff quickly. (Score:3, Informative)

by guruevi ( 827432 ) writes: on Thursday April 22, 2010 @06:12PM (#31946740)

First of all, you can do this with ZFS which is newer tech and works quite well but is not (ever going to be) implemented in the Linux kernel
For lower tech, you can do it the same way we used to do back when hard drives were small. In order to prevent people from filling up the whole hard drive we used to have partitions (now we just pop in more/larger drives in the array). /boot and /var would be in the first parts of the hard drive where the drive was fastest. /home could even be on another drive.
You could do the same, put /boot and /usr on your SSD (or whatever you want to be fastest - if you have a X25-E or another fast writing SSD you could put /var on there (for log, tmp etc. if you have a server) or if you have shortage of RAM make it a swap drive. If you have small home folders, you could even put /home on there and leave your mp3's in /opt or so.

Share
twitter facebook
dm-cache (Score:3, Informative)

by Gyver_lb ( 455546 ) writes: on Thursday April 22, 2010 @06:15PM (#31946782)

google dm-cache. Not updated since 2.6.29 though.

Share
twitter facebook
I was just wondering the same thing (Score:2)

by RelliK ( 4466 ) writes:

Windows 7 (and I think XP) has ReadyBoost. I haven't been able to find anything similar for Linux. It is also not clear how much difference ReadyBoost makes. The only benchmark I was able to find uses a crappy USB flash drive [pcstats.com]. I was wondering how much difference something like the 80GB x-25m would make. There is clearly potential for huge gains as MaxIQ benchmarks show.
This would be an awesome speedup if it was supported: just add a 40-80GB SSD for swap & file cache, and gain a massive performance boost
- Re: (Score:2)
  
  by MobyDisk ( 75490 ) writes:
  
  It is also not clear how much difference ReadyBoost makes
  Keep searching. There's lots of other benchmarks, and they all same the same thing. It helps if you have an old machine with insufficient memory.
  ReadyBoost doesn't really do what the author wants. Windows treats ReadyBoost as a write-through cache like it treats memory. It assumes you might unplug the drive at any moment. It won't speed up boot time, and it won't speed up writes. It won't place the swap file on there either. I'm not sure if you could tell Windows to use a regular SATA drive for Ready
  - Re: (Score:2)
    
    by GuruBuckaroo ( 833982 ) writes:
    
    Not quite correct. The ReadyBoost cache is pre-filled with usable data by the SuperFetch caching system, so it's not just write-back. It will also (theoretically) "learn" your loading patterns, and if you, say, start up the same application at 9am every day, it will start putting that application into the ReadyBoost cache just before. Also, the article linked in the GP specified a max cache of 4gb - that was true in Vista, but 7 and 2008 can use larger flash drives provided they're formatted NTFS or exFAT.
- Re: (Score:2)
  
  by robot256 ( 1635039 ) writes:
  
  What you really want to be talking about is a different Windows 7 feature called ReadyDrive, which actually does what the author is talking about. Basically, the system heuristically determines what files are used most often/during boot and the BIOS read- and write-caches them to the flash in the ReadyDrive. I bought a Thinkpad with a "4GB Intel TurboBoost Memory" chip and it made a noticeable difference in boot time when I enabled it as a ReadyDrive.
  I also found that when Windows (rarely) crashed, it w
Thread summary (Score:3, Funny)

by Goaway ( 82658 ) writes: on Thursday April 22, 2010 @06:25PM (#31946904) Homepage

"If Linux doesn't already do it, you don't need it anyway!"

Share
twitter facebook
- Re: (Score:2, Insightful)
  
  by EvanED ( 569694 ) writes:
  
  No kidding. It's threads like this (where I think the question is entirely reasonable and a good thing to support) that really sour my opinion of Linux. There are a few other things -- better file-system-supported metadata, transactional filesystems, etc. -- that have come up in the past too where it seems I just flat out disagree with most hardcore Linux users.
  (Don't worry, I hate Windows too, but for mostly different reasons. I don't use OS X very often and don't have an opinion on it, but I'd probably ha
Not necessarily a good metric (Score:2)

by StikyPad ( 445176 ) writes:

Caching is only worthwhile if the data can benefit from higher bandwidth. I don't want, for example, my porn or SETI@home data using valuable cache space regardless of how frequently it's accessed, because it can't be processed at anything approaching the bandwidth of magnetic storage, let alone a good SSD. I'd much prefer to have my app/games stored on the SSD, because regardless of how infrequently I use any one of them, the performance gains would be far more dramatic.
FSCache would work except... (Score:5, Interesting)

by Jah-Wren Ryel ( 80510 ) writes: on Thursday April 22, 2010 @08:02PM (#31948370)

I have a similar problem and I tried the FSCache approach:
I've got two raids.
One is optimized for big ass files read contiguously and has raid6 redundancy.
The other is a much smaller JBOD that I can reconfigure via mdraid to anything that linux supports in software.
The problem is that 5% of the big ass files need read-only random access and that kills throughput for anything else going on. It takes me down from ~400MB/s to 15MB/s.
So, I thought I'd use the FSCache approach and use the JBOD as the cache.
I did an NFS mount over loopback and pointed the fscache to the JBOD.
It worked great got practically full throughput for contiguous access, for about 10 hours and then crashed the system.
Apparently NFS over loopback is well known to be broken in linux and has been since, essentially, forever.
I was stunned, it had never even occurred to me that NFS over loopback would be broken. Its freaking 2010 - that something I had been using on Sun0S 3 a bazillion years ago didn't work on linux today had not even entered my mind.
I've also tried replicating the files from the raid6 to the jbod, but that quickly turned into a hassle keeping everything syncronized between the files on disk and the applications that create the files on the raid6 and the apps that use the files on the JBOD. Plus, it doesn't scale out past the size of the JBOD, which I also ran into.
So now, I'm looking at putting the apps that need random access reads to the data in a VM and NFS mounting it with cache to the VM hoping to avoid the NFS-broken-over-loopback problem. I haven't had time to implement it yet, and personally and leery of doing so since I have to wonder what new "known-broken" problems will bite me in the ass.
So, if there is a better way, I am dying to hear it, unfortunately solaris/freebsd is not an option...

Share
twitter facebook
- Re: (Score:2)
  
  by Unit3 ( 10444 ) writes:
  
  Err, no you don't. That's not caching at all, and doesn't help with datasets that don't fit on the SSD.
  This is a shortsighted kludge with limited uses, and not at all the elegant solution the poster was asking for.
  - - Re:Sure it's caching. And it's not a "kludge" at a (Score:2)
      
      by TheRaven64 ( 641858 ) writes:
      
      Sure it's caching. You're storing frequently-used data on a faster medium.
      That's not what caching means. It comes from the French word for 'hidden' and the fact that it is not directly addressable is the important part of the definition. A cache is not just a faster medium, it's a faster medium that is hidden from the user / programmer and is used to accelerate access to the slower medium.
- Re: (Score:3, Insightful)
  
  by Unit3 ( 10444 ) writes:
  
  Define "unnecessarily". Given current SSD costs and depletion rates, it's probably completely acceptable to replace an SSD used as an intermediary cache in front of a large spindle-based array every couple of years.
  Just because it's not useful to you, doesn't mean it's not useful.
- Re: (Score:2)
  
  by MobyDisk ( 75490 ) writes:
  
  I don't understand what you mean. How does caching data on an SSD lower the lifespan on the magnetic drive? Or did you mean it lowers the lifespan of the SSD? Which it would do, but it shouldn't be any more than any other use of an SSD. SSD lifespan is really not an issue any longer. (Intel claims over 10 years on their drives. Other manufacturers are claiming similar timespans).
  Could you clarify?
- Re:Counter-Productive (Score:4, Informative)
  
  by pwnies ( 1034518 ) writes: <j@jjcm.org> on Thursday April 22, 2010 @06:20PM (#31946838) Homepage Journal
  
  Sadly, nowadays this is a myth. Current MLC and SLC SSD's have (on average) 10,000 and 100,000 writes (respectively) before any bitwear will occur. While this number is small, remember that all modern mainstream SSD's have wear leveling algorithms built into the controller. Intel rates their drives' minimum useful life at 5 years [pdf link - page 10] [intel.com], with an estimated life of 20 years. Note that this number is based on 20GB of writes per day, every day. SSD's nowadays will have no problems with acting as a cache for the system.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by TheRaven64 ( 641858 ) writes:
    
    And if you're only using it for cache, who cares if it wares out in a few years? Your data is all safely on the other disk(s) and in a few years the SSD won't be worth much anyway.
- Re: (Score:2)
  
  by MobyDisk ( 75490 ) writes:
  
  I agree, and I've been thinking along these same lines. Existing caching algorithms are not sufficient for this purpose. Iin-memory caches, they attempt to write-back the data as soon as possible since the memory is volatile. You would want an algorithm specifically made for non-volatile caching.
  I imagine a 500GB hard drive with 32GB of SSD. The caching algorithm would be smart enough to keep 2 cache areas: 16GB reserved for long-term read-only things like OS files. No matter what disk thrashing goes o
- Good idea, lousy implementation. (Score:2)
  
  by SanityInAnarchy ( 655584 ) writes:
  
  Kind of like the current idea of pushing the wear-leveling back to the drives. This is something the OS can do, and it's a case where flexibility matters -- it's not something I want in a black box inside a drive controller.
- Re: (Score:2)
  
  by MobyDisk ( 75490 ) writes:
  
  Laptops can only fit a single drive.
  - Re: (Score:2)
    
    by countach ( 534280 ) writes:
    
    A lot of people replace the DVD drive with a 2nd drive. There are kits available.
  - Re: (Score:2)
    
    by owlstead ( 636356 ) writes:
    
    Yeah, but this is about a *software* implementation using an SSD as cache for one or more HDD. So there are two drives by definition.
- Re: (Score:2)
  
  by drsmithy ( 35869 ) writes:
  
  He wants the OS to intelligently (and automatically) use an SSD to store the frequently used files from his larger spinning hard disk. It's a great idea and surely Windows will do it soon enough (as much as I hate to say it).
  It basically does already (ReadyBoost). I can't imagine there would be much work involved in modifying it to use arbitrary disks like SSDs instead of just thumbdrives. Indeed, I wouldn't be at all surprised if there's a simple Registry setting that decides what devices can and can't
- Re: (Score:2)
  
  by GuruBuckaroo ( 833982 ) writes:
  
  Throw the SSD in a USB enclosure and try ReadyBoost [wikipedia.org] if you're using Windows Vista SP1, 7, or 2008 Server. Not sure if that will work or not, but you can try it. Works with Flash drives.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

I don't get it (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Informative)

Re:Wrong. Swap often acts as a cache. (Score:5, Informative)

Re:Wrong. Swap often acts as a cache. (Score:4, Informative)

Re:Wrong. Swap often acts as a cache. (Score:5, Informative)

Re: (Score:2, Offtopic)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Informative)

Re: (Score:2, Informative)

Re: (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:I don't get it (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:I don't get it (Score:4, Interesting)

Re: (Score:2)

Re:I don't get it (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re:I don't get it (Score:5, Informative)

Re: (Score:2)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

isn't 40 GB enough for applications? (Score:4, Interesting)

Re:isn't 40 GB enough for applications? (Score:5, Informative)

Re: (Score:2, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

ZFS (Score:5, Informative)

Buffers? (Score:2)

Re: (Score:3, Informative)

Re: (Score:2)

Re:Buffers? (Score:4, Informative)

ZFS L2ARC (Score:5, Informative)

Re: (Score:2, Interesting)

OSDI (Score:2)

bcache (Score:5, Informative)

Re:bcache (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Waste of time (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Replace the optical drive. (Score:2)

Duh (Score:2)

Go for Hardware implemented Caching (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Working on this (Score:2)

But why? (Score:2)

People forgot the low-level Linux stuff quickly. (Score:3, Informative)

dm-cache (Score:3, Informative)

I was just wondering the same thing (Score:2)