Software SSD Cache Implementation For Linux?
Annirak writes "With the bottom dropping out of the magnetic disk market and SSD prices still over $3/GB, I want to know if there is a way to get the best of both worlds. Ideally, a caching algorithm would store frequently used sectors, or sectors used during boot or application launches (hot sectors), to the SSD. Adaptec has a firmware implementation of this concept, called MaxIQ, but this is only for use on their RAID controllers and only works with their special, even more expensive, SSD. Silverstone recently released a device which does this for a single disk, but it is limited: it caches the first part of the magnetic disk, up to the size of the SSD, rather than caching frequently used sectors. The FS-Cache implementation in recent Linux kernels seems to be primarily intended for use in NFS and AFS, without much provision for speeding up local filesystems. Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"
I don't get it (Score:2)
Linux caches data from any disks all the same, SSD or not.
Re: (Score:2)
Linux caches disk data in memory. The author wants to cache disk data in an SSD.
Re: (Score:2)
Re: (Score:2, Informative)
No. Swap is not a cache. Swap holds things that don't fit in RAM. The I/O cache will never hit swap; it limits itself to physical RAM.
Re:Wrong. Swap often acts as a cache. (Score:5, Informative)
The way DragonFly's swapcache works is that VM pages (cached in ram) go from the active queue to the inactive queue to the cache (almost free) queue to the free queue. VM pages sitting in the inactive queue are subject to being written out to the swapcache. VM pages in the active queue (or cache or free queues) are not considered.
In other words, simply accessing cacheable data or meta-data from the hard drive does not itself trigger writing to the SSD swapcache. It's only when the cached VM pages are pushed out of the active queue due to memory pressure and are clearly heading out the door that DragonFly decides to write them to the SSD.
This prevents SSD write activity from interfering with the operation of the production system and also tends to do a good job of selecting what data to write to the SSD, and when, and what data not to. A file which is in constant use by the system just stays in ram; there's no point writing it out to the SSD.
With respect to deciding what data to cache and what not to, meta-data is simple: you cache as much meta-data as you can, because every piece of meta-data gives you a multiplicative performance improvement. With file data it is harder, since you don't want to try to cycle e.g. a terabyte of data through a 40G swapcache. The production system's working data set at any given moment needs to either fit in the swapcache or you need to carefully select which directory topologies you want to cache.
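A minimal sketch of that queueing policy (illustrative C only, not DragonFly's actual code; the structures and the swapcache_write hook are invented for the example):

    /*
     * Pages flow active -> inactive -> cache -> free.  Only a page being
     * pushed into the inactive queue is considered for a write to the SSD
     * swapcache, so data in constant use never generates SSD write traffic.
     */
    #include <stdio.h>

    enum vm_queue { Q_ACTIVE, Q_INACTIVE, Q_CACHE, Q_FREE };

    struct vm_page {
        enum vm_queue queue;
        int in_swapcache;   /* a copy already lives on the SSD */
    };

    static void swapcache_write(struct vm_page *pg)
    {
        pg->in_swapcache = 1;
        puts("queued asynchronous write to SSD swapcache");
    }

    /* Memory pressure pushes a page one queue level closer to being freed. */
    static void page_deactivate(struct vm_page *pg)
    {
        switch (pg->queue) {
        case Q_ACTIVE:
            pg->queue = Q_INACTIVE;
            if (!pg->in_swapcache)      /* page is clearly heading out the door */
                swapcache_write(pg);
            break;
        case Q_INACTIVE: pg->queue = Q_CACHE; break;
        case Q_CACHE:    pg->queue = Q_FREE;  break;
        case Q_FREE:     break;
        }
    }

    int main(void)
    {
        struct vm_page pg = { Q_ACTIVE, 0 };
        page_deactivate(&pg);   /* active -> inactive: triggers the SSD write */
        page_deactivate(&pg);   /* inactive -> cache: no further SSD activity */
        return 0;
    }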
-Matt
Re:Wrong. Swap often acts as a cache. (Score:4, Informative)
OSes have traditionally discarded clean cache data when memory pressure forces the pages out. Swap traditionally applied only to dirty anonymous memory (the OS needs to write dirty data somewhere, after all, and if it isn't backed by a file then that is what swap is for).
However in the last decade traditional paging to swap has fallen by the wayside as memory capacities have increased. Most of the data in ram on systems today is clean data, not dirty data, and most of the dirty data is backed by a file (e.g. write()s to a database or something like that). On most systems today if you look at swap space use you find it near zero.
But the concept of swap can trivially be expanded to cover more areas of interest. tmpfs (and its relatives md, mfs, etc.) is a good example. For that matter, anonymous memory for VMs can be backed by swap. It is very desirable to back the memory for a VM with either a tmpfs-based file or just straight anonymous memory instead of a file in a normal filesystem. That is a good use for swap too.
It isn't that big a leap to expand swap coverage to also cache clean data. It took about two weeks to implement the basics on DragonFly. Those operating systems which don't have this capability will probably get it as time goes on, simply because it is an extremely useful mechanism for interfacing an SSD-based cache into a system. It is also probably the cleanest and simplest way to implement this sort of cache, and it pairs up well with the strengths of the SSD storage medium. Since you can reallocate swap space when something is rewritten, there are virtually no write amplification effects and the storage on the SSD is cycled very nicely. You get much better wear leveling than you would if you tried to map a normal filesystem (or mirror the blocks associated with a normal filesystem) on top of the SSD.
-Matt
Re:Wrong. Swap often acts as a cache. (Score:5, Informative)
Solaris certainly doesn't. What developer would ever code this kind of behavior? Non-dirty filesystem data in the cache is already on disk; what would be the rationale for writing it out to another part of the disk? That's just stupid. Non-dirty pages are thrown away when RAM is in demand. Dirty filesystem data is just written to disk. Then the pages become non-dirty and can be freed at any time, possibly immediately if there is demand.
Scenario A:
1. File is read and data is copied into system memory where it is buffered. Time passes.
2. Memory usage skyrockets.
3. Kernel writes data to swap space and frees the memory for use by other processes.
4. Later an application wants that data. Kernel reads data from swap space.
Scenario B:
1. File is read and data is copied into system memory where it is buffered. Time passes.
2. Memory usage skyrockets.
3. Kernel locates non-dirty cached data and frees that page for use by other processes.
4. Later an application wants that data. Kernel reads data from original file on disk.
Differences between scenario A & B:
Scenario A has two disk IOs (steps 3&4) during memory pressure. Scenario B has one (step 4).
Scenario A uses limited swap space to store duplicate data. Scenario B doesn't.
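A minimal sketch of the reclaim decision the two scenarios boil down to (illustrative C, not Solaris or Linux source; the helper functions are stand-ins):

    #include <stdio.h>

    struct page { int dirty; int file_backed; };

    static void free_page(struct page *pg)         { (void)pg; puts("page freed"); }
    static void writeback_to_file(struct page *pg) { (void)pg; puts("wrote back to file"); }
    static void writeback_to_swap(struct page *pg) { (void)pg; puts("wrote to swap"); }

    /* Under memory pressure, decide what to do with a page before freeing it. */
    static void reclaim(struct page *pg)
    {
        if (!pg->dirty) {
            free_page(pg);              /* Scenario B: data is already on disk, just drop it */
        } else if (pg->file_backed) {
            writeback_to_file(pg);      /* dirty file data goes back to its own file */
            free_page(pg);
        } else {
            writeback_to_swap(pg);      /* dirty anonymous memory is what swap is for */
            free_page(pg);
        }
    }

    int main(void)
    {
        struct page clean_file = { 0, 1 }, dirty_anon = { 1, 0 };
        reclaim(&clean_file);   /* dropped, no I/O */
        reclaim(&dirty_anon);   /* written to swap, then freed */
        return 0;
    }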
And no, Solaris doesn't cache slow devices (tape, dvd-rom, etc.) either. If you choose to access those types of devices, that is your choice. The OS isn't going to save your ass. If you want it cached, make your application do the caching.
Also, I'm not considering special purpose systems such as ZFS's l2arc or other similar/more generalized systems that utilize SSD as a midway point between RAM and HDD. We're talking generic swap space and filesystem caches.
Re: (Score:2, Offtopic)
...only if you want to blow out the SSD wear-limits.
What the author wants (I believe) is to have Linux figure out which sectors are read most frequently, and have those mapped/linked/whatever to the SSD for speed reasons.
If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.
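For example, that kind of static split could look something like this in /etc/fstab (device names and filesystem choices are purely illustrative):

    # /dev/sda = SSD (read-mostly system trees), /dev/sdb = spinning disk (volatile data)
    /dev/sda1  /boot  ext4  defaults,noatime  0  2
    /dev/sda2  /      ext4  defaults,noatime  0  1
    /dev/sdb1  /home  ext4  defaults          0  2
    /dev/sdb2  /tmp   ext4  defaults          0  2
    /dev/sdb3  none   swap  sw                0  0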
Re: (Score:2)
Re: (Score:2)
Probably not. That would be bad if, for example, you wanted to burn a DVD and the burner program put a lot of stuff in /tmp. I'm not a Linux pro or anything, so I don't know how different distros do it, but I don't think that's the default.
Re: (Score:2)
Well, if you use tmpfs and not a ramdisk for /tmp, then pages will be swapped to disk if needed, so you can burn your DVD as long as you have enough swap available, and daemons like swapd or swapspace let you start with a reasonably sized swap partition and then create swapfiles on demand.
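For reference, a swap-backed /tmp is a single fstab line (the size cap here is an arbitrary example):

    tmpfs  /tmp  tmpfs  size=2G,mode=1777  0  0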
Re: (Score:2)
Correct me if I'm wrong, but isn't /tmp usually mapped to a ramdisk?
It depends on the distribution, but sometimes.
On servers /tmp can get pretty big with random crap, though, so generally you want to be able to put it on a disk or allow it to swap out and use your RAM for something more useful.
But on thin clients, netbooks, etc. without too much going on it might be better to put it on tmpfs to reduce SSD wear.
Re: (Score:2)
If that's indeed the case, then why not simply put the MBR, /boot, /bin, and /usr on the SSD, then mount stuff like /home, /tmp, swap, and the like onto a spindle disk? No algorithm needed, thus no overhead needed to run it, etc.
Unfortunately, it usually works out that the most volatile places on disk (/home, /tmp, swap) are the very places where speeding things up would pay off the most.
Also unfortunately, those are currently the worst uses for an SSD.
Then again, for anyone who really wants to speed up things like swap and /tmp, the best way is to simply quadruple your ram and get rid of swap, and use tmpfs in ram for /tmp.
The usual reason for not doing that is ram is expensive, and on top of that motherboards to handle a ton of r
Re: (Score:2)
Re: (Score:2)
Because all of /bin is hardly going to be your most used stuff, and there's probably a ton of stuff frequently used that isn't in /bin, /usr.
Sure, you can try to mount your most-used stuff on the SSD, but that's (a) a pain in the neck to fiddle around with, (b) something ideally better left to an algorithm, and (c) not actually very effective, since you have to divide all your most-used stuff into separate file systems.
Re: (Score:2)
The best solution then is a physical ramdisk/ramdrive. The capacity isn't huge (8-16GB as a rule), but the speed is easily equal to any SSD and you can beat on it forever without worrying about it running out of write cycles.
Re: (Score:2)
Just make a tmpfs, ram is cheap as hell.
Re: (Score:2, Informative)
Assuming the SSD was faster at both read and write - it should speed things up. Hell, just moving the swap onto a different physical disk helps. But don't. SSDs have a limited life, in a different sense to spinning disks. SSDs wear with writing, so if you constantly write to the same "sectors" they will fail. If you think about what's happening when the system is swapping - that's exactly what's going on. So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM,
Re: (Score:2, Informative)
2006 called, they want their FUD back. While it's true that erase blocks in flash memory wear out with use, the whole battle between SSD manufacturers for the last couple of years has been over mapping algorithms that ensure you don't hit the same erase block very often. By now, SSDs have longer lifetimes than HDDs. Of course that applies to real SSDs, not makeshift IDE-to-CompactFlash adapters.
Re: (Score:2, Informative)
So yes, it'll help (a bit) but it's really expensive given what will happen to the SSD. Better is add RAM, so the system won't need to swap (with enough RAM you don't need swap at all).
A RAM buffer cache and SSD cache address far different issues. The buffer cache is far faster when it hits, but the SSD cache is far larger. It's pretty easy to find workloads where getting enough RAM so that your working set will fit into your buffer cache (alongside the memory use of whatever you're doing) would be more ex
Re: (Score:2)
Sure, adding RAM isn't a panacea, but running "san swap" can really speed (some) things up greatly. The question I was addressing was all about swap - nothing else.
I do think running swap on an SSD is **probably** a bad idea, especially if you can put enough RAM in to not need swap. But sure, that is a pretty glib statement...
Re: (Score:2)
Re: (Score:2)
I'm not going to answer that - but to give you an idea I do have a NeXT Dimension ...
But I take your point.
Re: (Score:2)
The early versions did, but now you have real developers with real support entering consumer space.
eg. http://eshop.macsales.com/shop/internal_storage/Mercury_Extreme_SSD_Sandforce [macsales.com]
10,000,000-hour Mean Time Between Failures (MTBF) and a 5-year warranty.
Re: (Score:2)
This is about double caching: cache to fast but limited RAM (L1) first, and then have a much larger but slower cache, the SSD (L2). The difference from other caching systems is that the SSD of course retains its contents when the power is off (so often-used sectors may never need to be written to the magnetic disk at all).
Re:I don't get it (Score:5, Insightful)
so
CPU L1
CPU L2
CPU L3
RAM
SSD
DISK
NETWORK
Internet
I estimate SSDs would be closer to Level 5 cache.
Re: (Score:2)
I'd argue it's better to implement this as HSM (Hierarchical Storage Management), with least recently used things getting relegated to more archival storage. It would be nice to have a device-mapper HSM layer that would let you simply stack one device upon the other and obtain the best distribution of desirable characteristics they could offer.
IIRC, there was an intern at IBM who did a project like that some years ago, but I don't think much became of it.
Re: (Score:2)
<facepalm> :-)
If you want to play Mr. Pedantic, you skipped registers. And CPU cache may not necessarily cache disk data, so those don't count for the same reason registers don't count. Networks don't cache the internet. And don't forget newspapers - the Internet caches those. And newspapers cache events, which cache time.
The point is that most of the layers you listed are implementation dependent or not relevant to the discussion. For this purpose, the CPU is a black box - it could have different
Re: (Score:2)
OK, but CPU L1/2/3 is a data cache. Of course it will help but it's just not configured as a disk cache. Multi-processor systems for instance would not benefit from the CPU caches.
Furthermore, I don't know about you, but my disk is certainly not used for network or internet cache. For network resources there is no such thing as configuring life-time and the browser disk cache is the first thing I disable (using a spinning system disk for internet cache is stupidity IMHO).
Besides all that, it's just an examp
Re:I don't get it (Score:4, Interesting)
A fast SSD is not 1 Gb/s under ideal conditions. A fast SSD is up to 2 Gb/s (about 250 MB/s) under real life conditions (while reading). Anyway, it still makes sense to cache network content to disk if the other side of the connection is slow or not reliable.
Re: (Score:2)
Yeah, especially if those sectors don't change much - an SSD isn't suitable for data that's rapidly changing.
Re:I don't get it (Score:5, Informative)
The idea is to use the SSD as a second-level disk cache. So instead of simply discarding cached data under memory pressure, it's written to the SSD. It's still way slower than RAM, but it's got much better random-access performance characteristics than spinning rust and it's large compared to RAM.
As for how to do it in Linux, I'm not aware of a way. If you are open to the possibility of using other operating systems, this functionality is part of OpenSolaris (google for "zfs l2arc" for more information).
You can also use it as an intent log, which can dramatically improve write performance:
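For reference, hooking an SSD in as either kind of device is a one-liner in ZFS (pool and device names below are just placeholders):

    # use the SSD as a second-level read cache (L2ARC)
    zpool add tank cache c2t0d0

    # or dedicate it to the ZFS intent log to speed up synchronous writes
    zpool add tank log c2t0d0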
Re: (Score:2)
Re: (Score:2)
What about this: the SSD Ram Disk (SSDRD). It's exactly like a normal RAM disk, but it simulates an SSD. It would be supremely faster to write to an imaginary SSD rather than an imaginary HD.
Patent!
Re:I don't get it (Score:5, Informative)
Re: (Score:2)
Indeed, this is the 1 killer feature of ZFS that btrfs seems not to have yet.
Re: (Score:3, Informative)
Trick with FBSD - it doesn't believe in removing L2ARC devices yet.
You're wrong:
You're probably thinking of ZIL devices. You can't remove them in FreeBSD, but the version of ZFS in Solaris (that's being ported to FreeBSD right now) supports removing them.
Re: (Score:2)
Submitter's probably looking at this backwards; just put the entire system on the SSD, and create symlinks to large directories hosted on conventional storage instead.
Even very small 32GB SSDs are large enough to fit your entire OS on; then you can use the hard disk for large file storage. So I'd say it's probably not worth the effort to try to collect detailed traces using SystemTap or whatever to figure out which files should go on SSD and which should be relegated to the spindle drive; just put it al
Re: (Score:2)
RAM is cheap, the OS already caches disk to that. Adding ram is more useful and easier.
isn't 40 GB enough for applications? (Score:4, Interesting)
Is there really a need for this? Intel's 40 GB SSD still has a read speed of 170 MB/s and costs about 100 euro here in NL. Why have some kind of experimental configuration when prices are like that? OK, 35 MB/s write speed is not that high, but with the high IOPS and low seek times you still get most of the benefits.
I can see why you would want something like this, but I doubt the benefits are that large over a normal SSD + HDD configuration.
Re:isn't 40 GB enough for applications? (Score:5, Informative)
They are huge for larger applications. Database servers, for instance, can see performance increases on the order of 10-20x in transactions per second when using a scheme like this for datasets that are too large to fit in RAM.
Re: (Score:2, Insightful)
Yeah, but if you've got some 'enterprise-level database' with those sort of transaction requirements, you can probably justify the purchase of SSDs. It's not exactly like you're building that system from craigslist parts...
Re: (Score:2)
Speak for yourself... some companies do not want to spend the money required to do it right, but would rather have you spend more time than the equipment would cost putting something crazy together to make it work.
Re: (Score:2)
Then just use ZFS on solaris or bsd.
You can add the SSDs as cache devices.
Re: (Score:2)
I doubt the benefits are that large over a normal SSD + HDD configuration.
Which doesn't work for laptops. :-(
Most laptops can only fit a single drive. I would love to have an SSD for faster build times, but a 40GB SSD is useless in my laptop since the second drive would have to be an external. But a 300GB drive with 16GB of integrated flash might give me a single drive with the performance boost that I am looking for.
Re: (Score:2)
As well as faster reads, the biggest gains are in seek times, so it'd be helpful to have your home directory and all its "dot" config files on there too (especially when starting up something like Gnome or KDE). However, if you're gonna fill your home directory with tons of stuff, then stick your home directory itse
ZFS (Score:5, Informative)
Buffers? (Score:2)
I hate to sound dumb, but isn't what you're describing basically the file system buffering that OSes have been doing for many decades now?
Re: (Score:3, Informative)
No.
You would buffer on an SSD differently than you would in memory. Memory is volatile, so you write back to disk as fast as possible. And whenever you cache something, you trade valuable physical memory for cache memory. With an SSD, you could cache 10 times as much data (Flash is much cheaper than DRAM), you would not have to write it back immediately (since it is not volatile), and the cache would survive a reboot, so it could also speed up the boot time.
Re: (Score:2)
Not really. There are a couple of applications for SSDs. One is to speed up boot times. Obviously a RAM cache is useless in that application.
Another is speeding up a transaction server (one that is writing as much as it is reading); there again the answer is no. Think of the battery-backed RAM caches that RAID arrays have. Those caches are there for a reason. RAM can do read caching, but it can't do write caching and still be secure across power failure.
My interest is in the second applicatio
Re:Buffers? (Score:4, Informative)
The single largest problem addressed by e.g. DragonFly's swapcache is meta-data caching, to make scans and other operations on large filesystems with potentially millions or tens of millions of files fast. Secondarily, for something like DragonFly's HAMMER filesystem, which can store a virtually unlimited number of live-accessible snapshots of the filesystem, you can wind up with not just tens of millions of inodes but hundreds of millions of inodes. Being able to efficiently operate on such large filesystems requires very low latency access to meta-data. Swapcache does a very good job of providing the low latency necessary.
System main memory just isn't big enough to cache all those inodes in a cost-effective manner. 14 million inodes takes around 6G of storage to cache. Well, you can do the math. Do you spend tens of thousands of dollars on a big whopping server with 60G of ram or do you spend a mere $200 on an 80G SSD?
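Doing that math: 6G for 14 million inodes works out to roughly 450 bytes of cached meta-data per inode, so an 80G SSD has room for meta-data covering on the order of 170 million inodes.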
-Matt
ZFS L2ARC (Score:5, Informative)
Re: (Score:2, Interesting)
Not Linux per se, but the same idea is implemented nicely on ZFS through its L2ARC: http://blogs.sun.com/brendan/entry/test [sun.com]
Swapcache on DragonFly BSD 2.6.x was implemented for this very reason IIRC.
http://leaf.dragonflybsd.org/cgi/web-man?command=swapcache&section=ANY
OSDI (Score:2)
The OSDI deadline is in August; plenty of time to implement this, write it up, and get a publication at a top research conference out of it!
bcache (Score:5, Informative)
http://lkml.org/lkml/2010/4/5/41 [lkml.org]
I'm a little surprised at the lack of response on linux-kernel.
Solaris and DragonFly have already implemented this feature; I'm surprised that Linux is so far behind.
Re:bcache (Score:5, Informative)
Hey, at least someone noticed :)
That version was pretty raw. The current one is a lot farther along than that, but it's still got a ways to go - I'm hoping to have it ready for inclusion in a few months, if I can keep working on it full time. Anyone want to fund me? :D
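For anyone reading this later, the bcache userspace that eventually shipped works roughly like this (device names are placeholders, and the exact commands may differ from the early patch set discussed here):

    make-bcache -B /dev/sdb              # format the backing (spinning) device
    make-bcache -C /dev/sdc              # format the cache (SSD) device
    echo /dev/sdb > /sys/fs/bcache/register
    echo /dev/sdc > /sys/fs/bcache/register
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
    # then mkfs and mount /dev/bcache0 as usual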
Re: (Score:2)
If it were a block device wrapper along the lines of md, I'd be interested.
Have a project at the moment where I'd *love* to be able to specify tiers of storage (say md volumes), and have writes go to the highest priority, and blocks trickle down to the lowest based on usage.
Sort of like a specialized CoW.
Re: (Score:2)
I'm not looking for a limited use pool of cache only disks so much as hierarchical storage management at the block level.
For example, you could have a set of 7.2k SATA drives, a set of 15k SAS drives, a set of FC drives, and a set of SSDs. Each group would be a tier, which all together would act as a storage pool (each tier probably raided together using MD, as there is no hybrid md/lvm at the moment to subdivide them gracefully at the LV level instead of the PV level), with writes all going to the highest spee
Re: (Score:2)
Any plans to add in write cache support? I'm thinking along the lines of putting ZFS's ZIL on SSD's. Really makes NFS in sync mode much quicker.
Re: (Score:2)
Re: (Score:2)
You shouldn't be surprised that "Linux is so far behind"; we like it that way. If we thought that what the Solaris or DragonFly engineers are doing was important, we'd be using their systems instead.
Re: (Score:2)
I'm surprised that Linux is so far behind
Obviously you are either unfamiliar with Linux, or unfamiliar with all non-Linux operating systems except perhaps Windows and maybe Darwin.
Waste of time (Score:5, Informative)
I personally fit all my partitions except
Re: (Score:2)
Just put /home on a magnetic disk and everything else on the SSD
Try jamming two hard drives into a laptop. :-(
Re: (Score:2)
Re: (Score:2)
How much does it weigh? I almost bought an Alienware a few years ago, but they didn't list weight in the specs.
Re: (Score:2)
Actually, my S300 Thinkpad does have a second SATA connector - it's used by the rather useless DVD writer. I've looked everywhere but I cannot find anybody that sells a simple slim-DVD drive bracket. I cannot even find a cable that is suited. There is a site that builds their own cable, but the cable that they build it out of is hard to get and I'm not that great at soldering electronics either.
The problem is of course that there are a few different connectors out there (3 to 4 is my current estimate). Fur
Replace the optical drive. (Score:2)
Replace the optical drive. I've been keeping a log of how many times I've ever used my DVD drive while away from home. So far I'm at 1; I ripped a CD I got for Christmas before I brought it back home with me. It could have waited.
Yes, I know it doesn't work for everyone, but I think it works for most people, assuming you get a USB powered optical drive or enclosure.
Duh (Score:2)
Try jamming two hard drives into a laptop.
Re-read the problem as stated:
"Is there a way to use an SSD to act as a hot sector cache for a magnetic disk under Linux?"
Ya think maybe it's assumed that there are two drives? Just Maybe?
Go for Hardware implemented Caching (Score:2, Informative)
Re: (Score:2)
I see that you haven't actually used hybrid hard drives, because they're nothing like what you describe. AFAIK only Samsung ever made them, and they've now been discontinued. The HHD itself didn't perform caching; it relied on Vista to manage the flash, which didn't really work out when no one bought Vista. HHDs also included laughably small (256MB) and slow flash that would get totally owned by the smallest slowest SSD today.
Re: (Score:2)
Re: (Score:2)
Adaptec MaxIQ is a hardware SSD cache; just don't ask what it costs. To do SSD caching you need some DRAM to hold the metadata and a CPU to manage the cache; would you prefer to buy additional CPU/RAM or just use what's already in your computer?
Re: (Score:2)
Re: (Score:2)
Somebody forgot that you cannot create speed by just using a slow flash chip or two. You need speedy chips, and a lot of them, to create a fast SSD. Besides a good controller and software for that controller of course. Besides that, 256 MB is so low that I wonder if RAM would not already perform most of the caching, even for writes.
Working on this (Score:2)
I've actually been working on this off-and-on for a while, I'm hoping we can release some beta code soon. Currently developing it on Linux, but planning to release OSX and Windows versions, too. We're caching reads and writes, and only the blocks that are most frequently used, plus various other SSD-relevant optimisations. The block allocation logic is pretty complex (and I'm too busy with work), which is why it's been taking so long.
But why? (Score:2)
The oldest and simplest solution is to mount partitions from a small fast disk where you want fast read/write speeds, and partitions from slower disks everywhere else. Works quite well, too.
People forgot the low-level Linux stuff quickly. (Score:3, Informative)
First of all, you can do this with ZFS, which is newer tech and works quite well but is not (and is never going to be) implemented in the Linux kernel.
For lower tech, you can do it the same way we used to do back when hard drives were small. In order to prevent people from filling up the whole hard drive we used to have partitions (now we just pop in more/larger drives in the array). /boot and /var would be in the first parts of the hard drive where the drive was fastest. /home could even be on another drive.
You could do the same: put /boot and /usr on your SSD (or whatever you want to be fastest). If you have an X25-E or another fast-writing SSD you could also put /var on there (for logs, tmp, etc. if you have a server), or if you are short on RAM, make it a swap drive. If you have small home folders, you could even put /home on there and leave your mp3s in /opt or so.
dm-cache (Score:3, Informative)
Google dm-cache. It hasn't been updated since 2.6.29, though.
I was just wondering the same thing (Score:2)
Windows 7 (and I think XP) has ReadyBoost. I haven't been able to find anything similar for Linux. It is also not clear how much difference ReadyBoost makes. The only benchmark I was able to find uses a crappy USB flash drive [pcstats.com]. I was wondering how much difference something like the 80GB x-25m would make. There is clearly potential for huge gains as MaxIQ benchmarks show.
This would be an awesome speedup if it were supported: just add a 40-80GB SSD for swap & file cache and gain a massive performance boost.
Re: (Score:2)
It is also not clear how much difference ReadyBoost makes
Keep searching. There are lots of other benchmarks, and they all say the same thing. It helps if you have an old machine with insufficient memory.
ReadyBoost doesn't really do what the author wants. Windows treats ReadyBoost as a write-through cache like it treats memory. It assumes you might unplug the drive at any moment. It won't speed up boot time, and it won't speed up writes. It won't place the swap file on there either. I'm not sure if you could tell Windows to use a regular SATA drive for Ready
Re: (Score:2)
Re: (Score:2)
What you really want to be talking about is a different Windows 7 feature called ReadyDrive, which actually does what the author is talking about. Basically, the system heuristically determines what files are used most often/during boot and the BIOS read- and write-caches them to the flash in the ReadyDrive. I bought a Thinkpad with a "4GB Intel TurboBoost Memory" chip and it made a noticeable difference in boot time when I enabled it as a ReadyDrive.
I also found that when Windows (rarely) crashed, it w
Thread summary (Score:3, Funny)
"If Linux doesn't already do it, you don't need it anyway!"
Re: (Score:2, Insightful)
No kidding. It's threads like this (where I think the question is entirely reasonable and a good thing to support) that really sour my opinion of Linux. There are a few other things -- better file-system-supported metadata, transactional filesystems, etc. -- that have come up in the past too where it seems I just flat out disagree with most hardcore Linux users.
(Don't worry, I hate Windows too, but for mostly different reasons. I don't use OS X very often and don't have an opinion on it, but I'd probably ha
Not necessarily a good metric (Score:2)
Caching is only worthwhile if the data can benefit from higher bandwidth. I don't want, for example, my porn or SETI@home data using valuable cache space regardless of how frequently it's accessed, because it can't be processed at anything approaching the bandwidth of magnetic storage, let alone a good SSD. I'd much prefer to have my app/games stored on the SSD, because regardless of how infrequently I use any one of them, the performance gains would be far more dramatic.
FSCache would work except... (Score:5, Interesting)
I have a similar problem and I tried the FSCache approach:
I've got two raids.
One is optimized for big ass files read contiguously and has raid6 redundancy.
The other is a much smaller JBOD that I can reconfigure via mdraid to anything that linux supports in software.
The problem is that 5% of the big ass files need read-only random access and that kills throughput for anything else going on. It takes me down from ~400MB/s to 15MB/s.
So, I thought I'd use the FSCache approach and use the JBOD as the cache.
I did an NFS mount over loopback and pointed the fscache to the JBOD.
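For anyone attempting the same setup, the pieces are roughly a cachefilesd configuration pointed at the cache device plus the fsc mount option (paths and the export name are placeholders):

    # /etc/cachefilesd.conf -- cache directory lives on the JBOD
    dir /jbod/fscache
    tag jbodcache
    brun 10%
    bcull 7%
    bstop 3%

    # re-export the raid6 locally and mount it back with FS-Cache enabled
    mount -t nfs -o fsc localhost:/raid6 /mnt/cached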
It worked great got practically full throughput for contiguous access, for about 10 hours and then crashed the system.
Apparently NFS over loopback is well known to be broken in linux and has been since, essentially, forever.
I was stunned; it had never even occurred to me that NFS over loopback would be broken. It's freaking 2010 - that something I had been using on SunOS 3 a bazillion years ago didn't work on Linux today had not even entered my mind.
I've also tried replicating the files from the raid6 to the JBOD, but that quickly turned into a hassle keeping everything synchronized between the files on disk and the applications that create the files on the raid6 and the apps that use the files on the JBOD. Plus, it doesn't scale past the size of the JBOD, which I also ran into.
So now I'm looking at putting the apps that need random access reads to the data in a VM and NFS-mounting it with caching to the VM, hoping to avoid the NFS-broken-over-loopback problem. I haven't had time to implement it yet, and personally I'm leery of doing so since I have to wonder what new "known-broken" problems will bite me in the ass.
So, if there is a better way, I am dying to hear it, unfortunately solaris/freebsd is not an option...
Re: (Score:2)
Err, no you don't. That's not caching at all, and doesn't help with datasets that don't fit on the SSD.
This is a shortsighted kludge with limited uses, and not at all the elegant solution the poster was asking for.
Re:Sure it's caching. And it's not a "kludge" at a (Score:2)
Sure it's caching. You're storing frequently-used data on a faster medium.
That's not what caching means. It comes from the French word for 'hidden' and the fact that it is not directly addressable is the important part of the definition. A cache is not just a faster medium, it's a faster medium that is hidden from the user / programmer and is used to accelerate access to the slower medium.
Re: (Score:3, Insightful)
Define "unnecessarily". Given current SSD costs and depletion rates, it's probably completely acceptable to replace an SSD used as an intermediary cache in front of a large spindle-based array every couple of years.
Just because it's not useful to you, doesn't mean it's not useful.
Re: (Score:2)
I don't understand what you mean. How does caching data on an SSD lower the lifespan on the magnetic drive? Or did you mean it lowers the lifespan of the SSD? Which it would do, but it shouldn't be any more than any other use of an SSD. SSD lifespan is really not an issue any longer. (Intel claims over 10 years on their drives. Other manufacturers are claiming similar timespans).
Could you clarify?
Re:Counter-Productive (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
I agree, and I've been thinking along these same lines. Existing caching algorithms are not sufficient for this purpose. In in-memory caches, the data is written back as soon as possible since the memory is volatile. You would want an algorithm specifically made for non-volatile caching.
I imagine a 500GB hard drive with 32GB of SSD. The caching algorithm would be smart enough to keep 2 cache areas: 16GB reserved for long-term read-only things like OS files. No matter what disk thrashing goes o
Good idea, lousy implementation. (Score:2)
Kind of like the current idea of pushing the wear-leveling back to the drives. This is something the OS can do, and it's a case where flexibility matters -- it's not something I want in a black box inside a drive controller.
Re: (Score:2)
Laptops can only fit a single drive.
Re: (Score:2)
A lot of people replace the DVD drive with a 2nd drive. There are kits available.
Re: (Score:2)
Yeah, but this is about a *software* implementation using an SSD as cache for one or more HDD. So there are two drives by definition.
Re: (Score:2)
He wants the OS to intelligently (and automatically) use an SSD to store the frequently used files from his larger spinning hard disk. It's a great idea and surely Windows will do it soon enough (as much as I hate to say it).
It basically does already (ReadyBoost). I can't imagine there would be much work involved in modifying it to use arbitrary disks like SSDs instead of just thumbdrives. Indeed, I wouldn't be at all surprised if there's a simple Registry setting that decides what devices can and can't
Re: (Score:2)