
Minimum Seek Hard Disk Drivers for Unix? 58

Jonathan Andrews asks: "I remember back in the old days reading about a filesystem/device driver that had almost no seeks of the physical disk. It worked by scanning the heads of the disk from track 0 to the end and back again in a constant motion. Disk reads and writes were cached so that they got written to disk only when the heads were over that part of the platter. My question is simple: now that disks are IDE, have lots of heads, and (even worse) differing Heads/Cylinder/Sector translation schemes, is this type of system even possible? Would you have to fight the disk cache on the drive? I seem to recall it giving real throughput advantages: if the cache was large enough to hold one sweep's worth of data, then the cache almost never blocked, and disk reads/writes sustained the maximum throughput all the time. Best of all, it gets rid of that blasted seeking, chewing, seeking noise!"
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • by ComputerSlicer23 ( 516509 ) on Monday March 03, 2003 @08:09PM (#5428577)
A lot of mission-critical applications block on writes. Take a look at a mail server or a database server: they call fsync() a lot, and you don't want that fsync call to return until after the data is written to the filesystem. So for throughput on those systems, it's good to ensure that queued writes being waited on by fsync get done really soon now. Seeking directly to where the data needs to go is important for performance.
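    The fsync pattern looks like this from user space -- a minimal Python sketch of a durable write, not real mail-server code:

```python
import os, tempfile

def durable_write(path, data):
    """Write data and don't return until the kernel has flushed it to disk.

    This mirrors what a mail server or database does: the operation is not
    considered complete until fsync() returns.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # blocks until the data has been pushed to the device
    finally:
        os.close(fd)

path = tempfile.mktemp()
durable_write(path, b"queued message\n")
with open(path, "rb") as f:
    print(f.read())
```

    The cost is exactly the point made above: the process sits blocked in fsync() until the head reaches the right track, so the scheduler has a reason to service that write out of sweep order.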

The other problem is that all applications block on reads. When an application wants to read, it can't do anything else until the blocks come off the disk, so there is contention to get reads done quickly. As a general rule, another read is going to happen very soon, very near the last one; it might be earlier on the disk, it might be later. Linus proposed an interesting optimization for this on the LKML: create an mmap call that says "I'm interested in this page of this file, please start putting it here." Then go off and do other work, and when you access that page, if it isn't yet loaded from disk, the pointer access blocks until it is available. These speculative reads are then less pressing than the reads that are blocking the progress of a process.
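    That proposal is roughly what madvise(MADV_WILLNEED) gives you today -- a sketch in Python (MADV_WILLNEED and PROT_READ are Unix/Linux-specific, so the hint is guarded):

```python
import mmap, os, tempfile

# Create a file to map (stand-in for some large data file).
path = tempfile.mktemp()
with open(path, "wb") as f:
    f.write(b"x" * mmap.PAGESIZE * 4)

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    # The hint: "I'll want these pages, start reading them in."
    # madvise returns immediately; the kernel fetches pages in the
    # background, and a later access blocks only if they aren't in yet.
    if hasattr(mmap, "MADV_WILLNEED"):
        m.madvise(mmap.MADV_WILLNEED)
    first_byte = m[0]   # may block here if the page hasn't arrived
    m.close()
os.remove(path)
print(first_byte)
```

    The read still happens, but it no longer holds up the process at the point where the data is requested, only (at worst) at the point where it is used.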

The reason most people changed the scheduler is that it is faster for a known workload that matters to them, so schedulers got tweaked for it. Buffering a full write cycle is easy, but fsync means some writes are more important than others, and reads are almost always blocking a process, so seek order matters. Now throw mirrors into the mix: having two copies means you can have two independent heads searching for the same information. Oh, but they have to sync every time there is a write.

The other thing is that prior to the 2.5 kernel, Linux spent all its time assuming that a disk had no C/H/S; it was one big flat row of sectors. It explicitly ignored C/H/S in filesystems and disk-head algorithms. 2.5 is supposed to introduce the concept of a spindle, with spindles having C/H/S that can then be optimized for.

Now remember that most of these write concerns will become unimportant as soon as battery-backed RAM becomes a standard part of every computer. Then everyone will run journaling filesystems where the journal is stored in the battery-backed RAM (so there are no seeks for writes); you only queue writes to disk when there is pressure on that RAM, and try to do reads as fast as possible. When the battery-backed RAM is nearly full, you start from the inside moving outward, writing every track as you go. Then you can use n-way mirrors to speed up reads.

    Kirby

  • Re:Lag! (Score:2, Informative)

    by Orthanc_duo ( 452395 ) <forum@orthanc.cGAUSSo.nz minus math_god> on Monday March 03, 2003 @08:36PM (#5428822) Homepage Journal
Without registers and L2 cache, modern RAM wouldn't have a hope... and without a cache, your disk access would be unbearable.
  • by deviator ( 92787 ) <bdp&amnesia,org> on Monday March 03, 2003 @10:52PM (#5429860) Homepage
    The technology the poster is referring to is called "Elevator Seeking" and was originally included in early versions of Novell Netware (and is still in current versions, as it really does improve access time.) Here's Novell's official definition. [novell.com]
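    The elevator idea itself fits in a few lines -- a toy SCAN sweep in Python, a sketch of the concept rather than NetWare's actual implementation:

```python
def elevator_order(head, requests):
    """Service pending track requests in one sweep up from the current
    head position, then one sweep back down (the classic SCAN /
    elevator-seeking algorithm).
    """
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down

# Head at track 50. Note the order minimizes direction reversals,
# not the seek distance of any individual request.
print(elevator_order(50, [95, 10, 60, 33, 80]))  # [60, 80, 95, 33, 10]
```

    A plain FIFO queue would bounce the head back and forth across the platter; the sweep services everything on the way, which is exactly the "constant motion" the original question describes.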
  • Geometry (Score:5, Informative)

    by mmontour ( 2208 ) <mail@mmontour.net> on Monday March 03, 2003 @11:07PM (#5429947)
    My question is simple, now that disks are IDE, have lots of heads and even worse differing Heads/Cylander/Sector translation scemes is this type of system even possible?

    I wouldn't say that disks have lots of heads - 2 to 4 is probably typical, and there are models that only have 1 (e.g. Maxtor "Fireball 3" series).

    Don't pay any attention to the BIOS geometry that claims 16 heads - those numbers are pure fiction. In fact the whole concept of the C/H/S geometry is obsolete. It assumes that the number of sectors per track is constant, when in fact it varies. Outer tracks have a larger circumference, so they can hold more sectors. A drive might have 30 different zones, each with a different number of sectors per track.

    Tools like "zcav" (bundled with the "bonnie++" benchmark utility) will show this quite clearly, because the sustained transfer rate of a disk is proportional to the number of sectors per track. The rotation rate is constant, so more sectors-per-track means more sectors-per-second passing under the head.
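    The arithmetic behind that is simple; here it is with hypothetical zone numbers for a 7200 RPM drive with 512-byte sectors:

```python
# Illustrative numbers for a hypothetical drive: 7200 RPM, 512-byte sectors.
RPM = 7200
SECTOR_BYTES = 512
revs_per_sec = RPM / 60.0

def sustained_rate_mb(sectors_per_track):
    """Sequential transfer rate within one zone, in MB/s: the sectors
    passing under the head per revolution, times revolutions per second.
    """
    return sectors_per_track * SECTOR_BYTES * revs_per_sec / 1e6

# Outer zones hold more sectors per track, hence higher throughput --
# exactly the downward ramp that zcav plots from outer to inner tracks.
for spt in (600, 450, 300):   # outer, middle, inner zone (made-up values)
    print(f"{spt:4d} sectors/track -> {sustained_rate_mb(spt):5.1f} MB/s")
```

    With these made-up zone counts the drive would sustain roughly 37 MB/s on the outer tracks and half that on the inner ones, which is the 2:1 outer-to-inner ratio real zcav plots often show.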

Drives tend not to expose the information you need to translate logical block addresses into physical locations. Apart from the zones, you also have the problem of re-mapped sectors (where a spare physical sector is substituted at the logical address of a failed one).

    p.s. "drivers/block/elevator.c" in the Linux kernel might be of interest.
  • by mmontour ( 2208 ) <mail@mmontour.net> on Monday March 03, 2003 @11:28PM (#5430052)
    There are some nice-looking battery-backed RAM cards here [micromemory.com], but I've never used them and don't know how much they cost. Available in capacities of 128M to 1G, with two redundant Li-ion batteries and a 64-bit PCI interface.
  • Done. (Score:5, Informative)

    by leonbrooks ( 8043 ) <SentByMSBlast-No ... .brooks.fdns.net> on Tuesday March 04, 2003 @03:22AM (#5431059) Homepage
    Go read elevator.c from the Linux kernel. If you have kernel sources installed, file:///usr/src/linux/drivers/block/elevator.c might work for you (slashdot butchers that URL).
  • by tlambert ( 566799 ) on Tuesday March 04, 2003 @03:39AM (#5431140)
    First use of this technology... 1984:

    A Fast File System for UNIX (1984)
    Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, Robert S. Fabry
http://citeseer.nj.nec.com/mckusick84fast.html ...otherwise known as BSD FFS.

Current BSDs have this capability, but it is generally disabled, because modern disk drives *lie* about their physical seek boundaries.

    Theoretically, you can work around this with SCSI disks by reading the physical geometry off of mode page 2, and then taking it into account when laying out data, to avoid seeks. Maxtor also has a vendor private command for getting this information from some of the more modern ATA drives.

    The BSD FFS code can't handle this information without work, though, because the code is very simple, and supports only the idea of uniform length tracks, and does simple math, rather than a table lookup (but it's not that hard to change).

Practically, you could expect a significant speedup now that the relative spindle speed vs. seek speed makes seeks significant again on 10,000 RPM drives; for most drives over the last 5-6 years, though, the seek time is in the noise, and it's not that big a win (stepper motors vs. voice coils was the big change that made it matter much less, the first time).

    -- Terry
  • by one-egg ( 67570 ) <geoff@cs.hmc.edu> on Tuesday March 04, 2003 @03:50AM (#5431172) Homepage
    Contrary to what other posters have said, the original questioner was not asking about disk scheduling algorithms such as SCAN (elevator) or C-SCAN. Rather, the system he was recalling was the Log-Structured Filesystem [berkeley.edu] (LFS) from Berkeley. The original work was done in the early 90's. The basic ideas were as follows:
    1. Most disk activity is reads.
    2. If you have a lot of RAM, caches do a good job of taking care of the reads.
    3. The leftover writes drive the head crazy.
    4. It is therefore a Good Idea to do the writes wherever the head happens to be, and let the disk layout be scrambled.
    The LFS operated by creating a "log" in which all blocks were written sequentially. Reads required random seeks, but the cache was supposed to take care of that. Eventually the log filled, after which a cleaner (which ran in background) would recover the blocks discarded by deletes, and those blocks would be reused.
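    The append-only idea can be sketched in a few lines of Python -- a toy illustration of the mechanism, not of Berkeley LFS itself:

```python
class TinyLog:
    """Toy log-structured store: every write appends to the log, and an
    in-memory map records where the latest version of each block lives.
    """
    def __init__(self):
        self.log = []        # the append-only "disk"
        self.latest = {}     # block id -> offset of its newest copy

    def write(self, block_id, data):
        # No seek back to the old location: overwrites just append.
        self.latest[block_id] = len(self.log)
        self.log.append((block_id, data))

    def read(self, block_id):
        return self.log[self.latest[block_id]][1]

    def clean(self):
        """The cleaner: copy live blocks forward, discard stale copies."""
        live = [(b, self.read(b)) for b in self.latest]
        self.log, self.latest = [], {}
        for b, d in live:
            self.write(b, d)

log = TinyLog()
log.write("a", "v1")
log.write("a", "v2")     # the overwrite appends; "v1" is now garbage
log.write("b", "data")
log.clean()
print(len(log.log), log.read("a"))  # 2 v2
```

    Even in the toy you can see the catch: clean() rewrites every live block, and on a real disk that extra copying is exactly the cleaner load discussed below.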

    The original work was done by Mendel Rosenblum, one of the founders of VMware and the most recent (2002) winner of the ACM SIGOPS Mark Weiser award.

    The problem, as it turned out, was the cleaner. It put too much load on the disk. The original theory was that the cleaner would run overnight, but on a continuously loaded system there was never idle time to use to run it.

In 20/20 hindsight, the idea was clearly flawed. If you look at my list above, you'll see that you are getting rid of scrambled writes by giving up sequential reads. Since reads are cached, you're (on average) giving up one approximately sequential read to get one sequential write. But that's optimistic, because occasionally the cache misses, so you actually give up 1.1 (or 1.001) sequential reads to earn 1 sequential write. Worse, you also have to pay the overhead of the cleaner.

    I can argue strongly that the only reason LFS ever saw the light of day was that the benchmarks used to evaluate it wound up highlighting its strengths and hiding its weaknesses. I don't think that was intentional, but it's what happened.

    The most recent LFS work was by Drew Roselli, in the late 90's. She identified a lot of the causes of slowdowns in the original system, and found ways to mitigate them. Even so, though, the system has never lived up to its promise.

    BTW, don't confuse LFS with journaling filesystems such as ReiserFS, XFS, and ext3. LFS had some journaling aspects, but its focus was performance rather than crash-proofing. One can argue that LFS influenced journaling filesystems, but it's not the same.

  • by dotgain ( 630123 ) on Tuesday March 04, 2003 @04:02AM (#5431212) Homepage Journal
Nothing new about it. My SPARCstation 20 has 2MB of non-volatile RAM. Well, actually it doesn't any more, because Linux doesn't do anything with it (yet). I don't think even Solaris uses it on its own; you need some Storage Management crappy thing.

    So I yanked it out and turned the battery off. But my point is, it's been around for ages, and if I could use it I would.

  • by Detritus ( 11846 ) on Tuesday March 04, 2003 @05:25AM (#5431419) Homepage
    The other problem is that all applications block on reads.

    That depends on the operating system. Operating systems with proper asynchronous I/O support allow applications to issue non-blocking reads. I used to make extensive use of non-blocking reads and writes when I wrote applications for DEC operating systems.

  • by ComputerSlicer23 ( 516509 ) on Tuesday March 04, 2003 @12:43PM (#5433470)
That's true. I felt like giving up on my post before I got to covering that subject. Async I/O is in most (all?) Unices. The other way to handle that case is to use threads, if kernel threads are supported by the OS. Most people I know don't use async I/O, because it's much harder to use and harder to organize your programs around successfully. Just one of those things.

    Kirby

  • I remember BSD 4.3 (Score:2, Informative)

    by lpq ( 583377 ) on Tuesday March 04, 2003 @04:04PM (#5435301) Homepage Journal
    It wasn't something so mindless as an elevator that can only go one direction...They had a concept of "zones".

    The theory is you divide up the disk into cylinders or groups of cylinders (depending on your cylinder size). When you write files out to disk, the directory and the files for the directory were all aimed at the same zone or a closest adjacent/available zone.

    The idea was to use people's logical mapping as a guideline in physical layout on the disk. That way seeking is minimized within some small number of cylinders.

You may have to hold off on writes to let them group efficiently; worst case, writes involve much seeking if different processes are writing single blocks to different directories with each write, but reads go much faster...

Maybe you would wait around after a read in the same area of the disk while the process that was just awakened by the finishing of the disk I/O finishes its timeslice. If it doesn't get scheduled next, or isn't likely to within "X" time, or if other read requests accumulate that are more than "Y" time old, then you service the other processes.

Values of "X" and "Y" depend on disk seek times/latency, timeslice length, process priority, the number of disk-blocked processes... etc. They'd have to be dynamically altered based on load.

    But that's what...circa 1989 tech? Good thing we've come so much farther than those primitive algorithms. :-(

    Those who do not learn from history are doomed to repeat it.

    It's amazing how, even now, many "modern" security concepts are being "discovered" that were already known back in the 60-70's.

Maybe it's the tendency in the industry to discard experienced programmers for cheaper meat, so generational knowledge that would be passed on over a generation (~20-30 years?) is lost as the "generation" time is shortened to 10-15 years. Dunno. Maybe modern programmers don't have time to read books on what's already been done, because they know that whatever has been done in the past, they can do better (i.e. egotistical ignorance)?

    hmmm
