

Swap Performance in Linux
GizmoDuck writes "I'm working in a computational chemistry lab, and we find ourselves using memory and CPU hogs like Amber and Gaussian. The CPU hogging isn't a problem, thanks to Condor, but when submitting one of the jobs that request (and pretty much require) all the physical RAM in the machines, Linux promptly starts swapping so hard that the mouse pointer in X stops moving, NFS and NIS halt, and things don't get back to normal for five minutes. I've tried toying a bit with the settings in /proc/sys/vm/kswapd to no avail. I've done some poking around on the 'net looking for answers. Faster disks and swap partitions at the beginning of the drive aren't really an option at this point. I haven't found a good solution yet. I was wondering if the /. community has any input on how to keep the system from locking during periods of necessarily high swap activity?"
Without Swap (Score:1)
Re:Without Swap (Score:2)
Re:New memory manager (Score:1)
The answer can be summed up in a math equation (Score:2, Informative)
I still have no idea why Linus used 2.4 as a development tree. Go back to 2.2.x, no swapping problems going on there.
By the way, does anyone know the command to flush the swap partition?
Re:The answer can be summed up in a math equation (Score:1)
Hmm... couldn't you just run swapoff? Make sure you select the right partition, obviously, but wouldn't that work? Any reason why not?
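One reason it might not work: swapoff has to pull everything that is currently in swap back into RAM, so it only goes smoothly if there is room for it. Below is a minimal Python sketch of that check against /proc/meminfo-style text. The helper names, the sample figures, and the rule of thumb (free memory plus buffers plus cache counts as available) are assumptions made for illustration, not anything from the thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def swap_fits_in_ram(info):
    """True if the data currently held in swap would fit in reclaimable RAM."""
    swap_used = info["SwapTotal"] - info["SwapFree"]
    # Assumption: free memory plus buffers and page cache can be reclaimed.
    room = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return swap_used <= room


# Made-up sample; on a real box use open("/proc/meminfo").read() instead.
SAMPLE = """\
MemTotal:   255908 kB
MemFree:     10000 kB
Buffers:      8000 kB
Cached:      40000 kB
SwapTotal:  524280 kB
SwapFree:   500000 kB
"""

if swap_fits_in_ram(parse_meminfo(SAMPLE)):
    print("should be OK to run: swapoff -a && swapon -a (as root)")
else:
    print("not enough free RAM to absorb what is in swap")
```

If the check fails, turning swap off will either take a very long time or push the machine into the same thrashing the original poster is trying to avoid.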
Re:The answer can be summed up in a math equation (Score:1)
preempt ? (Score:4, Informative)
Maybe try out the preemptible kernel patch?
My personal experience is that it has helped my workstation's interactive performance noticeably for big ass c++ compiles and periods of lots of disk activity (big apt-get dist-upgrades). Thankfully, I'm no longer doing the big ass c++ compiles, so it's not as big of an issue as it used to be :)
Try preemptible kernel patch... (Score:5, Interesting)
You might also consider the crazy idea of putting a swap file on NFS -- you'll get (if your network is decent) almost the same bandwidth as you get when accessing an (older) disk, but much higher latency (this will put your background process at a disadvantage compared to your interactive processes).
Hope this helps.
Paul B.
Re:Try preemptible kernel patch... (Score:1)
A neat idea, but wouldn't that just migrate the problem to the NFS host? I'm too lazy to try it myself.
Re:Try preemptible kernel patch... (Score:2, Interesting)
Sure it would, but:
interactive performance on the NFS server might not be that important
it might have faster disks
and, finally, the swap hog program will slow down due to network latency, creating less load on the NFS server than it would on the workstation.
Paul B.
Network swap? Nooo!!! (Score:1)
That sounds nice until you realize it's your X server and xterms that'll get swapped out as soon as you leave your workstation for a moment. When you have to wait for *those* to come in over the network, you'll be crying for local swap once again.
Network swap is really only a useful option for diskless workstations.
--Joe
Pushing the limits of RAM (Score:4, Interesting)
If your program(s) push Linux to the point where it actually runs out of available RAM faster than it can free it up, then "all hell breaks loose". It has to swap something out, and just about every program is eligible to be swapped out. That includes GPM (if you are on a virtual console) or X (if you are in X Windows). You need to account for all of these things to determine your RAM needs. Add up the memory usage of all your active programs, plus the buffer demands they have doing disk I/O, plus the kernel, and you need that much RAM. If the program is doing a LOT of disk/file writes, you can expect the buffer demands to be the majority of this, too (because the kernel believes that what you just wrote you might soon want to read back, so it tries to keep lots of it in RAM even if that means swapping out GPM and X).
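To make that bookkeeping concrete, here is a small Python sketch that adds up the pieces named above: programs, their I/O buffers, and the kernel. Every number in it is a made-up placeholder for illustration, not a measurement from any real system, and the function name is hypothetical.

```python
def ram_needed_mb(process_mb, buffer_mb, kernel_mb):
    """Total RAM demand: active programs + their I/O buffers + the kernel."""
    return sum(process_mb.values()) + buffer_mb + kernel_mb


# Hypothetical figures, in MB.
processes = {"compute job": 200, "X server": 30, "window manager": 10, "gpm": 1}
installed = 256

needed = ram_needed_mb(processes, buffer_mb=60, kernel_mb=8)
shortfall = max(0, needed - installed)
print("need %d MB, have %d MB, shortfall %d MB" % (needed, installed, shortfall))
```

Whatever the shortfall comes to is what the kernel has to find by swapping something out, and it may well pick X or GPM.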
RAM? (Score:1)
What are the system specs?
FreeBSD (Score:3, Interesting)
Re:FreeBSD (Score:2, Insightful)
FreeBSD on a dual processor box will match a dual Linux box any day. I didn't say beat, but they can go back and forth depending on the exact applications. If you go to more CPUs then performance will start dropping off in a hurry. It shouldn't do as well on paper, but real world applications show that it performs very well with 2 CPUs. It's kind of like how a microkernel *should* be better than a monolithic kernel, but in the real world it isn't.
I'd go with a dual FreeBSD box any day especially if it is going to be under high loads. I have more linux boxes than anything else right now but their performance under load has been an issue. If you would rather stick with linux then look at some of the alternative VMs out there. I would stick with linux if you have more than 2 cpus unless you really want to go with a commercial Unix (Solaris x86 maybe??).
Re:FreeBSD (Score:4, Interesting)
I'd have to agree. The author should look into using FreeBSD. A GIS project I'm currently working on allocates 3GB of RAM at startup. Until we get the rest of the funding for our SunFire solution [sun.com], we're using what we have available, which is (was, actually: we've replaced the OS with FreeBSD) a P4 Linux box with 2GB of RAM, a 9GB SCSI drive for swap partition and a 36GB SCSI drive for everything else.
I'm not a Linux expert, but the techs in the department are. After a few weeks of their tinkering, it did pretty much the same thing as you're experiencing. I have a small development system at home (P3, 1GB RAM, 4GB SCSI swap, 40GB IDE for all else) running FreeBSD. Installed the software, and it runs like a charm. X works beautifully, Apache still serves up pages (of course, it doesn't get much traffic at home) and the program never chokes the system. Granted, with only a gig of real memory, it spends a fair amount of time accessing the disk (about 30 seconds every 2 minutes), and it steals almost all the cycles from dnetc [distributed.net]!
Preempt + Ingo Scheduler (Score:4, Interesting)
They will significantly increase interactive performance under high load and keep the system from running away with itself. If you're feeling really adventuresome, you could also throw in Rik's rmap VM... I have done very little testing with it, but I hear a lot of reports that it helps.
These are all available in the authors' respective directories on Kernel.org [kernel.org]: riel, rml, mingo
Re:Preempt + Ingo Scheduler (Score:2)
To build it, get the linux-2.4.17.tar.gz [kernel.org] kernel, patch [kernel.org] it to linux-2.4.18-pre9, then patch again with patch-2.4.18-pre9-mjc2. Then build and use the kernel. Check recent (i.e. 2002) kernel archives to read discussion of this and other related patches, if desired.
You're out of luck (Score:5, Interesting)
Unfortunately, you're out of luck. The current linux VM (in later 2.4 series) is fine for low to medium load systems but falls apart on high load systems. The previous VM (early 2.4 series) is a good design but isn't really ready for production.
I would suggest buying more RAM (it's cheap) if you aren't already maxed at 4 gigs (x86). Alternatively switch to FreeBSD which has a very stable efficient VM. Any source should recompile without too much trouble and it can run linux binaries at almost full speed!
Re:You're out of luck (Score:1)
Re:You're out of luck (Score:2)
The VM in the early 2.4 kernels would grossly lock up when it was out of memory. I was told this was due to the fact that the design assumed you had at least as much swap space as RAM. It could not handle the case of (memory need > swap) even though (memory need < swap + ram). I have several systems which have lots of ram and no swap at all, and they would die quickly. And it wasn't because I was overusing memory with the processes. This would happen even if the ram got used up when writing data to a file larger than ram space. The later 2.4 VM fixed that. Hopefully when Rik's VM is cleaned up, it should solve the problem with lack of (or small) swap.
A couple of suggestions (Score:2)
In my testing, these two patches have been a big help, especially on my P166 system with 48MB RAM.
Also, you say "faster drives" and repartitioning are not feasible ATM, but how about multiple small drives? As shown in this howto [linuxdoc.org], the linux kernel has support for striping data to swap disks, just by specifying multiple swap entries in fstab.
Then again, if you're not on SCSI, trying to stripe to the swap drives won't be much help anyway, as RAID over IDE for _speed_ usually is just crap.
That last suggestion may not be for you, but definitely try the two patches. It should also be noted that preempt is a compile-time option, and there is also a compile-time option to control the low-latency patch through
Re:A couple of suggestions (Score:1)
('Course swap is how many millions of times slower than RAM...?)
And if you want that performance with redundancy then do Raid-0/1 with separate controllers for each 0-set (you want physical redundancy anyway) and your performance hit from mirroring actually won't be too bad.
My $.031475
can you "nice" the applications? (Score:2)
I can't recall the command line option off the top of my head but I know using Gtop, you right click the app, and pick renice, then set it to 1 instead of 0.
Re:can you "nice" the applications? (Score:1)
Re:can you "nice" the applications? (Score:1)
Re:can you "nice" the applications? (Score:2)
BTW, the command is "nice", or "renice" if it's already running. Pretty tough to figure out.
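For anyone launching jobs from a script rather than by hand, the same thing can be done from Python's standard library. This is only a sketch: the run_niced helper is a hypothetical name, and the increment of 10 is an arbitrary choice, roughly what "nice -n 10 command" would do.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd with its niceness raised by `increment`, like `nice -n 10 cmd`."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


# Demo: the child reports its own niceness (os.nice(0) returns it unchanged).
child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print("parent niceness:", os.nice(0))
print("child niceness: ", child.stdout.strip())
```

For a process that is already running, os.setpriority(os.PRIO_PROCESS, pid, 10) is the counterpart of renice. Keep in mind that nice only affects CPU scheduling; it does not stop a niced job from forcing everything else out to swap.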
Recent kernel? (Score:1)
Linux 2.4.x VM (Score:3, Insightful)
Did you miss all the 2.4 Linux VM Stories?
I suggest building and installing the latest kernel with the aa VM (the default VM since 2.4.10). If you still have VM (swap) problems, then go get the latest rmap VM patch and try that.
The kernel VM (virtual memory subsystem) is what manages memory and swap, btw.
And if you did miss all the VM stories, a summary:
At the start of 2.4 a fancy new VM was put into action, using something known as reverse mapping. This was very clever, but it wasn't quite ready and there were teething troubles. Then suddenly (2.4.10) Linus switched the VM to one similar to that of 2.3 (with some updates and a few features from the previous 2.4 VM). This started a big fight, which caused concerns (such as that it might split the Linux community).
Which is better I don't know; some swear by one, others swear by the other. But unless you're using the RH 2.4.9 kernel, I would not recommend a pre-2.4.10 kernel.
However, you may need to experiment to find out which is best for you: the VM now in 2.4 (to stay) or rmap. You should try both and see.
Steps:
Install 2.4.[17,18,19] [kernel.org]
Try it
If it fails, try the rmap patch [surriel.com]
Shut down X? (Score:5, Interesting)
My system never seems to give swap BACK (Score:2, Interesting)
What I'd like to see is something along the lines of some kind of LRU which gently starts swapping data back into memory from swap when memory becomes free. There's nothing like having VMWare sitting in swap since you stopped using it an hour ago to do some other work and then jumping back and having to wait the 5-10 seconds of heavy disk activity to resume work there.
As for those saying "don't use swap at all" -- that's crazy talk. I'd rather have an app or two go to swap instead of being outright killed by the VMM when it needs an extra meg or so. If I'm not mistaken Linux tends to pick the big memory eaters to dump to swap over the little guys so if you start a compile... there goes VMWare... or your IM client... or Konqueror... lots of fun. :-)
Re:My system never seems to give swap BACK (Score:1)
You may disagree with the sentiment, but you get benefits from it all the time. All those gettys that you're not using? Init? Portmap? Pump? devfsd? lpd? atd? cron? xinit? Those are swapped out, and they won't be wasting your RAM until you want to use them (and then, it's not "wasting").
But yes, for extremely large programs, it'd be swell if Linux (or anything) could predict you'll want something before you actually want it.
I hate to say it... (Score:1, Redundant)
Computational Chemistry (Score:3, Informative)
Those people may be able to give you some sensible suggestions, especially with respect to those particular pieces of software.
I believe that you can restrict the amount of memory that Gaussian uses via its keywords. When it requires more, it will handle the dumping of data to disk itself. Read the manual - I haven't used Gaussian since g94 was the current version, so I can't remember.
How big is your AMBER simulation? I think I would run a smaller system... or even better... buy some more RAM given that it is dirt cheap nowadays.
AMBER's memory use is a bit heavy - you may have better luck with another MD package. Maybe NAMD? (Although I'd still vote for the "buy more RAM" option)
Not mentioned yet -- go lean... (Score:4, Insightful)
With that as a given, if your app needs all available memory, run top and lsmod to see what's using your memory and remove everything you don't need (usually by deleting the links to those processes in the /etc/???/rc5.d directory).
If you can't remove it, scale it down. For example, /etc/inittab lists the different virtual terminals that appear when you press ctrl-alt and a function key. If you never use this feature, try reducing this to 1 or 2 terminals. Leave some behind just in case you need them later. To do this, just comment out the higher numbered lines that look like this:
6:2345:respawn:/sbin/mingetty tty6
(NOTE: Removing these lines might not make any difference -- it all depends on the distribution.)
As for X (assuming you need it and are using XFree), try removing any Load lines in the modules section that you don't need and scaling down the display size, background images, and color depth. Another big area of savings is changing the window manager. FVWM is usually installed, and while it is ugly it is also fairly lightweight when compared to KDE, Gnome, and other popular full-featured WMs.
While these steps alone won't eliminate the speed problems -- the other comments might solve that -- the time you spend waiting might be cut way down.
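For the "see what's using your memory" step, top already does the job, but if you want the list in a script, here is a minimal Python sketch that ranks processes by resident size using /proc/<pid>/status. The function names are hypothetical, and it assumes a Linux /proc with Name and VmRSS fields; kernel threads have no VmRSS and show up as 0.

```python
import glob


def parse_status(text):
    """Return (name, rss_kb) from the text of a /proc/<pid>/status file."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss_kb = int(value.split()[0])
    return name, rss_kb


def top_memory_users(limit=10):
    """Scan /proc and return the `limit` largest processes by resident size."""
    found = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path, errors="replace") as f:
                found.append(parse_status(f.read()))
        except OSError:
            continue  # the process exited while we were looking
    return sorted(found, key=lambda item: item[1], reverse=True)[:limit]


for name, rss_kb in top_memory_users():
    print("%8d kB  %s" % (rss_kb, name))
```

Anything near the top of that list that you don't actually need is a candidate for removal or for a lighter replacement.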
Re:4K here, 2K there, it's in the 3rd decimal plac (Score:2)
Any change in available memory can have a drastic effect. The sum total of the changes should add up to a minimum of 10MB on an untuned system (One example: Bonobo on Gnome uses ~3.5MB by itself, while a few Gnome terms with a large history buffer chew up an additional 10MB -- not all of it shared. Just switching from a heavyweight WM to a lightweight one and smaller helper apps would recover the bulk of this space. Other changes would only add to the savings).
That minimum of 10MB might be just enough to cut disk swapping down -- by how much it really depends on the application. If it's a single block of data, and no calculations are being done, no speed improvement will be noticed. If it's an in-memory array, the savings could be substantial.
Without giving it a try, or knowing the application's demands, nobody can say for certain.
Re:4K here, 2K there, it's in the 3rd decimal plac (Score:2)
A new post points out that the systems had 256MB, so recovery of 10MB should make a substantial difference.
Re:Not mentioned yet -- go lean... (Score:1)
Of course, it's a work-around. This doesn't actually *solve* your problem, but you'll have to talk to Linus and the rest of the kernel hackers to get a real solution.
i'm a don't use swap nut (Score:1, Offtopic)
Just get more ram (Score:2)
A Better Way (Score:1, Interesting)
/etc/security/limits.conf
I use this method. I specify default values for nice levels, amount of CPU time, amount of memory, etc.
This is a much better way. I will set up accounts with these restrictions. That way processes are running at a nice level of e.g. 5. X will be running at level 0 by default. This ensures that you can always get back into X even if the app (daemon) goes nutty with e.g. a memory leak.
No messing with command lines etc. The defaults have already been set.
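limits.conf applies its settings at login through PAM. The same kind of cap can also be put on a single job by the script that launches it, which is handy for trying out values before making them account-wide defaults. The following is a Python sketch of that per-process variant, not of limits.conf itself; the helper names and the 2 GiB / one hour / nice 5 figures are arbitrary assumptions.

```python
import os
import resource
import subprocess
import sys


def cap(which, value):
    """Set the soft limit for `which` to `value`, clamped to the hard limit."""
    soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, hard))


def run_limited(cmd, mem_bytes, cpu_seconds, nice_increment=5):
    """Run cmd niced and with memory/CPU caps (all values are illustrative)."""
    def in_child():
        # Runs in the child between fork and exec, so the parent is unaffected.
        os.nice(nice_increment)
        cap(resource.RLIMIT_AS, mem_bytes)     # address space, roughly "memory"
        cap(resource.RLIMIT_CPU, cpu_seconds)  # CPU time in seconds
    return subprocess.run(cmd, preexec_fn=in_child, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


# Demo: the child prints the soft address-space limit it was started with.
probe = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
result = run_limited([sys.executable, "-c", probe],
                     mem_bytes=2 * 1024 ** 3, cpu_seconds=3600)
print("child address-space limit:", result.stdout.strip())
```

A job that hits the address-space cap gets allocation failures instead of dragging the whole machine into swap, which may or may not be what you want for a long calculation.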
unmask interrupts (Score:3, Interesting)
You could try hdparm -u 1 which unmasks interrupts when the disk interrupt service routine is active. This often allows your mouse to continue moving even if the disk is busy dealing with swap. It's not perfect but it helps a lot. As others have suggested, also try the preemptible kernel patch but keep backups!
Some details for the curious (Score:2, Interesting)
happened.
2) We're already nice-ing things up the yin yang and using the 2.4.18 kernel with pre-empt patch with no noticeable results.
3) The machines must stay useable as they are also analysis and server machines in addition to computational boxes.
4) Machines are dual P3 1400s. Unfortunately, disks are EIDE and RAM is 256MB in the process of being upped to a gig. However, this doesn't change the fact that we'll be running some calculations that will use all of that.
5) We're not so anxious to buy 4GB of RAM for each machine until we're sure what kind of Beowulf cluster we're constructing and hence how much of our money goes to it.
FreeBSD (Score:2, Interesting)
Have you tried FreeBSD? Apart from being a better OS all round, the 4.x series has a brand new revamped VM subsystem that handles high memory loads very efficiently. I never have a problem with swapping on any of my machines (which range from 32MB and 64MB up to 512MB of RAM).
This isn't a troll. Sometimes a certain OS isn't the best solution for a job, and a different OS should be used. I use Linux for GUI/X type things, FreeBSD for heavily loaded servers (since it handles much better), and even Windows 2000/XP for other things. If those programs you use are linux binaries, FreeBSD can easily run them. If you have source, all the better. Recompile with all the specific optimizations for your hardware. (-O3, -mcpu=pentiumpro, -march=pentiumpro, etc)
D.
Hardware performance? (Score:1)
I know! (Score:1)
Performance problem solved
Run Windows! (Score:1)
Maximum swappage (Score:2)
the job you are doing.
Of course, you can try reading:
/usr/src/linux[name]/Documentation/sysctl/
for some tunables.
Since you are using ide disks, 'man hdparm' is your friend.
Check your kernel config for dma support of your mobo chipset.
Daniel Robbins (from gentoo linux) has written an interesting
article "Maximum swappage" http://www-106.ibm.com/developerworks/library/swa
Linux allows you to parallelize swap, just like a RAID 0 stripe
/etc/fstab:
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdb2 none swap sw,pri=3 0 0
/dev/hdc2 none swap sw,pri=3 0 0
E.g.: spread your swap across two disks with equal priority.
That way you should, in theory, double read/write speed for the
swap. There could also be some gain from moving the swap partitions
off the disks that the OS and apps write to.
But read the article.
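To see how the kernel would treat the fstab entries above, here is a small Python sketch that groups swap devices by their pri= value. Areas with a higher priority are used first, and areas sharing a priority are used round-robin, which is the striping effect. The function name is hypothetical, and treating a missing pri= as -1 is a simplification for this sketch.

```python
from collections import defaultdict


def swap_priority_groups(fstab_text):
    """Group swap devices by their pri= option, highest priority first."""
    groups = defaultdict(list)
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) < 4 or fields[2] != "swap":
            continue
        pri = -1  # stand-in for "no pri= given" (simplification)
        for opt in fields[3].split(","):
            if opt.startswith("pri="):
                pri = int(opt[4:])
        groups[pri].append(fields[0])
    return sorted(groups.items(), reverse=True)


FSTAB = """\
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdb2 none swap sw,pri=3 0 0
/dev/hdc2 none swap sw,pri=3 0 0
"""

for pri, devices in swap_priority_groups(FSTAB):
    mode = "striped across" if len(devices) > 1 else "used alone:"
    print("pri=%d %s %s" % (pri, mode, ", ".join(devices)))
```

With the entries as posted, hdb2 and hdc2 share the load and hda2 only comes into play once those two are full.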
a little off topic...IRIX and memory tuning (Score:1)
thrashing (Score:1)