What Are Typical Load Averages for Servers?
Jon Hill asks: "I'm curious to figure out how to gauge the performance of my servers and know at what level of usage I should think about hardware upgrades. 90% of our servers run Linux and various services standard to Linux such as sendmail, samba, DNS, etc. One of our main servers (router/firewall/sendmail/spop) has been running with a load average of 0.5 to 1.5 regularly. It supports 200 users and is an SMP Intel machine with 2GB of RAM. I'm not sure if it needs software/kernel tweaking or hardware modifications and I can't seem to find any reference information. Suggestions?"
It really depends. . . (Score:3, Informative)
Average load is unique to each system. To figure out what that average is, you need to monitor the server for a while.
I don't know about Linux, but I've done a lot of NT Server tuning and I think some of the general principles can be shared across platforms.
* Monitor CPU, Memory, Disk, and Network load over time (these are the four primary sources of bottlenecks in computer systems). Figure out what is regular for *your* systems. I take samples at a few specific times of day, every few days (see the sampling sketch after this list).
* If one metric is consistently high (at or near 100% utilization), that's a good sign of a possible bottleneck. Take care of that bottleneck by increasing processor speed, adding more memory, adjusting the settings/algorithms of your software, etc.
* Make one change at a time, and then measure the results.
* Document your changes, so that if you actually slow the machine down you can go back to the original configuration.
* When you remove a bottleneck, it is replaced by another. That's the name of the game.
* The best way to tell if you have a bottleneck is user input. Are they complaining that database lookups take too long? That web pages aren't delivered fast enough? Or are they quietly content? (Right, you wish!)
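As a concrete starting point, here is a minimal sampling sketch of the routine above as a shell script. It assumes a Linux box with the procps and sysstat tools installed; the log path and cron times are illustrative choices, not anything from the original post.

#!/bin/sh
# Append one snapshot of the four classic bottleneck areas to a log.
LOG=/var/log/perf-samples.log
{
  date
  uptime                       # 1-, 5-, and 15-minute load averages
  vmstat 1 2 | tail -1         # CPU and memory (second sample reflects current activity)
  iostat -d 1 2 | tail -n +4   # disk throughput
  cat /proc/net/dev            # raw per-interface byte/packet counters
  echo ---
} >> "$LOG"

# Run it a few specific times a day from cron, e.g.:
#   0 9,13,17 * * *  /usr/local/bin/perf-sample.sh

Comparing a week or two of these snapshots shows what "regular" looks like before you chase any single high number.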
Good Luck!
Sounds about right (Score:5, Informative)
The Solaris server where I work has 16 processors and the load average usually sits around 10-15. I'd be worried if my single-proc linux workstation had that high of an average, though... :-)
It all depends on what you find acceptable. (Score:2, Informative)
Load average is a measure of the number of things 'waiting' to run. Depending on your OS, this may or may not include a number of interesting corner cases. In particular, this almost always includes things like disk I/O and tty I/O. A user with a CPU-bound process won't notice disk I/O issues, and vice versa.
So what is the range of acceptable? Well, for a single user workstation a load average of 1 (one thing waiting) probably means the user is waiting, and you may want more CPU or disk bandwidth. On the other hand, a highly multi-user machine (say a news server) may get optimal transfer rates out of the disk hardware by having a lot of things waiting so it can schedule reads and writes.
Look at all the resources on your machine, using tools like vmstat, iostat, netstat, etc. See why processes are waiting. Look at your user load and see if it's OK. For instance, with 100Mbps ethernet, you could serve 10 users at 10Mbps each, or 100 at 1Mbps each. The latter will have a higher load average, but if 1Mbps per user is fine with you, then there is no problem.
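For instance, a quick sketch of those tools in action (the flags shown are the common Linux/Solaris forms; check your OS's man pages):

vmstat 5 5       # 'b' column: processes blocked in uninterruptible (I/O) sleep
iostat -x 5 3    # per-disk utilization, queue lengths, and service times
netstat -i       # per-interface packet and error counts

If vmstat shows idle CPU but iostat shows a saturated disk, the load average is telling you about I/O wait, not a slow processor.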
To give some real-world examples: I've seen news and mail servers both run load averages well over 200 and still deliver acceptable performance. I've also seen shell servers with load averages as small as 5 that are very sluggish (often because they are swapping).
Re:It all depends on what you find acceptable. (Score:2)
Is this true? It seems to me that processes waiting on I/O would be blocked (in fact, this is almost the definition of "blocked"), and therefore not in the run queue and not affecting the load average.
Re:It all depends on what you find acceptable. (Score:1)
Actually, on a single user workstation a load average of 1 means either that the user is running seti@home or there's a nautilus thread stuck in a busy loop again.
Sounds too low (Score:3, Interesting)
A processor utilized 100% of the time will give you a load average of 1.0. If you've got two processors, the equivalent full-utilization figure is a load average of 2.0.
So, good news! You don't have to do any tweaking for performance, unless you have specific issues with the speed of the server. You can probably add more to the server without affecting other processes (unless you've got a lot of I/O going on). You only gave CPU stats, so I am assuming that's what you're concerned about.
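A quick way to sanity-check the per-processor rule of thumb on Linux (a sketch assuming /proc is available):

cat /proc/loadavg                  # first three fields: 1-, 5-, and 15-minute load averages
grep -c ^processor /proc/cpuinfo   # number of CPUs; load near this value means fully busy

On an SMP box like the questioner's, divide the load average by the CPU count to get a rough utilization figure.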
More useful measure (Score:2)
A more interesting measure is how well it copes under a heavy load, rather than an average one. For example, what are your peaks like? Do the users notice?
What's the load like when everyone arrives in the office in the morning and checks their mail? How much of an increase in load would it take to make it unuseable for everyone?
I think that kind of measure is more relevant. If your number of users increased by 10%, would everything fall over? (This is likely to happen eventually if the average load goes above 1.0 per CPU, because the machine can never catch up with its workload.)
- MugginsM
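One way to examine the peaks MugginsM describes, rather than the long-term average, assuming sysstat's sar is already collecting data on the box (the time window is just an example morning-rush period):

sar -q -s 08:30:00 -e 09:45:00   # run-queue length and load averages during the rush
sar -u -s 08:30:00 -e 09:45:00   # CPU utilization over the same window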
Charles Dickens (Score:1)
"Two CPUs, load average 1.95, result: happiness. Two CPUs, load average 2.05, result: misery."
Peter
A Perl 5-liner (Score:2)
#!/usr/bin/perl
# The closest thing I found to real CPU usage:
# sum the %CPU column of ps across every process.
use strict;
use warnings;

my $pcpu = 0;
for (`ps axo %cpu`) {
    next if /%CPU/;    # skip the header line
    $pcpu += $_;
}
print "$pcpu\n";
Re:A Perl 5-liner (Score:2)
ps axo %C | grep -v %CPU | paste --delimiters="+" --serial - | bc
Or there's always:
top -n 1 -b | grep ^CPU
(Just straight "top" gives you much more)
Heh.. you call that a load average.. (Score:1)
The people who wrote the initial code did not make it scale very well, so when it went live with 80,000 people trying to use it, the poor boxes croaked.
The webserver hit a load average of 117 and the DB server got to 145. There are no decimal places in those numbers, people.
Re:Heh.. you call that a load average.. (Score:3, Informative)
The load average easily soared past 100. It was becoming a nightmare. Without the new hardware ready, there wasn't much I could do.
But I found that if I adjusted the time slices to 1/10th their normal length, the system was much more responsive, and the load average sank down into the tens.
My understanding of why this worked is that Solaris' process dispatcher worked a little differently, in that it also reserved 'unused' time for the process that had just come off the CPU, in case it wanted right back on. The idea is to preserve the L2 cache.
In this case, a process handling keystrokes back and forth, with only a small CPU requirement, ended up hogging a larger slice, and processing power was thrown away.
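For reference, here is roughly how a time-slice change like that is made on Solaris with dispadmin(1M); this is a sketch assuming the stock TS (time-sharing) scheduling class, with the edit mirroring the 1/10th adjustment described above:

dispadmin -c TS -g -r 1000 > ts.cfg   # dump the current TS dispatcher table, quanta in milliseconds
# ...edit ts.cfg, dividing each ts_quantum value by 10...
dispadmin -c TS -s ts.cfg             # load the modified table back into the running kernel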
It was nice to see a change like that do wonders on the box.
Re:Heh.. you call that a load average.. (Score:1)
It wasn't fun - connections dropping all over the place, ssh slow as a dog....
It didn't crash, though - we just killed off all the heavy processes, gave it a little while, and all was back to normal (modified RH7 on SMP - 2x 1 GHz PIII and 1 GB RAM).
sarcheck system tuning 'expert system' (Score:1)
http://www.sarcheck.com/ [sarcheck.com]
"SarCheck is an inexpensive tool developed to help system administrators with UNIX performance tuning. It does this by analyzing the output of sar, ps, and other tools, and then reading more information from the kernel. It then identifies problem areas, and if necessary, recommends changes to the system's tunable parameters.
Upgrade your hardware iff: (Score:2, Insightful)
AND
2) You actually have nothing better to spend it on; unless you are very lucky, this one is not true.
AND
3) Software tweaking isn't doing any good.
OTOH, tweaking the kernel and such is always fun. Here are a few ideas:
1) Recompile the latest 'stable' kernel, optimized for your machine. Going from 2.4.2 to 2.4.12 produced a huge increase in I/O performance on my machine, for example. You may find something similar.
2) Related thing: BIOS updates and tweaking can sometimes go a long way.
3) Upgrade the machine to the latest distro; a nice thing about Unix is things usually get faster, not slower.
4) Figure out what is using your CPU time. For example, given you're running SPOP, I suspect a lot of that time is used for SSL. So recompile OpenSSL with better optimizations (the normal OpenSSL RPMs are always underpowered; asm is disabled, no -march flags, etc.), and you should see a magical increase in performance (see the OpenSSL sketch after this list).
5) Assuming this makes your system faster, celebrate by spending some of the money you would have used to upgrade on beer.
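As a sketch of point 4: OpenSSL's ./config passes unrecognized -flags through to the compiler, so a rebuild with the assembly code left enabled (it is on by default; just don't pass no-asm) and CPU-specific flags might look like this. The version and flags are illustrative, not a tested recipe:

cd openssl-0.9.6
./config --prefix=/usr/local/ssl -O3 -march=pentiumpro
make
make test
make install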
General recommendation... (Score:3, Insightful)
I've found it a reasonably good guide to when there's an issue on Solaris boxes; I think Linux uses similar numbers to calculate run-queue averages, but other OSes (e.g., IRIX) use different formulas to calculate it, so you might need to tweak this recommendation.
Depends on what you're after (Score:2)
But a high load average is only a potential symptom of a problem, not a problem in itself. It might mean (as it has done with our machines in the past) that so many processes are running that memory happens to be low as well, and reliability goes down as processes don't cope with being killed or running out of memory. But if that's not happening, the only reason to worry is if the people using the machine complain: slow mail deliveries, POP3 pickups, DNS resolutions or whatever other 'work' the server is up to. If you want to roll your own benchmarks to test these things, you can then decide on how slow is too slow, and upgrade accordingly.
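A couple of rough roll-your-own benchmark one-liners along those lines (the hostnames are placeholders, and nc and dig are assumed to be installed):

time sh -c 'echo QUIT | nc mailhost 110'            # POP3 connect-and-banner round trip
time dig @your-dns-server example.com > /dev/null   # DNS resolution latency

Run them from a user's machine at peak times, keep the numbers, and you have a baseline for deciding how slow is too slow.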