Linux Software

What Are Typical Load Averages for Servers?

Jon Hill asks: "I'm curious to figure out how to gauge the performance of my servers and know at what level of usage I should think about hardware upgrades. 90% of our servers run Linux and various services standard to Linux such as sendmail, samba, DNS, etc. One of our main servers (router/firewall/sendmail/spop) has been running with a load average of 0.5 to 1.5 regularly. It supports 200 users and is an SMP Intel machine with 2GB of RAM. I'm not sure if it needs software/kernel tweaking or hardware modifications and I can't seem to find any reference information. Suggestions?"
  • by foo fighter ( 151863 ) on Wednesday November 14, 2001 @09:01PM (#2566888) Homepage
    What you are asking about is 'performance tuning'. Do a search at Google on that term and you will find plenty of information online. http://linuxperf.nl.linux.org/ might be a good place to start.

    Average load is unique to a system. To figure out what that average is you need to monitor the server for a while.

    I don't know about Linux, but I've done a lot of NT Server tuning and I think some of the general principles can be shared across platforms.

    * Monitor CPU, Memory, Disk, and Network load over time (these are the four primary sources of bottlenecks in computer systems). Figure out what is regular for *your* systems. I take samples at a few specific times a day, every few days (a rough sampling sketch follows at the end of this comment).

    * If one metric is consistently high, at or near 100% utilization, that's a good sign of a possible bottleneck. Take care of that bottleneck by increasing processor speed, adding more memory, adjusting the settings/algorithms of your software, etc.

    * Make one change at a time, and then measure the results.

    * Document your changes, so that if a change actually slows the machine down you can go back to the original configuration.

    * When you remove a bottleneck, it is replaced by another. That's the name of the game.

    * The best way to tell if you have a bottleneck is user input. Are they complaining that database lookups take too long? That web pages aren't delivered fast enough? Or are they quietly content (right, you wish! :-)?

    Good Luck!
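
    For what it's worth, a minimal Perl sketch of that kind of periodic sampling might look something like the following on Linux. The log path and five-minute interval are only placeholders, and it covers just load and memory; vmstat, iostat and netstat cover the disk and network side.

    #!/usr/bin/perl
    # Append the load averages and free memory to a log every five minutes.
    # Assumes a Linux /proc filesystem; the log path is a placeholder.
    use strict;

    my $logfile = "/var/tmp/perf-samples.log";

    while (1) {
        open(LOAD, "</proc/loadavg") or die "loadavg: $!";
        my $line = <LOAD>;
        close LOAD;
        my ($one, $five, $fifteen) = split ' ', $line;

        open(MEM, "</proc/meminfo") or die "meminfo: $!";
        my ($memfree) = grep { /^MemFree:/ } <MEM>;
        close MEM;
        chomp $memfree;

        open(LOG, ">>$logfile") or die "$logfile: $!";
        print LOG scalar localtime, " load $one $five $fifteen  $memfree\n";
        close LOG;

        sleep 300;    # one sample every five minutes
    }
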
  • Sounds about right (Score:5, Informative)

    by KnightStalker ( 1929 ) <map_sort_map@yahoo.com> on Wednesday November 14, 2001 @09:02PM (#2566893) Homepage
    The load average represents the number of processes waiting in the run queue, averaged over a window of time. The three numbers reported by "uptime" are the averages over the last one, five, and fifteen minutes. It sounds like yours is fairly low, because you have more than one processor (a load average of 2 or more would be 100% usage for a dual-proc machine, at least on Solaris).

    The Solaris server where I work has 16 processors and the load average usually sits around 10-15. I'd be worried if my single-proc linux workstation had that high of an average, though... :-)

    • Side note: according to the Sun Performance class (or was it the Cockcroft book?), the load average also includes processes currently running on a processor. Hence, you only have processes waiting for a processor if your load is greater than your number of processors. Anyway, the recommendation I remember is that you shouldn't even think about upgrading until the load is sustained at about 2 times the number of processors. A quick way to eyeball the load per processor is sketched below.
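
      This is only a sketch, assuming a Linux-style /proc (loadavg and cpuinfo); it just divides the three load averages by the processor count.

      #!/usr/bin/perl
      # Print the 1/5/15-minute load averages divided by the CPU count.
      use strict;

      open(LOAD, "</proc/loadavg") or die "loadavg: $!";
      my $line = <LOAD>;
      close LOAD;
      my ($one, $five, $fifteen) = split ' ', $line;

      open(CPU, "</proc/cpuinfo") or die "cpuinfo: $!";
      my $ncpu = grep { /^processor\s*:/ } <CPU>;
      close CPU;

      printf "%d CPU(s), load per CPU: %.2f (1m) %.2f (5m) %.2f (15m)\n",
          $ncpu, $one / $ncpu, $five / $ncpu, $fifteen / $ncpu;
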
  • Load average is a measure of the number of things 'waiting' to run. Depending on your OS this may or may not include a number of interesting corner cases. In particular, this almost always includes things like disk I/O and tty I/O. A user with a CPU-bound process won't notice disk I/O issues, and vice versa.

    So what is the range of acceptable? Well, for a single user workstation a load average of 1 (one thing waiting) probably means the user is waiting, and you may want more CPU or disk bandwidth. On the other hand, a highly multi-user machine (say a news server) may get optimal transfer rates out of the disk hardware by having a lot of things waiting so it can schedule reads and writes.

    Look at all the resources on your machine, use tools like vmstat, iostat, netstat, etc. See why processes are waiting. Look at your user load and see if it's OK. For instance, with 100Mbps ethernet, you could serve 10 users at 10Mbps each, or 100 at 1Mbps each. The latter will have a higher load average, but if 1Mbps per user is fine with you, then there is no problem.

    To give some real-world examples: I've seen news and mail servers both run load averages well over 200 and still deliver acceptable performance. I've also seen shell servers with load averages as small as 5 that are very sluggish (often because they are swapping).
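
    One rough way to see *why* things are waiting, assuming a procps-style ps: count the processes that are runnable (state R) against those stuck in uninterruptible sleep (state D), which usually means disk I/O. A sketch:

    #!/usr/bin/perl
    # Tally runnable vs. uninterruptible-sleep processes by command name.
    use strict;

    my (%running, %blocked);
    for (`ps axo stat,comm`) {
        chomp;
        my ($stat, $comm) = split ' ', $_, 2;
        next if !defined $comm or $stat eq 'STAT';   # skip the header line
        $running{$comm}++ if $stat =~ /^R/;          # running or runnable
        $blocked{$comm}++ if $stat =~ /^D/;          # waiting on I/O
    }

    print "Runnable:\n";
    print "  $running{$_} x $_\n" for sort keys %running;
    print "Waiting on I/O:\n";
    print "  $blocked{$_} x $_\n" for sort keys %blocked;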

  • Remember that load average is not as clear an indicator of overall performance as it seems. Load average is based upon the average number of processes that are waiting for kernel execution time. So, if you are IO-bound, a very common problem even in the server market, then load average won't accurately monitor performance. If you're old school, use sar (System V) or vmstat (BSD) for performance monitoring. Of course, more modern tools exist.
    • This is on the money. The most important thing is whether your server is running its software well and your users are seeing good response times. I've had servers that did fine with load averages over 200, while I've seen other servers that tanked while the load average was still below the number of CPUs. When you are not seeing good performance, then statistics like load average, page rescans, network collisions, etc. are the things you need to look at to determine where the bottleneck is. If it ain't broke, don't fix it! Most "real" servers are I/O bound rather than CPU or memory bound. The I/O can be disk or network, but I/O is typically the factor limiting throughput, not CPU. Application servers running bloatware are sometimes CPU limited, though.
  • Sounds too low (Score:3, Interesting)

    by PD ( 9577 ) <slashdotlinux@pdrap.org> on Wednesday November 14, 2001 @11:20PM (#2567181) Homepage Journal
    If you've got an SMP machine and your averages are 0.5 to 1.5, then you've either got too big a machine for the job, or you should put more stuff on it to utilize it better.

    A processor utilized 100% of the time will give you a load average of 1.0. If you've got two processors, you should aim for a load of 2.0 average.

    So, good news! You don't have to do any tweaking for performance, unless you have specific issues with the speed of the server. You can probably add more to the server without affecting other processes (unless you've got a lot of I/O going on). You only gave CPU stats, so I am assuming that's what you're concerned about.
  • I think the load average isn't a hugely useful measure of whether your setup is fast "enough". In your case, a load of 1.0 on an SMP machine suggests that it could handle about twice as much work (YMMV) before it started to get slower for the users. Which is a handy thing to know.

    A more interesting measure is how well it copes under a heavy load, rather than an average one. For example, what are your peaks like? Do the users notice?
    What's the load like when everyone arrives in the office in the morning and checks their mail? How much of an increase in load would it take to make it unusable for everyone? (A rough peak-logging sketch follows below.)

    I think that kind of measure is more relevant. If your number of users increased by 10%, would everything fall over? (likely to happen eventually if the average load goes above 1.0 per CPU because it can never catch up with its workload)

    - MugginsM
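
    To put a number on the "what are your peaks like?" question, a small sketch that just watches for new peaks, assuming Linux's /proc/loadavg (the one-minute interval is arbitrary):

    #!/usr/bin/perl
    # Sample the one-minute load average every minute and report new peaks.
    use strict;

    my $peak = 0;
    while (1) {
        open(LOAD, "</proc/loadavg") or die "loadavg: $!";
        my $line = <LOAD>;
        close LOAD;
        my ($one) = split ' ', $line;

        if ($one > $peak) {
            $peak = $one;
            print scalar localtime, ": new peak load $peak\n";
        }
        sleep 60;
    }
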
  • When dealing with processes waking up independently (which isn't *completely* wrong in the case of a web server) the load will tend towards [number of processors] * ([cpu utilization]/(1-[cpu utilization])). Your load of 0.5-1.5 on a dual processor machine equates to a cpu utilization of 20-43%.
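
    Turned around, that rule of thumb gives utilization = load / (cpus + load), so a tiny calculator might look like this (a sketch only; it ignores I/O waits entirely):

    #!/usr/bin/perl
    # Rule-of-thumb calculator: load = cpus * u / (1 - u), so
    # u = load / (cpus + load).
    my ($cpus, $load) = @ARGV;
    die "usage: $0 cpus load\n" unless $cpus and defined $load;
    printf "approx CPU utilization: %.0f%%\n", 100 * $load / ($cpus + $load);

    Saved as, say, loadutil.pl and run as "perl loadutil.pl 2 1.5", it prints about 43%, matching the figure above.
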
  • Charles Dickens said it best:

    "Two CPUs, load average 1.95, result: happiness. Two CPUs, load average 2.05, result: misery."

    Peter
  • This is only somewhat related, but back in 1990 I worked on a Sequent (now IBM NUMA-Q [ibm.com]) that had 10 80386 processors. We regularly ran 200+ users with a load average under 1. We had planned for 10 users per CPU, but it held up well at nearly 30 per CPU.
  • #!/usr/bin/perl
    # The closest thing I found to real CPU usage:
    # sum the %CPU column of ps across all processes.

    my $pcpu = 0;
    for (`ps axo %C`) {
        next if m/%CPU/;    # skip the header line
        $pcpu += $_;
    }

    print "$pcpu\n";
    • Or if you prefer a command-line solution:

      ps axo %C | grep -v %CPU | paste --delimiters="+" --serial - | bc

      Or there's always:

      top -n 1 -b | grep ^CPU

      (Just straight "top" gives you much more)
  • I used to write code for Undernet [undernet.org]. We deployed a new services bot earlier this year. The initial trials of it were none too successful. We had a database server and a physically separate web front end running PHP.

    Because the people who wrote the initial code did not make it scale very well, when it went live with 80,000 people trying to use it, the poor boxes croaked.

    The webserver hit a LA of 117 and the DB server got to 145. There are no decimal places in those numbers, people ;)
    • Along those lines, at work I had a 5-way box (5x250MHz, 5GB RAM) that supported 1,100+ simultaneous users telnetting into it, running an application. A typical user had four processes running. The application was interactive; the user would type a few things, hit against the back-end database [on another box], and go off and do more stuff.

      The load average easily soared past 100 and up. It was becoming a nightmare. Without the new hardware ready, there wasn't much I could do.

      But I found that if I adjusted the time slices to 1/10th their normal length, the system had much better response, and the load average sank down into the tens.

      My understanding of why this worked is that Solaris' process dispatcher also reserved 'unused' time for the process that had just come off the CPU, just in case it wanted right back on. The idea is to preserve the L2 cache.

      In this case, a process handling keystrokes back and forth (a small CPU requirement) ended up hogging a larger slice, and processing power was thrown away.

      It was nice to see a change like that do wonders on the box.
    • The highest I've ever seen on a production box was 245.

      It wasn't fun: connections dropping all over the place, ssh slow as a dog...

      It didn't crash, though; we just killed off all the heavy processes, gave it a little while, and everything was back to normal (modified RH7 on SMP: 2x1GHz PIII and 1GB RAM).
  • We used the sarcheck product this year during website app performance testing. Although no substitute for rolling up one's sleeves and getting a PhD in man pages and tuning textbooks, it did offer a quick kill solution. Note that I just used the product, I don't sell it.

    http://www.sarcheck.com/ [sarcheck.com]

    "SarCheck is an inexpensive tool developed to help system administrators with UNIX performance tuning. It does this by analyzing the output of sar, ps, and other tools, and then reading more information from the kernel. It then identifies problem areas, and if necessary, recommends changes to the system's tunable parameters.

  • Spend money on a hardware upgrade only when all of the following are true:
    1) Users are complaining because it's too slow
    AND
    2) You actually have nothing better to spend the money on; unless you are very lucky, this one is not true.
    AND
    3) Software tweaking isn't doing any good.

    OTOH, tweaking the kernel and such is always fun. Here are a few ideas:

    1) Recompile the latest 'stable' kernel, optimized for your machine. Going from 2.4.2 to 2.4.12 produced a huge increase in I/O performance on my machine, for example. You may find something similar.

    2) Related thing: BIOS updates and tweaking can sometimes go a long way.

    3) Upgrade the machine to the latest distro; a nice thing about Unix is things usually get faster, not slower.

    4) Figure out what is using your CPU time (a rough ps-based accounting sketch follows this list). For example, given you're running SPOP, I suspect a lot of that time is used for SSL. So recompile OpenSSL with better optimizations (the normal OpenSSL RPMs are always underpowered; asm is disabled, no -march flags, etc.), and you should see a magical increase in performance.

    5) Assuming this makes your system faster, celebrate by spending some of the money you would have used to upgrade on beer.
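
    For point 4, a rough way to see where the CPU time goes is to sum ps's %CPU column per command name, biggest consumers first. A sketch, assuming a procps-style ps that understands "axo pcpu,comm":

    #!/usr/bin/perl
    # Sum %CPU per command name and print the totals, highest first.
    use strict;

    my %by_cmd;
    for (`ps axo pcpu,comm`) {
        chomp;
        my ($pcpu, $comm) = split ' ', $_, 2;
        next unless defined $comm and $pcpu =~ /^[\d.]+$/;   # skip the header
        $by_cmd{$comm} += $pcpu;
    }

    for (sort { $by_cmd{$b} <=> $by_cmd{$a} } keys %by_cmd) {
        printf "%6.1f%%  %s\n", $by_cmd{$_}, $_;
    }
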
  • by larien ( 5608 ) on Thursday November 15, 2001 @05:37PM (#2571080) Homepage Journal
    As a general recommendation I heard once, your load average shouldn't get higher than 2x the number of CPUs; i.e., on a single-CPU box it shouldn't get higher than 2, and for a 64-CPU high-powered server it shouldn't get above 128.

    I've found it a reasonably good guide to when there's an issue on Solaris boxes; I think Linux uses similar numbers to calculate run queue averages, but other OSes (e.g., IRIX) use different formulas to calculate it, so you might need to tweak this recommendation.
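
    That rule of thumb drops easily into a cron job. A sketch, assuming Linux's /proc and taking the 2x multiplier as given:

    #!/usr/bin/perl
    # Warn when the 5-minute load average exceeds twice the CPU count.
    use strict;

    open(LOAD, "</proc/loadavg") or die "loadavg: $!";
    my $line = <LOAD>;
    close LOAD;
    my ($one, $five) = split ' ', $line;

    open(CPU, "</proc/cpuinfo") or die "cpuinfo: $!";
    my $ncpu = grep { /^processor\s*:/ } <CPU>;
    close CPU;

    print "Warning: 5-minute load $five exceeds 2 x $ncpu CPUs\n"
        if $five > 2 * $ncpu;

    Run it from cron every few minutes and cron will mail you the output whenever it prints anything.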

  • I'm part-responsible for a bunch of fairly basic Redhat servers (single CPU, 256MB etc.) that spend their time crawling the web, keeping their 100Mbit network connections saturated at around a load of 17 quite reliably, and they have even worked at around 50, though accepting ssh connections becomes impossible after a certain level :-) But they still get all their work done eventually. I'm not sure how each server's efficiency related to its load average, or whether they were getting any more work done, but if all they're doing is using a lot of CPU, you can reliably push the load average very high without ill effect.

    But a high load average is only a potential symptom of a problem, not a problem in itself. It might mean (as it has done with our machines in the past) that so many processes are running that memory happens to be low as well, and reliability goes down as processes don't cope with being killed or running out of memory. But if that's not happening, the only reason to worry is if the people using the machine complain: slow mail deliveries, POP3 pickups, DNS resolutions or whatever other 'work' the server is up to. If you want to roll your own benchmarks to test these things, you can then decide on how slow is too slow, and upgrade accordingly.
