Server Monitoring Solutions? 58

Posted by Cliff
from the keeping-an-eye-on-things dept.
bwhaley asks: "The University I work for has asked me to research software solutions for server monitoring. More specifically, a piece of software that will monitor server variables such as load, swap usage, POP/IMAP processes, total processes, and all the other interesting data about a server's health. Watching these variables can give administrators advance warning about potential problems with the server. We are currently using an in-house solution written in Perl but its age is showing. I have found plenty of proprietary solutions such as HP OpenView and Sun Management Center, but these cost thousands of dollars. What solutions do Slashdot readers use? Are there any powerful open source solutions that I'm missing? Is anyone else running homegrown software that they are happy with? We are running an entirely Solaris environment but I am interested in any UNIX solution."
This discussion has been archived. No new comments can be posted.

  • CompSci students? (Score:3, Insightful)

    by agent dero (680753) on Wednesday October 15, 2003 @11:28PM (#7226343) Homepage
    I would suggest talking to whoever teaches computer science and software. Get the kids doing this for an education to rewrite your perl scripts that do the same job.

    That's something you can pass off as helping everybody, saving y'all money and teaching compSci kids how to work with the computers and OSes
    • Oh man I was reading this thinking 'yes! Yes! YES! I have an answer!' right up until I read the last few lines about actual platforms.

      I was headed towards setting up perfmon as a service and having one machine lookup the values from all the other machines and display them in either graph or save it as data - but this is obviously not the answer you were looking for.

      Hey, I tried. I am only just now coming up to speed on Linux so it will be a while before I am useful in that arena. Slashdot motto : if yo
  • Check out bb4.com [bb4.com].
  • Nagios [nagios.org] might be what you're looking for. Cheers.
  • Have you heard of Nagios [nagios.org]?
    • Re:Nagios (Score:4, Insightful)

      by Sentry21 (8183) on Wednesday October 15, 2003 @11:44PM (#7226431) Journal
      I second Nagios. I set it up as a technology test I was doing a while back to monitor our internal network and some remote servers (arbitrary web servers on the internet) for a lark - got it telling uptime, system load, swap, memory usage, processors, network load and the like on our Linux and Win2K machines (including various network interfaces - when the wired interface on the laptop was disconnected, it paged me - useless for our situation, but good for multihomed machines).

      It can monitor all kinds of machines, services, ports, networks, pings, traceroutes, anything. Beautiful setup, and highly recommended.

      --Dan
      • I'll third nagios (www.nagios.org) as I've used it and its previous incarnation (netsaint) in production environments. It has a very extensible setup. It has a very active development community as well. You could probably set up a limited test of its functionality on a spare box in a weekend.
    • Yup, Nagios is pretty much what you're after. I had it up and monitoring all my servers in about a day.

      The only advice I can give is take the time and read the docs. They are very good and understanding what's going on will save you loads of time down the road when you want to add stuff.

      later,
      ajay
  • I haven't used it but it seems like Nagios [nagios.org] is what you want. It's GPL and is supposedly very powerful.
    • I second nagios. We use it at work to monitor around 700 hosts and all of their services. Just don't have one machine monitor more than a few hundred hosts; it tends to get a bit behind at times.
  • Big Brother [bb4.com]

    There's a vibrant community with lots of scripts [deadcat.net] to extend functionality.

    It's free as in beer (but not freedom) for almost all uses, and is open source. You only have to pay if you use it to generate money.
    • by TBone (5692)
      BS is the rewrite of BB4, which uses actual shell scripts, to make the modules use Perl and be much more "correctly" modular.
  • by keesh (202812)
    Big monitor, gkrellm over remote X and someone to sit there and watch :)
    • i am just looking at my 10 remote gkrellm now :-)
      its a big bunch of information :-)

      it is fun to find some degree of patterns ;-)
    • Better yet, run a local copy of gkrellm and connect to the remote gkrellmd. gkrellm is nice for quick glances but doesn't keep any history of what it monitors, which I imagine is part of what the poster is looking for.

      It's nice to be able to analyze the historical data to make predictions and such.
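As an illustration of the kind of prediction historical data makes possible, here is a minimal sketch that fits a straight line to past samples and estimates when a resource fills up. The function name `hours_until_full` and the (hour, percent used) sample format are assumptions made for this example; none of the tools mentioned in the thread necessarily work this way.

```python
def hours_until_full(samples, capacity=100.0):
    """Estimate hours from the last sample until usage reaches capacity.

    samples is a list of (hour, percent_used) pairs. Returns None when
    there are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Ordinary least-squares slope of usage over time.
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    t_full = (capacity - intercept) / slope
    last_t = max(t for t, _ in samples)
    return max(0.0, t_full - last_t)
```

A straight-line fit is only a rough guide; real usage is rarely linear, so treat the result as an early warning rather than a forecast.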
  • top [gnu.org] is terrific
    • by ader (1402)
      OK, the version of top to which you're referring is actually here [sourceforge.net], and it only works on Linux anyway.

      top for Solaris and other Unices is here [groupsys.com]. It's great for monitoring a single system in real time, but it's not what the poster is seeking.

      Ade_
      /
  • pretty pictures are more fun to look at! Check out cacti for all of your process/bandwidth/load/usage graphing needs. It's available at raxnet.net [raxnet.net]
  • Nagios [nagios.org] is a great server monitoring system and seems to have what you need.

    It's meant for Linux but works under most *NIX variants
  • Big Sister (Score:2, Informative)

    by Quixotic137 (26461)
    If you don't want to pay for Big Brother, take a look at Big Sister [graeff.com]. It does much the same thing, but free (as in beer and speech).
    • I quite like Big Sister as well. At my last job I was using it to monitor around 50 servers, shown split into their four different functional groups.

      Service failures generated emails, and we also configured it to send an SMS to us out of office hours. The servers were mostly Windows NT boxes, so when a BSOD took out a web or FTP server, we were alerted within a few minutes. The default was about 20 minutes, I had to tweak that setting. That was easy because it's all written in perl (with the exception
      • according to this Page [graeff.com], the author of big sister is not willing to maintain the windows port anymore without sponsoring (which IMHO is a good way to go).
    • Might want to verify, but BB probably wouldn't cost for a Uni. My understanding is that even a commercial entity can use it for free if the servers being monitored are non-commerce; i.e. your QA and development servers.

    • Big Sister is pretty powerful and quite extensible too. Be aware that it takes a non-trivial amount of effort to set up, as I found out. It works on all major O/S flavours though, which is a plus. It also interfaces with other packages, such as OpenView, should you ever need it to.

      We are doing a similar evaluation where I work. I think we'll end up with OpenView if the costs work out OK. There are other good commercial solutions on the market, such as Foglight, Storage Profiler, Sun Management Console, Tiv
  • That's easy, use nagios [nagios.org]. It's what I use and it's great. For the holes it doesn't fill, go try out mrtg. :-)
  • First, try nagios, which is open source from www.nagios.org. It takes a small commitment to set up, but works *very* well.

    Second, you might try Sun netconnect since you are running all Solaris. I haven't used it myself, but some people at my nameless company have and think well of it.

  • how about nagios [nagios.org]?
  • by fdragon (138768) <fdragon@fdrago[ ]rg ['n.o' in gap]> on Thursday October 16, 2003 @12:28AM (#7226673) Homepage
    I don't know why everyone forgets the default solution. SNMP comes with almost all Unix systems and Microsoft Windows.

    If your Unix system doesn't come with one, Net-SNMP [net-snmp.org] will install on many of them.

    The SNMP daemon by default understands how to monitor Load Avg, Memory, Processes, and so forth. It may not be able to tell you details of the process, such as what user is logged into the POP3 daemon, but it will tell you that you have 500 of them running, and alert you (via SNMP Traps) of that fact.

    All you need to do, once you have checked the documentation for your SNMP agent and configured it, is set up a single (ok, maybe 2 or 3) machine to send your traps to so you can kick off alerts. With some simple scripting in $FAVORITE_SCRIPTING_LANGUAGE you can email, page, text message, update web page, or $OTHER.

    Cricket [sourceforge.net] or MRTG [ee.ethz.ch] are nice utilities that will poll the servers in question (by default every 5 minutes) and produce graphs. MRTG was designed to handle network equipment and graph the bandwidth utilization, but with a change to the SNMP string, will graph anything. Cricket is the same concept but does things a little differently: it uses a tree configuration system for property inheritance and generates graphs on the fly instead of at poll time, as MRTG does.

    And last but not least, Transmeta produced a very good perl script monitoring package known simply as Mon [kernel.org]. This package will do active polling of the servers including issuing a transaction to the service you are monitoring. Due to the way this software monitors, you can actually see if the remote machine is alive by actually utilizing the service to monitor instead of just the "I can ping it, it must be up" mentality some people have.

    Best part about all the above mentioned software is that they are all applications with an OSI Approved OpenSource license. This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.

    And you may wonder about the impact on system performance of monitoring by SNMP, MRTG/Cricket, and Mon. The short answer is that I couldn't detect a noticeable increase. Other utilities such as Argent (Commercial Pay For Software) would impact a HP-UX V Class 8 CPU with 8GB RAM machine from 0% on all 8 CPUs to about 20% on ALL 8 CPUs while it telneted to the machine, created about 150KB of test scripts, and then ran them.
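The difference between "I can ping it" and actually exercising the service can be sketched in a few lines. This is not how Mon itself is implemented; it is a hypothetical helper (`check_banner` is a name invented here) that connects to a TCP service and checks its greeting, relying only on the fact that POP3 servers greet with `+OK` and SMTP servers with `220`.

```python
import socket


def check_banner(host, port, expected_prefix, timeout=5.0):
    """Connect to host:port and check that the greeting starts with expected_prefix.

    Returns (ok, detail). expected_prefix is bytes, e.g. b"+OK" for POP3.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(512)
    except OSError as exc:
        # Covers refused connections, unreachable hosts and timeouts.
        return False, f"connection failed: {exc}"
    if banner.startswith(expected_prefix):
        return True, banner.decode("ascii", "replace").strip()
    return False, f"unexpected banner: {banner!r}"
```

For example, `check_banner("mail.example.edu", 110, b"+OK")` would test a POP3 daemon on a placeholder host. A fuller check would log in and issue a real transaction, which this sketch does not attempt.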

  • JFFNMS (Score:3, Informative)

    by szysz (214137) on Thursday October 16, 2003 @12:36AM (#7226720) Homepage
    You could use my project !

    JFFNMS - Just for Fun Network Management System.

    The site is JFFNMS.org [jffnms.org]
    Look at the features, it has all you need, and of course the screenshots.

    It will work on any Unix with PHP support. It will also monitor any standards-compliant SNMP device or TCP port, and if you have SNMP enabled it will tell you how many connections you have to the specified port, as well as the connection delay.

    It's open source, and fully supported, I just made the latest release a few days ago.

    You could also look at the two working demos.

    I hope any of you could use it, it really shows a lot of things about a host, that being a Server or a Router.
    • Ohh.. I forgot to tell you that we are number 2 in Google for Network Management System [google.com]

      And that we have really nice graphs to show the server health, and also have a good trigger/action system so you can get emails or sms messages when something happens.

      If you have any question, please ask it on the JFFNMS List.

      Javier
  • I know you're looking for something free, but others here with some dollars to spend might like this. ProactiveNet [proactivenet.com] does standard monitoring of network devices, can grab any variable available via snmp, microsoft perfmon counters, or even using shell scripts to parse data and return values you wish to monitor. It also has very extensive monitoring capabilities for just about any kind of database (it can execute any query you wish or monitor performance tables), and many kinds of middleware.

    It keeps a data
  • Nagios [nagios.com]

    Works great, easy to configure, and can do all of the things you are requiring (CPU load/memory/processes/etc). It has a very robust dependency mechanism, and has many levels of notifications.

    I've been using it for 3 years now with zero problems. It looks like v2.0 will be out in beta form by the end of the month.
  • by Karora (214807)
    Sheesh, is Slashdot a substitute for research?

    Nagios [nagios.org] - I'll say it again.

  • I am at this very moment experimenting with OpenNMS (www.opennms.org) in my testlab. Perhaps that is worth some investigation.
  • For a specifically Solaris solution, look at Orcallator [orcaware.com], but read my experiences [fluff.org] with that and SARGE first.

    I'd second the various Nagios recommendations. The object templating configuration is very powerful once you get your head round it.

    Ade_
    /
  • Lrrd is great for graphing. You can graph anything through a simple script, and a lot of example scripts are already included.
    Lrrd uses a single server that polls one or more clients for information.
    Nagios is better at monitoring the network as a whole, and responding to events. If for example a router goes down, nagios knows that the servers behind it will be unreachable as well, and won't bother you with alerts for them. As nagios can also react to events, it would be possible to change the default route
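The dependency idea described above (a dead router should not trigger alerts for every server behind it) can be sketched as follows. This is an illustration of the concept, not Nagios code: `root_causes` and the single-parent `parents` mapping are assumptions made for this example, and a real topology may give a host several parents.

```python
def root_causes(parents, down):
    """Return the down hosts that are not hidden behind another down host.

    parents maps each host to its upstream device (or None for the top).
    down is an iterable of hosts that currently fail their checks.
    """
    down = set(down)
    alerts = []
    for host in sorted(down):
        blocked = False
        seen = set()
        parent = parents.get(host)
        # Walk upstream; stop at the top or if the mapping loops back.
        while parent is not None and parent not in seen:
            if parent in down:
                blocked = True
                break
            seen.add(parent)
            parent = parents.get(parent)
        if not blocked:
            alerts.append(host)
    return alerts
```

With a router in front of two web servers and all three failing, only the router is reported; if a single web server fails on its own, that server is reported.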
  • Yes nagios is the best. I've had it running totally on Solaris and you can also hack in Windows support. Also with the right plugins you can monitor load, disk space etc...

    Rus
  • I've been using the very inexpensive ServersAlive from Woodstone [woodstone.nu] since 1999, and I've been very pleased with it. It's much friendlier to use than Big Brother or MRTG (and yes, I use both of those as well). The user interface is great, very easy to point-click your way through, and you can also SSH or Telnet into it to do other administrative tasks.

    It can check everything from pings, snmp, databases, web pages, services, processes, port checks, and more. For whatever it doesn't check, you can design ex
  • My project, Loggerithim [loggerithim.org] is right up your alley.
  • We have had great success with Nagios [nagios.org]. We even wrote custom plugins to monitor certain other aspects of our custom system (in PHP, no less).

    S
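For readers wondering what a custom plugin involves: a Nagios plugin is an ordinary program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below is in Python rather than the PHP the poster used, and the thresholds and the names `classify` and `check_load` are assumptions chosen for illustration.

```python
import os

# Exit codes that Nagios plugins use to report status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value to a plugin status using two thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_load(warn=4.0, crit=8.0):
    """Return (status, message) for the 1-minute load average."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg is not available on every platform.
        return UNKNOWN, "LOAD UNKNOWN - load average not available"
    status = classify(load1, warn, crit)
    return status, f"LOAD {LABELS[status]} - load average: {load1:.2f}"
```

To use it as a plugin, a wrapper script would print the message and pass the status to `sys.exit`.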
  • At work here we use a combination of two things to monitor our servers. First is Nagios [nagios.org] (previously NetSaint). Nagios is good because it can do very basic checks from just pinging a server to see if it's up (and network routers, switches, firewalls, printers, etc...) to actually checking to see if a certain service is up. Such as requesting a webpage to make sure that your HTTP server is running, or making an SMTP or FTP request to check that those services respond too. (it also does more, but there's
  • We use NAGIOS [nagios.org] to monitor our ISP network of 125+ machines and nearly 600 independent services. Completely customizable with plug-in modules to monitor anything you like.
    I remember an older one called Big Brother that was a little lighter weight.
  • What about spong?

    description: A systems and network monitoring system -- server programs
    This package includes the spong daemon, which collects and stores
    information from the spong client programs, and the program for sending
    out messages when problems occur.
    .
    Spong is a simple systems and network monitoring package. It does not
    compete with Tivoli, OpenView, UniCenter, or any other commercial
    packages. It is not SNMP based, it communicates via simple TCP based
    messages. It is written in perl and
  • There was a brief mention of OpenNMS earlier; clearly this needs some more input. Nagios is a great tool too, but it is not as geared towards enterprise use.

    OpenNMS is.

    OpenNMS handles all common port services and SNMP/MIB capability (as any NMS should do). It does everything all the tools mentioned above here can do (and even incorporates a few).

    It has a front-end powered by apache tomcat4 and uses postgreSQL (like Nagios) for its database. It has commercial support, is easily deployed on multiple

  • Nagios [nagios.org]. Simple as that. You won't regret it.
  • Two days and no one's mentioned Nagios or OpenNMS? Both massively popular and useful.
