Server Monitoring Solutions?
bwhaley asks: "The University I work for has asked me to research software solutions for server monitoring. More specifically, a piece of software that will monitor server variables such as load, swap usage, POP/IMAP processes, total processes, and all the other interesting data about a server's health. Watching these variables can give administrators advance warning about potential problems with the server. We are currently using an in-house solution written in Perl but its age is showing. I have found plenty of proprietary solutions such as HP OpenView and Sun Management Center, but these cost thousands of dollars. What solutions do Slashdot readers use? Are there any powerful open source solutions that I'm missing? Is anyone else running homegrown software that they are happy with? We are running an entirely Solaris environment but I am interested in any UNIX solution."
CompSci students? (Score:3, Insightful)
That's something you can pass off as helping everybody: it saves y'all money and teaches CompSci kids how to work with the computers and OSes.
Re:CompSci students? (Score:2)
I was headed towards setting up perfmon as a service and having one machine look up the values from all the other machines and either display them as a graph or save them as data, but this is obviously not the answer you were looking for.
Hey, I tried. I am only just now coming up to speed on Linux, so it will be a while before I am useful in that arena. Slashdot motto: if yo
Big Brother (Score:1)
Nagios... (Score:1)
Re:Nagios... (Score:1)
Nagios (Score:2)
Re:Nagios (Score:4, Insightful)
It can monitor all kinds of machines, services, ports, networks, pings, traceroutes, anything. Beautiful setup, and highly recommended.
--Dan
Re:Nagios (Score:1)
Re:Nagios (Score:1)
The only advice I can give is take the time and read the docs. They are very good and understanding what's going on will save you loads of time down the road when you want to add stuff.
later,
ajay
Nagios (Score:1)
Re:Nagios (Score:1)
Big Brother (Score:2)
There's a vibrant community with lots of scripts [deadcat.net] to extend functionality.
It's free as in beer (but not freedom) for almost all uses, and is open source. You only have to pay if you use it to generate money.
BigSister (Score:2)
Easy (Score:2)
Re:Easy (Score:1)
It's a big bunch of information, and it is fun to find patterns in it.
Re:Easy (Score:2)
It's nice to be able to analyze the historical data to make predictions and such.
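One simple way to make such predictions (an assumption here; the poster doesn't say what analysis they run) is to fit a straight line to historical samples and extrapolate when a metric will cross a limit, for example when a filesystem will fill up. The function names and sample data below are hypothetical, and a linear fit is only a rough model of real growth.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_until_limit(xs: Sequence[float], ys: Sequence[float],
                     limit: float) -> Optional[float]:
    """Return the x value at which the fitted trend reaches `limit`,
    or None if the trend is flat or falling (it never gets there)."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical samples: percent of a filesystem used, one reading per day.
days = [0, 1, 2, 3, 4]
used_pct = [50.0, 52.0, 54.0, 56.0, 58.0]
print(time_until_limit(days, used_pct, 100.0))  # → 25.0 (day 25)
```

With noisy real data, fitting over a longer window (weeks rather than days) gives a steadier estimate.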
Re:Easy (Score:2)
top (Score:2)
Re:top (Score:2)
top for Solaris and other Unices is here [groupsys.com]. It's great for monitoring a single system in real time, but it's not what the poster is seeking.
Ade_
/
Alarms are good, but... (Score:1)
What about Nagios? (Score:1)
It's meant for Linux, but it works under most *NIX variants.
Big Sister (Score:2, Informative)
Re:Big Sister (Score:2)
Service failures generated emails, and we also configured it to send an SMS to us out of office hours. The servers were mostly Windows NT boxes, so when a BSOD took out a web or FTP server, we were alerted within a few minutes. The default was about 20 minutes; I had to tweak that setting. That was easy because it's all written in Perl (with the exception
Re:Big Sister (Score:1)
port anymore, without sponsoring (which IMHO is a good way to go).
Re:Big Sister (Score:1)
Re:Big Sister (Score:1)
We are doing a similar evaluation where I work. I think we'll end up with OpenView if the costs work out OK. There are other good commercial solutions on the market, such as Foglight, Storage Profiler, Sun Management Console, Tiv
Nagios (Score:2)
Two suggestions (Score:2)
Second, you might try Sun netconnect since you are running all Solaris. I haven't used it myself, but some people at my nameless company have and think well of it.
Re: (Score:2)
SNMP + MRTG/Cricket/... + Mon (Score:5, Informative)
If your Unix system doesn't come with one, Net-SNMP [net-snmp.org] will install on many of them.
By default, the SNMP daemon knows how to monitor load average, memory, processes, and so forth. It may not be able to tell you details about a process, such as which user is logged into the POP3 daemon, but it will tell you that you have 500 of them running, and it can alert you to that fact (via SNMP traps).
Once you have checked the documentation for your SNMP agent and configured it, all you need to do is set up a single (OK, maybe 2 or 3) machine to send your traps to, so you can kick off alerts. With some simple scripting in $FAVORITE_SCRIPTING_LANGUAGE you can email, page, send a text message, update a web page, or $OTHER.
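As a minimal sketch of that "simple scripting" step, assuming Net-SNMP's snmptrapd is the trap receiver: its `traphandle` directive runs a program and passes the trap on standard input as the sender's hostname, its address, and then one varbind per line (the OID, a space, then the value). The script below (file name, addresses, and function names are hypothetical) parses that input and formats an email alert.

```python
import sys
from email.message import EmailMessage
from typing import Any, Dict, List, Tuple


def parse_trap(text: str) -> Dict[str, Any]:
    """Parse snmptrapd 'traphandle' input: hostname line, address line,
    then one varbind per line (OID, a space, then the value)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected at least a hostname line and an address line")
    varbinds: List[Tuple[str, str]] = []
    for line in lines[2:]:
        if not line.strip():
            continue  # skip blank lines
        oid, _, value = line.strip().partition(" ")
        varbinds.append((oid, value.strip()))
    return {"host": lines[0].strip(), "address": lines[1].strip(),
            "varbinds": varbinds}


def build_alert(trap: Dict[str, Any], to_addr: str,
                from_addr: str) -> EmailMessage:
    """Format a parsed trap as a plain-text email (not sent here)."""
    msg = EmailMessage()
    msg["Subject"] = f"SNMP trap from {trap['host']}"
    msg["From"] = from_addr
    msg["To"] = to_addr
    body = [f"Address: {trap['address']}"]
    body += [f"{oid} = {value}" for oid, value in trap["varbinds"]]
    msg.set_content("\n".join(body) + "\n")
    return msg


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run by snmptrapd with the recipient address as the first argument
    # (a hypothetical convention for this sketch).
    alert = build_alert(parse_trap(sys.stdin.read()), sys.argv[1],
                        "snmptrapd@example.edu")
    # Printing keeps the sketch side-effect free; to deliver it you could
    # use smtplib.SMTP("localhost").send_message(alert) instead.
    print(alert)
```

It could be hooked up with a line such as `traphandle default /usr/local/bin/trap_alert.py oncall@example.edu` in snmptrapd.conf (path and address hypothetical). Recent Net-SNMP versions also expect an access-control line such as `authCommunity log,execute public` before they will run trap handlers; check the snmptrapd.conf man page for your version.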
Cricket [sourceforge.net] or MRTG [ee.ethz.ch] are nice utilities that will poll the servers in question (by default every 5 minutes) and produce graphs. MRTG was designed to graph bandwidth utilization on network equipment, but by changing the SNMP variable (OID) it polls, it will graph anything. Cricket is the same concept but does things a little differently: it uses a tree-based configuration system for property inheritance, and it generates graphs on the fly instead of at poll time, as MRTG does.
And last but not least, Transmeta produced a very good Perl monitoring package known simply as Mon [kernel.org]. This package does active polling of the servers, including issuing a transaction to the service you are monitoring. Because of the way it monitors, you can see whether the remote machine is really alive by actually exercising the service, instead of relying on the "I can ping it, so it must be up" mentality some people have.
The best part about all of the software mentioned above is that it is all released under OSI-approved open source licenses. This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.
And you may wonder about the impact on system performance of monitoring with SNMP, MRTG/Cricket, and Mon. The short answer is that I couldn't detect a noticeable increase. Other utilities, such as Argent (commercial, pay-for software), would take an 8-CPU HP-UX V-Class machine with 8 GB of RAM from 0% on all 8 CPUs to about 20% on ALL 8 CPUs while it telnetted to the machine, created about 150 KB of test scripts, and then ran them.
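Graphing bandwidth this way depends on turning ever-increasing SNMP counters (such as an interface's 32-bit octet counter) into rates: take the difference between two polls and divide by the poll interval, allowing for the counter wrapping back to zero. This is a minimal sketch of that arithmetic, not MRTG's code; the function name is hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap to 0 after 2**32 - 1


def counter_rate(prev: int, curr: int, interval_s: float,
                 modulus: int = COUNTER32_MODULUS) -> float:
    """Per-second rate between two polls of a wrapping SNMP counter.

    The modulo arithmetic assumes the counter wrapped at most once between
    polls. A counter reset (e.g. an agent restart) would be misread as a
    wrap, so a sanity limit on the result is a good idea."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr - prev) % modulus
    return delta / interval_s


# Two polls 300 s apart (MRTG's usual interval); the counter wrapped in between.
print(counter_rate(4_294_967_000, 1_004, 300))
```

The at-most-one-wrap assumption is one reason the poll interval matters: a busy link can wrap a 32-bit counter faster than a long interval can detect.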
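To illustrate the difference between "it pings" and "the service works," here is a minimal sketch of a check that connects to a POP3 port and verifies the server answers with the `+OK` greeting that POP3 (RFC 1939) sends on connect. This is not one of Mon's own monitors; the function name, host, and timeout are placeholders.

```python
import socket
from typing import Tuple


def check_pop3(host: str, port: int = 110,
               timeout: float = 5.0) -> Tuple[bool, str]:
    """Return (ok, detail); ok is True only if the server answers with the
    '+OK' greeting that a working POP3 server sends on connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # A single recv is enough for a short greeting line in this sketch.
            greeting = sock.recv(512).decode("ascii", errors="replace").strip()
            if not greeting.startswith("+OK"):
                return False, f"unexpected greeting: {greeting!r}"
            try:
                sock.sendall(b"QUIT\r\n")  # end the session politely
            except OSError:
                pass  # the check has already succeeded
            return True, greeting
    except OSError as exc:  # refused, timed out, unresolvable host, ...
        return False, f"connection failed: {exc}"
```

Called as `ok, detail = check_pop3("mail.example.edu")` (hypothetical host), a machine that answers pings but has a hung POP3 daemon would come back as `ok == False`.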
JFFNMS (Score:3, Informative)
JFFNMS - Just for Fun Network Management System.
The site is JFFNMS.org [jffnms.org]
Look at the features (it has all you need) and, of course, the screenshots.
It will work on any Unix with PHP support. It will also monitor any standards-compliant SNMP device or TCP port, and if you have SNMP enabled, it will tell you how many connections you have to the specified port, in addition to the connection delay.
It's open source and fully supported; I made the latest release just a few days ago.
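To give a feel for what a "connection delay" figure measures, here is a minimal sketch that times a TCP handshake to a port. It is not JFFNMS's own code (JFFNMS runs on PHP); the function name is hypothetical.

```python
import socket
import time
from typing import Optional


def tcp_connect_delay(host: str, port: int,
                      timeout: float = 5.0) -> Optional[float]:
    """Seconds needed to open a TCP connection to host:port, or None if the
    connection fails. If `host` is a name, DNS lookup time is included."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connected; close immediately, only the timing matters
    except OSError:
        return None
    return time.perf_counter() - start
```

Recording this value on every poll and graphing it is what turns "the port is open" into "the port is open but getting slow."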
You could also look at the two working demos.
I hope some of you can use it; it really shows a lot about a host, whether that is a server or a router.
Re:JFFNMS (Score:1)
And we have really nice graphs showing server health, plus a good trigger/action system, so you can get emails or SMS messages when something happens.
If you have any questions, please ask them on the JFFNMS list.
Javier
Re: (Score:2)
One word.... (Score:1)
Works great, easy to configure, and can do all of the things you are requiring (CPU load/memory/processes/etc). It has a very robust dependency mechanism, and has many levels of notifications.
I've been using it for 3 years now with zero problems. It looks like v2.0 will be out in beta form by the end of the month.
Nagios (Score:2)
Nagios [nagios.org] - I'll say it again.
OpenNMS (Score:1)
Been there, done that... (Score:2)
I'd second the various Nagios recommendations. The object templating configuration is very powerful once you get your head round it.
Ade_
/
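Nagios object templates let a host or service definition name a template with `use` and inherit every property it does not set itself, so common settings live in one place. The following is a simplified model of that lookup, not Nagios's code: it follows a single `use` chain and ignores refinements some Nagios versions add. The template names are hypothetical, and this sketch does not copy the template-only keys `name`, `use`, and `register` into the resolved object.

```python
from typing import Dict, List

Definition = Dict[str, str]
TEMPLATE_ONLY = {"name", "use", "register"}  # not copied from templates here


def resolve(obj: Definition, templates: Dict[str, Definition]) -> Definition:
    """Merge an object with its chain of templates; nearer definitions win."""
    chain: List[Definition] = []  # templates, nearest first
    seen = set()
    parent = obj.get("use")
    while parent is not None:
        if parent in seen:
            raise ValueError(f"template loop at {parent!r}")
        if parent not in templates:
            raise ValueError(f"unknown template {parent!r}")
        seen.add(parent)
        chain.append(templates[parent])
        parent = templates[parent].get("use")
    resolved: Definition = {}
    # Apply the most generic template first so closer ones override it.
    for tmpl in reversed(chain):
        resolved.update({k: v for k, v in tmpl.items() if k not in TEMPLATE_ONLY})
    resolved.update({k: v for k, v in obj.items() if k != "use"})
    return resolved


templates = {
    "generic-host": {"name": "generic-host", "register": "0",
                     "max_check_attempts": "3", "notification_period": "24x7"},
    "solaris-server": {"name": "solaris-server", "use": "generic-host",
                       "register": "0", "notification_period": "workhours"},
}
web1 = {"use": "solaris-server", "host_name": "web1", "address": "192.0.2.10"}
print(resolve(web1, templates))
```

Changing `max_check_attempts` in the generic template would then change it for every host built on it, which is where the power of the scheme comes from.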
lrrd & nagios (Score:1)
Lrrd uses a single server that polls one or more clients for information.
Nagios is better at monitoring the network as a whole, and at responding to events. If, for example, a router goes down, Nagios knows that the servers behind it will be unreachable as well, and won't bother you with alerts for them. As Nagios can also react to events, it would be possible to change the default route
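Nagios expresses this with the `parents` directive in host definitions: if every parent of a failed host is itself down or unreachable, the host is reported as UNREACHABLE instead of DOWN, and you can choose not to be notified about UNREACHABLE hosts. Below is a simplified model of that rule (not Nagios's actual code); the host names are hypothetical.

```python
from typing import Dict, List, Set


def classify(parents: Dict[str, List[str]], failed: Set[str]) -> Dict[str, str]:
    """Label each failed host DOWN or UNREACHABLE.

    A failed host whose parents have *all* failed too is UNREACHABLE: the
    monitoring server cannot tell whether it is really down. A failed host
    with no parents, or with at least one working parent, is DOWN."""
    states = {}
    for host in failed:
        host_parents = parents.get(host, [])
        if host_parents and all(p in failed for p in host_parents):
            states[host] = "UNREACHABLE"
        else:
            states[host] = "DOWN"
    return states


def hosts_to_notify(parents: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """Only alert on hosts that are actually DOWN, not ones hidden behind them."""
    return sorted(h for h, s in classify(parents, failed).items() if s == "DOWN")


# Hypothetical topology: web1 and mail1 sit behind router1.
topology = {"router1": [], "web1": ["router1"], "mail1": ["router1"]}
print(hosts_to_notify(topology, {"router1", "web1", "mail1"}))  # → ['router1']
```

One alert about the router instead of three is exactly the noise reduction described above.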
Again Nagios (Score:2)
Rus
Windows guys should check out ServersAlive (Score:2)
It can check everything from pings, SNMP, databases, web pages, services, processes, port checks, and more. For whatever it doesn't check, you can design ex
Loggerithim (Score:1)
Nagios (Score:2)
S
Nagios + Cricket + SNMP (Score:1)
NAGIOS is the best I have seen (Score:1)
I remember an older one called Big Brother that was a little lighter weight.
Server Monitoring Solutions? (Score:1)
description: A systems and network monitoring system -- server programs
This package includes the spong daemon, which collects and stores
information from the spong client programs, and the program for sending
out messages when problems occur.
Spong is a simple systems and network monitoring package. It does not
compete with Tivoli, OpenView, UniCenter, or any other commercial
packages. It is not SNMP based, it communicates via simple TCP based
messages. It is written in perl and
OpenNMS is going to lead the way. (Score:1)
There was a brief mention of OpenNMS earlier. Clearly this needs some more input. Nagios is a great tool too, but it is not as geared towards enterprise use.
OpenNMS is.
OpenNMS handles all common port services and SNMP/MIB capability (as any NMS should do). It does everything all the tools mentioned above here can do (and even incorporates a few).
It has a front end powered by Apache Tomcat 4 and uses PostgreSQL (like Nagios) for its database. It has commercial support, is easily deployed on multiple
Nagios (Score:2)
for crying out loud (Score:2)