
Network Monitoring and Alerting?
SpamMonkey asks: "At work I am trying to implement a central monitoring and alerting service. We have in excess of 250 Windows servers, approx 15 AIX servers and another 30 Linux servers (mainly SLES/Suse). My investigation into systems that will allow us to monitor critical areas on each of these systems has so far led me to a clustered Linux server running Nagios with passive and active checks. What I'm curious about though is how Slashdot readers are carrying out their own jobs and how they can comfortably sit back, without having to repeatedly check that various systems are still operational and how to cut down their own response times when something goes wrong."
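To make the passive/active distinction concrete: with an active check the Nagios server runs the plugin itself, while with a passive check something else (a cron job or agent on an AIX or Windows box, say) reports the result, which Nagios reads from its external command file as a PROCESS_SERVICE_CHECK_RESULT line. Below is a minimal Python sketch of formatting and submitting such a line. The host and service names are hypothetical, and the command-file path is whatever `command_file` is set to in your nagios.cfg. From remote hosts the line usually travels through an add-on such as NSCA rather than being written directly.

```python
import time

# Standard Nagios service return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result(host, service, code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {code}")
    ts = int(time.time() if now is None else now)
    # External commands are newline-terminated, so keep the output on one line.
    output = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(line, command_file):
    """Write the command to Nagios's command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Usage would look like `submit(passive_result("aix01", "Disk /var", WARNING, "82% used"), "/usr/local/nagios/var/rw/nagios.cmd")`, where that path is only a typical source-install location; check your own configuration.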
Why? (Score:1)
Re:Why? (Score:3, Interesting)
Re:Why? (Score:1)
Thanks
Big Brother (Score:4, Informative)
Re:Big Brother (Score:4, Informative)
Big Sister.
http://bigsister.graeff.com/ [graeff.com]
Simple (Score:2, Interesting)
Security/monitoring is a process not a product. When you finish the checklist you start over.
Also, I would recommend trying to cut back on the Windows servers somehow: when the licenses expire or upgrade time comes around, maybe say the right words about moving to a "free" solution?
Re:Simple (Score:2)
He's talking about day to day alerts to notify if a machine's services aren't working when they should be. That'd be SNMP alerts, MRTG, something akin to Na
Nagios (Score:3, Insightful)
All in all I'm monitoring about 200 different processes across our network as well as running MRTG on the same box. Never felt once I needed to cluster.
Re:Nagios (Score:3, Insightful)
Remote services can be checked with ease, but checks that need to query the local system (disk space or load, for example) need a different setup. I ran it through SSH, but I'm not sure the kind of load that wo
Re:Nagios (Score:1)
For that number of boxes, you shouldn't need a cluster, though a second redundant box would make sense in case the first dies.
As for the Windows machines, check out NRPE-NT. It's a service that'll run on each Windows box and run any required plugins.
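For local checks like disk space or load, the usual pattern is a small plugin that NRPE (or NRPE-NT on Windows) runs on the monitored box. Nagios cares about the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output, with optional performance data after a `|`. A Python sketch follows; the thresholds and the `disk_status`/`main` names are illustrative assumptions, not an existing plugin.

```python
import shutil

# Exit codes Nagios expects from a plugin
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def disk_status(path, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for disk usage on `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: zero-sized filesystem"
    used_pct = 100.0 * usage.used / usage.total
    # Anything after '|' is performance data that graphing add-ons can parse.
    perf = f"|used_pct={used_pct:.1f}%;{warn_pct};{crit_pct}"
    if used_pct >= crit_pct:
        label, code = "CRITICAL", CRITICAL
    elif used_pct >= warn_pct:
        label, code = "WARNING", WARNING
    else:
        label, code = "OK", OK
    return code, f"DISK {label} - {path} {used_pct:.1f}% used{perf}"


def main(argv):
    """Plugin entry point: print one status line, return the exit code."""
    path = argv[0] if argv else "/"
    code, message = disk_status(path)
    print(message)
    return code
```

As a standalone plugin script you would end the file with `sys.exit(main(sys.argv[1:]))` so the exit code reaches NRPE.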
Best type of network monitoring (Score:3, Funny)
Its Easy. (Score:2, Funny)
As for preventive measures, you put the new hire in charge of the task and sit back and relax.
1st rule of management is misdirection.
Re:Its Easy. (Score:2)
Re:Its Easy. (Score:1)
Also curious (Score:5, Interesting)
Our big hitch comes from the fact that we only have a satellite connection to each of the remote sites, so we can't do real-time monitoring, and things like HP OpenView NNM are out of the question: they use too much bandwidth.
Our solution (and the reason I'm working late right now) is to build a custom suite of tools that does batch reporting every night by polling logs and custom programs, then sends the results back in a handy XML file. We take that file, dump it into a large Informix database, and then we can do whatever we want to create reports.
It's a little more work than just installing a package, but we're getting EXACTLY what we want out of the product. It works with our very unique communications and configuration, and it's modular, so I can add whatever monitoring/checks I want by writing a new ksh script. All the output is standardized, and all the parsing is done at the office by a very clever XML parser one of the DB guys wrote.
I think for what you're looking for, there are things like Big Brother, MRTG, HP OpenView, umm...
But I'd love some feedback from people who are working in a bandwidth-sparse shop like mine.
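A rough sketch of the collector side of a setup like this: each check script prints one standardized line, and a nightly job wraps the lines into a single XML file to ship back over the satellite link. The `name|status|detail` line format and the element names are assumptions for illustration, not the poster's actual schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_check_output(line):
    """Split one line of check output; assumed format: name|status|detail."""
    name, status, detail = line.rstrip("\n").split("|", 2)
    return name, status, detail


def build_report(site, lines, when=None):
    """Wrap the night's check results for one site into an XML string."""
    if when is None:
        when = datetime.now(timezone.utc)
    root = ET.Element("report", site=site, generated=when.isoformat())
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines from the scripts
        name, status, detail = parse_check_output(line)
        check = ET.SubElement(root, "check", name=name, status=status)
        check.text = detail
    return ET.tostring(root, encoding="unicode")
```

Keeping one flat `<check>` element per result makes the office-side parsing and the database load straightforward, and adding a new ksh check only means emitting one more line.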
Re:Also curious (Score:1, Interesting)
If you are using a realtime link you can just have it alert through the NOC.
Re:Also curious (Score:2, Insightful)
Re:Also curious (Score:1)
And it really is a good piece of software.
Re:Also curious (Score:2)
canadaboy (at) gmail (dot) calm
Intermapper Remote (Score:3, Informative)
You might be interested in Intermapper [intermapper.com] for its Remote [dartware.com] component. You can run a monitoring system at each location yet administer them centrally. Your remote datastream will basically be the set of events that's interesting to the Human In Charge.
It's commercial software, so you ha
Re:Also curious (Score:2)
Re:Also curious (Score:1)
We use OpenView. We get a lot of false alarms for 'Node Down' when the alert should read 'bandwidth impaired'. The easily choked 128k link reaches capacity and drops the low-priority traffic. Naturally, ICMP traffic like ping goes first.
The IBM equivalent to OpenView is Tivoli NetView.
MANY ways to take care of event notification (Score:5, Interesting)
Why not slap a modem into the head nagios box and have it page you [nagios.org] when things fail. Don't worry about having to wear a beeper - you can page most cellphones via your carrier's SMS [nagios.org] gateway (still dial-capable).
Too much hassle? How about AIM [nagios.org]? YIM [nagios.org]? Jabber [nagios.org]? Email [nagios.org]?
If you're TRULY the teeth-jittering, chain-smoking NOC type, buy some X10 crap and build a physical network alarm interface [rutgers.edu] like I did.
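If you go the email route (many carriers also accept email sent to an SMS gateway address), a notification command can be quite small. This is a hypothetical Python sketch: the addresses are placeholders, it assumes a local MTA relay on localhost, and the 140-character cut is an arbitrary choice to keep pages short.

```python
import smtplib
from email.message import EmailMessage


def build_alert(host, service, state, output, to_addr, from_addr):
    """Build a short alert e-mail suitable for a pager or SMS gateway."""
    msg = EmailMessage()
    msg["Subject"] = f"{state}: {service} on {host}"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(output[:140])  # keep the body pager-sized
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (assumed to listen on port 25)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Putting the state and host in the subject matters most, since that is often all a phone shows.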
Re:MANY ways to take care of event notification (Score:2)
Re:MANY ways to take care of event notification (Score:2)
Monitoring Tools (Score:4, Interesting)
Here are a couple of the monitoring solutions:
Opennms [opennms.org]
Mon [kernel.org]
Big Brother [bb4.org]
For system information polling I'd go with:
Cacti [cacti.net]: hands down, this is the best polling system out there, and it's simple to set up and run.
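Pollers in the MRTG/Cacti style graph rates, not raw values: they sample an SNMP counter such as IF-MIB's ifInOctets on each poll and divide the difference by the interval. The only subtle part is a 32-bit counter wrapping between polls, which modulo arithmetic absorbs as long as it wrapped at most once. A minimal Python sketch of that arithmetic (not Cacti's actual code):

```python
COUNTER32_MODULUS = 2**32  # Counter32; use 2**64 for Counter64 (e.g. ifHCInOctets)


def counter_rate(prev, curr, seconds, modulus=COUNTER32_MODULUS):
    """Per-second rate between two samples of a wrapping SNMP counter."""
    if seconds <= 0:
        raise ValueError("poll interval must be positive")
    # If the counter wrapped past its maximum, the modulo still yields the
    # true delta, assuming it wrapped no more than once between polls.
    delta = (curr - prev) % modulus
    return delta / seconds
```

For an octet counter, multiply the result by 8 to get bits per second. On fast links polled every 5 minutes a 32-bit counter can wrap more than once, which is why the 64-bit counters exist.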
Re:Monitoring Tools (Score:2)
Just to clarify for others (I know by your sig you know this already), MON is not so much a network monitoring application as it is a framework with some production-ready examples. I was really happy with my last MON setup, but I did put serious time into getting it set up and writing some scripts of my own. FWIW, this was 3 years ago, so maybe it comes with even more good stuff out of th
Re:Monitoring Tools (Score:1)
http://www.zabbix.com/ [zabbix.com].
It has its quirks, and it can be a little difficult to set up and get used to the first time, but it does its job well
Re:Monitoring Tools (Score:1)
If you are looking at Big Brother, I would probably recommend you look at Hobbit Monitor [sourceforge.net] instead. Hobbit is open source and unconstrained, unlike BB. They are developing a client-side piece to replace BBNT, as well. Hobbit extends the good things we like about BB and adds some other things we would have really liked to have SEEN in BB.
It seems to me, also, that BB hasn't enjoyed much development activity for
Re:Monitoring Tools (Score:1)
Indicative Software (Score:1)
I do testing for them, and we really do have a very cool system going. We monitor the systems, the network, and the applications. We have predefined tests for almost anything you could think of, from simple HTTP tests to fancier Cisco router tests to perfmon integration. We monitor WebLogic and WebSphere, and soon we'll be able to monitor other app servers as well.
So ya, give us a call and inquire
IPSentry (Score:2)
Re: (Score:2)
polling and whatnot (Score:2, Informative)
We tried Cacti and just didn't like it. I've looked at Nagios, but not in detail.
Find JFFNMS at www.jffnms.org
Re:polling and whatnot (Score:1)
I'm the Lead Developer for JFFNMS.
Just in case, you can go to the JFFNMS site [jffnms.org] and find the feature list, a Flash movie, and four working demos.
I hope you like it.
Javier
Profiler Rx by Tek-Tools (Score:3, Informative)
Profiler Rx [tek-tools.com]
Write a script to update a web page. (Score:1)
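One way to read "write a script to update a web page" is the following minimal Python sketch: render the latest state of each host into a static HTML table that auto-refreshes, then swap the file into place atomically. The status dictionary, the colors, and the 60-second refresh are assumptions; how you collect the statuses is up to you.

```python
import html
import os
from datetime import datetime

# Background color per state; anything unexpected shows gray.
COLORS = {"OK": "green", "WARNING": "yellow", "CRITICAL": "red"}


def render_status_page(statuses, generated=None):
    """Render {host: (state, message)} as a small auto-refreshing HTML page."""
    if generated is None:
        generated = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for host in sorted(statuses):
        state, message = statuses[host]
        color = COLORS.get(state, "gray")
        rows.append(
            f"<tr><td>{html.escape(host)}</td>"
            f'<td style="background:{color}">{html.escape(state)}</td>'
            f"<td>{html.escape(message)}</td></tr>"
        )
    return (
        '<html><head><meta http-equiv="refresh" content="60"></head><body>'
        f"<p>Generated {html.escape(generated)}</p>"
        "<table>" + "".join(rows) + "</table></body></html>"
    )


def publish(path, text):
    """Write to a temp file, then rename, so nobody sees a half-written page."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)
```

Run it from cron after each polling pass; escaping the messages matters because check output can contain characters like `<`.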
Break it down and start with the small stuff (Score:4, Informative)
This is an entire category of Operations Management and can encompass everything. Don't take it lightly, and don't be afraid to start small. The first thing you need to do is categorize what you want to monitor into individual sections and work on the easiest stuff first. By the time you work up to the tough stuff, you'll have an idea of what's available and what your capabilities are, and hopefully the easy stuff can be quickly rewritten/integrated into a better solution. Don't miss the critical stuff in a morass of junk alerts. Sample considerations (everything that moves in the data center):
Hardware:
---------
1) Server
2) Storage
3) Network
4) Power
5) Environment
6) Security
Software:
---------
1) OS
2) Applications
3) Security
Events:
-------
1) Failures
2) Alerts
3) Misfires
4) Security
Triggers:
---------
1) Notification/False positives
2) Action plans/Event handling
3) Documentation, Documentation, Documentation
4) Reporting aka analysis and cya
5) Security
Once you're done building it, start over. The last tier is the most visible: e.g. delegating a RAID rebuild page to the opcenter flunky without proper documentation is a Career Limiting Move (CLM), and building the best monitoring system is a fucking waste unless you pay attention to it. The most apt cliches for monitor normalisation are all military: war rooms, bridges, weapons hot, communications channels, etc. View everything as SNAFU and work from there.
Rule #1: Do not add to the problem.
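A common way to keep notification and false positives under control is to page only after several consecutive failures, page once per outage, and send a single recovery notice. This is loosely modeled on Nagios soft/hard states (max_check_attempts) but is a simplified Python sketch, not Nagios's logic; the threshold of 3 is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class AlertFilter:
    """Turn a stream of check results into at most one page per outage."""

    threshold: int = 3   # consecutive failures before paging
    failures: int = 0
    alerted: bool = False

    def observe(self, ok):
        """Feed one check result; return 'PROBLEM', 'RECOVERY' or None."""
        if ok:
            self.failures = 0
            if self.alerted:
                self.alerted = False
                return "RECOVERY"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            return "PROBLEM"
        return None
```

A single dropped check (a flaky satellite link, say) then never reaches anyone's pager, while a real outage pages exactly once.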
sysmon (Score:2)
It's set up to do network topology
NetIQ (Score:1)
http://www.netiq.com/products/am/default.asp [netiq.com]
I think Microsoft sold a watered-down version of it as MOM, which never seems to have taken off.
Anyway, we looked at all the monitoring packages for keeping track of over a hundred Windows servers and it was the best for us due to flexibility, mid-cost, and capabilities to be centralized but monitor remote sites locally.
You can even write your own jobs to do what you want with almost no limits.
It has evolved an
Zabbix (Score:1, Interesting)
Check out Zabbix [zabbix.com] if you're looking for a solution which is free, supports all platforms, and is easy to deploy. Look at the screenshots.
My company has been using it for several months in a mixed Windows/Unix environment (~180 servers) with great success. We use nearly all the features Zabbix provides (notifications, graphs, network maps, SLA monitoring, cool screens). Very useful stuff indeed. We tried Nagios before, but found it complex and hard to maintain. Besides, the performance of Nagios was disappointing (not enough tu
Re:Zabbix (Score:3, Insightful)
Having used it for a couple weeks now, I would call it "software with potential". It's not quite there yet, and has the feeling of being immature software. This is not a dig - quite the contrary. I think that zabbix has a *lot* of potential, but I think it needs a little more time before it's ready.
try argus... (Score:3, Informative)
it's also pretty flexible so you can plug it into just about any paging system and monitor just about any service you can imagine.
it handles hierarchies so you don't get 300 pages at once, and it has a simple, fast, clean web interface which isn't bloated with gigabytes of shiny widgets and is even perfectly usable via lynx.
it has a few rough edges but the overall ease of use and simplicity make up for it.
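The hierarchy idea is what keeps one dead router from producing 300 pages: if a node's upstream parent is also down, the node is only unreachable, so it is suppressed and only the root cause pages. A minimal Python sketch of the general idea (not Argus's implementation); the parent map and node names are hypothetical.

```python
def root_causes(parents, down):
    """Return the down nodes whose own parent is up (or who have no parent).

    `parents` maps node -> upstream node, with None for nodes directly
    reachable from the monitoring server. A down node whose parent is
    also down is treated as collateral damage and not paged.
    """
    down = set(down)
    causes = []
    for node in sorted(down):
        parent = parents.get(node)
        if parent is None or parent not in down:
            causes.append(node)
    return causes
```

With `router1 -> sw1 -> web01/web02`, losing all four yields a single page for `router1`.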
HP OpenView Operations (Score:1)
At my workplace, we're using HP-OVO. We have 300+ HP-UX, 200+ Windows and 50-odd Linux servers. It is highly customizable and lets you set up what to monitor, the time slots within which to ignore alerts, etc. etc. etc. You basically have to install SPIs (Smart Plug-Ins) for each of the parts you want to monitor (OS, DB, hardware, custom applications...).
And yes, it can also be configured to alert you via email/SMS.
-- rxmx --
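The "time slots within which to ignore" alerts amount to maintenance windows. The Python below is not OVO's configuration syntax, just a sketch of the underlying check under assumed rules: each window is a weekday plus a start and end time, and a window whose start is later than its end runs past midnight into the next day.

```python
from datetime import datetime, time


def in_window(now, windows):
    """True if `now` falls inside any (weekday, start, end) window.

    weekday follows datetime.weekday() (Monday == 0). A window with
    start > end (e.g. 22:00-02:00) covers late on `weekday` and early
    on the following day.
    """
    t = now.time()
    day = now.weekday()
    for weekday, start, end in windows:
        if start <= end:
            if day == weekday and start <= t < end:
                return True
        else:
            if day == weekday and t >= start:
                return True
            if day == (weekday + 1) % 7 and t < end:
                return True
    return False
```

A notification script would then simply skip paging while `in_window(datetime.now(), maintenance_windows)` is true, with `maintenance_windows` being your own hypothetical list such as `[(6, time(22, 0), time(2, 0))]` for a Sunday-night backup slot.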
network monitoring (Score:1)
Monitor and alert based on services / applications (Score:1)
The key here is to move beyond just instrumentation, and add a layer of service mapping and correlation. The challenge is to map the data and events you will get from Nagios (or other monitoring tool/agent) to the services that the systems are provisioning and your service objectives (performance/ava
Is there an open source Netcool? (Score:2)
I haven't yet come across anything open source that follows the Netcool event processing model. For those of you who never had to use Netcool (it's big $$$, btw, but very popular among large telcos/ISPs), it is essentially an event processor built on a relational database that relies heavily on stored procedures and triggers to perform "deduplication" and correlation, and it is highly customizable. I don't see anything that their (proprietary) database engine does that couldn't be done with PostgreSQL.
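To illustrate trigger-based deduplication, here is a small Python sketch that uses SQLite (swapped in for PostgreSQL so the example is self-contained): a repeat of an event with the same node and check name bumps a tally and refreshes the existing row instead of adding a new one. The table layout is a hypothetical example, not Netcool's schema; in PostgreSQL you would more likely write a PL/pgSQL trigger or `INSERT ... ON CONFLICT DO UPDATE`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (
    node       TEXT NOT NULL,
    check_name TEXT NOT NULL,
    severity   INTEGER NOT NULL,
    summary    TEXT,
    tally      INTEGER NOT NULL DEFAULT 1,
    first_seen INTEGER NOT NULL,
    last_seen  INTEGER NOT NULL,
    PRIMARY KEY (node, check_name)
);

-- Deduplication: update the existing row, then abandon the INSERT.
CREATE TRIGGER dedup BEFORE INSERT ON events
WHEN EXISTS (SELECT 1 FROM events
             WHERE node = NEW.node AND check_name = NEW.check_name)
BEGIN
    UPDATE events
       SET tally = tally + 1,
           severity = NEW.severity,
           summary = NEW.summary,
           last_seen = NEW.last_seen
     WHERE node = NEW.node AND check_name = NEW.check_name;
    SELECT RAISE(IGNORE);
END;
"""


def open_store(path=":memory:"):
    """Create the event table and dedup trigger."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, node, check_name, severity, summary, ts):
    """Insert an event; the trigger folds repeats into the existing row."""
    conn.execute(
        "INSERT INTO events (node, check_name, severity, summary, first_seen, last_seen)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (node, check_name, severity, summary, ts, ts),
    )
    conn.commit()
```

Three "Node down" events for the same router end up as one row with a tally of 3 and both the first and last timestamps preserved, which is what an operator console wants to show.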
One more (Score:2)
This one [linuxtech.cc] is Don O'Neill's own combination of several different small packages. Basically, it provides a very nice front end to fairly extensive MRTG monitoring. But, it does give you an idea of what you can do and it certainly looks customizable.
OpManager (Score:1)
big sister and brother (Score:1)
http://www.bb4.org/
and big sister
http://bigsister.graeff.com/
Big Sister does for you:
* monitor networked systems
* provide a simple view on the current network status
* notify you when your systems are becoming critical
* generate a history of status changes
* log and display a variety of system performance data
I worked for a company that's been using Big Brother for years. Great, but the config syntax sucks. Big Sister is easier, I believe, but I'm not sure.