Monitoring Your Unix Boxen?
"I know a few people who 'tail -f' the main log files, or who run 'top' every so-often. These require constant monitoring though, and you could miss essential error messages if you step away for too long. Are there any projects that do this successfully? I've seen a couple out there that started to do this, but appear to be abandoned.
Ideally, I would like some type of all-in-one that generates a daily (email/web) report of network statistics, user logins, and (web)server traffic/hits, as well as anything 'suspicious' that might be happening: perhaps which apps have been taking most of the processor time, or whether any of the daemons have been busier than they normally would be. I know there probably isn't one single app out there that does all of this, so what's the best configuration for keeping tabs on multiple machines, something I can skim for a minute or two each day to make sure things are the way they should be? I want to know what works best, and just as importantly, what *doesn't* work (I do realize that relying on a single solution would be bad here too, so if you have more than one suggestion, that would be appreciated)."
Tripwire (Score:5, Informative)
Tripwire.org [tripwire.org]
FAQ [tripwire.org]
sourceforge page [sourceforge.net]
Big Brother (Score:5, Informative)
All that being said, I found it to be flukey in its behavior. Sometimes it would report that everything was not responding and it had to be punted before I would get the all clear. The other negative is the license [bb4.com]. The program consists of nothing more than shell/perl scripts so it's obviously open, but it has some strange clauses about Non-Commercial use.
Overall, I'd recommend trying something else, because BB was unreliable in my use, but YMMV.
Re:Big Brother (Score:5, Informative)
That said, it does appear to be a capable, fully-featured package and I'd guess that as long as you take the proper precautions you should be OK. I can't comment on the stability though; the security concerns I had were enough to cause me to move along to the next product on my list.
Re:Big Brother (Score:3, Insightful)
I've run BB for a number of years, and I got a good laugh from that article.. thanks..
the security concerns I had were enough to cause me to move along to the next product
The thing is, if you've got security concerns, then you shouldn't have a problem with using BB, because you're already aware of what needs to be done to prevent this information leakage.
The article you linked to didn't provide me with anything I didn't know befo
Re:Big Brother (Score:1)
At a previous employer, I ran BB from a Linux machine to monitor a pretty diverse set of boxes (Irix, FreeBSD, Win2k, WinNT4, AS400, Cisco 2500 and 7500 series, etc...). One of the best parts about BB is its extensibility; any kind of shell script can be implemented as a monitor/alarm generator for BB, making it *extremely* nimble.
Securing the installation is easy enough, if you're not a numnuts.
Keep an eye on your network traffic (Score:5, Informative)
Re:Keep an eye on your network traffic (Score:2, Interesting)
Re:Keep an eye on your network traffic (Score:1, Funny)
The insect - there is one outside my window right now that just won't shut up. I think they irritate me mostly.
The game - Well, it's pretty long and slow to watch, but I'm Australian and we rule the world in both test and one-day, so I guess I like it.
The program - Meh. I can take it or leave it.
Re:Keep an eye on your network traffic (Score:2)
We ended up graphing all sorts of interesting stats (CPU time, disk access, network latency times, etc.).
One of the best things that Cricket gave us was the ability to see correlations between our webserver response times and various other stats. So, for instance, we found out that our webserver response times dropped at the same time that our NFS file system times dropped and our iostats on on
logcheck (Score:5, Informative)
To quote a recent job candidate I interviewed.. (Score:5, Funny)
He's watching you.... (Score:5, Informative)
The extensions for BB are at http://www.deadcat.net/
I also like Tripwire. It keeps checksums of files on the system so you know if important files have been changed. Last time I used Tripwire it had email alerts. The paid-for version has an enterprise monitor.
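The checksum idea is simple enough to sketch in a few lines of Python. This is only an illustration of the technique, not how Tripwire works internally; the function names and the dictionary used as a baseline are made up for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(paths):
    """Map each path to its current digest (hypothetical baseline format)."""
    return {str(p): file_digest(p) for p in paths}


def compare(baseline, paths):
    """Return sorted (path, status) pairs for files that differ from the baseline."""
    changes = []
    current = {str(p) for p in paths}
    for p in sorted(current | set(baseline)):
        if p not in baseline:
            changes.append((p, "added"))
        elif p not in current or not Path(p).exists():
            changes.append((p, "missing"))
        elif file_digest(p) != baseline[p]:
            changes.append((p, "changed"))
    return changes
```

The baseline is only trustworthy if an intruder cannot rewrite it, so in practice it would have to live somewhere other than the machine being checked.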
LogWatch is another. Generates email.
Go through your Linux and BSD daily, hourly and weekly scripts to see all the tools they run by default. These can be moved to most Unixes. Since most of these are shell and perl programs, some might be adaptable under Windows using ActivePerl or Cygwin.
The hardest part is fine tuning the emails and alerts to those things you really care about.
MRTG is a great SNMP tool and ties in with Big Brother.
I've had to set these up for security purposes at one site, for monitoring a server farm at another site, and for a compile farm doing builds at my current job.
It's all about Nagios... (Score:5, Informative)
Re:It's all about Nagios... (Score:5, Informative)
One thing I like to do personally is randomly pick a startup script (that's actually used in a particular server's configuration), and bury a single line in it that emails me "hostname has rebooted!" as the subject whenever it reboots. That way I know if a machine is ever rebooted with or (more importantly) without my knowledge.
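For anyone who wants to try the reboot-notice trick, here is a rough Python sketch. The recipient address and the assumption that a mail server is listening on localhost are placeholders for illustration; the original poster's one-liner probably just calls the system mail command.

```python
import smtplib
import socket
from email.message import EmailMessage

ADMIN = "admin@example.com"  # hypothetical recipient, change to your own


def build_reboot_notice(hostname=None, recipient=ADMIN):
    """Build the 'hostname has rebooted!' message described above."""
    hostname = hostname or socket.gethostname()
    msg = EmailMessage()
    msg["Subject"] = f"{hostname} has rebooted!"
    msg["From"] = f"root@{hostname}"
    msg["To"] = recipient
    msg.set_content(f"{hostname} finished booting.")
    return msg


def send_notice(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server (assumed to be running locally)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)

# From a startup script you would call:
#     send_notice(build_reboot_notice())
```

If the mail server itself starts late in the boot sequence, the call has to run after it, otherwise the connection is refused and no notice goes out.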
Re:It's all about Nagios... (Score:1)
Horrible Configuration (Score:2)
I'll just check back on their site every few months. When they've got m4 for Nagios, we'll talk.
-Waldo Jaquith
syslogd? (Score:1)
or something
I'm not sure if I'm bullshitting or not
I'm likely misinformed
Re:syslogd? (Score:3, Informative)
We've got a nifty setup where we have syslog-ng running on our central syslog server. syslog-ng then squirts the data directly into a MySQL database. We've then got a custom PHP interface which sorts the errors by severity and colour codes them so we can always see what is going on. Our switches write to it. Our nokia firewalls write to it. Even the F5 load
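A toy version of the "store it, then sort by severity" idea, in Python. To be clear about the assumptions: this uses an in-memory SQLite database as a stand-in for MySQL, the table layout is invented, and the lines are raw syslog messages with the standard `<PRI>` prefix, where PRI is facility * 8 + severity. It is not the setup described above, just a sketch of the same technique.

```python
import re
import sqlite3

# Standard syslog severity names, index 0 (worst) to 7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$")


def parse_line(line):
    """Split a raw syslog line into (severity_number, message), or None."""
    m = PRI_RE.match(line)
    if not m:
        return None
    pri = int(m.group(1))
    return pri % 8, m.group(2).strip()  # PRI = facility * 8 + severity


def store(conn, lines):
    """Insert every parseable line into a hypothetical 'logs' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS logs (severity INTEGER, message TEXT)")
    rows = [r for r in map(parse_line, lines) if r is not None]
    conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)


def worst_first(conn):
    """Return (severity_name, message) pairs, most severe first."""
    cur = conn.execute("SELECT severity, message FROM logs ORDER BY severity, rowid")
    return [(SEVERITIES[sev], msg) for sev, msg in cur]
```

A web front end would then only need to map each severity name to a colour.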
Nagios (Score:5, Informative)
Re:Nagios (Score:2)
Re:Nagios (Score:4, Informative)
Nope. SATAN [fish.com] was a vulnerability probing tool that came out of SGI quite a while back. SAINT [wwdsi.com] was based on it (at least in function, I don't know if the code was based on it). They have nothing to do with Nagios.
The previous version of Nagios was called Netsaint, but they changed the name to Nagios because of possible trademark problems with WebSAINT [wwdsi.com], which is a web based tool that uses SAINT.
From the notice at the bottom of netsaint.org [netsaint.org]: NetSaint is not affiliated with World Wide Digital Security, Inc. (WWDSI); Richard S. Carson and Associates, Inc; and the marks WEB SAINT, SAINT, SAINTWRITER, SAINTEXPRESS, and SAINTBASIC owned by Richard S. Carson and Associates, Inc.
And I may as well mention that Nagios/Netsaint [nagios.org] is a really great tool and I highly recommend it. It won't, however, keep you up to date on "suspicious" activity - it's mostly for just making sure that your server and any services that run on it are going.
Re:Nagios (Score:4, Informative)
Well actually it _can_ keep you up to date on 'suspicious activity' if you are willing to write a plugin to say, monitor your IDS output.
Nagios itself is nothing more than a web-based system of notification. The plugins provide whatever functionality you code into them, from monitoring a network service, to parsing a logfile, to monitoring temperature. Pretty much anything that provides you with feedback can be used as the input to a plugin.
I actually wrote a little plugin that parses the output from my Win2k Terminal Server logs (via BackLogNT) on my central syslog server to email me every time my boss logs on and logs off from Windows so I can figure out when he is leaving home and on his way into the office.....and he has yet to catch me playing games when I should be working.
The long and the short is that Nagios handles the notifications, the plugins handle what is being measured/monitored.
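To make the plugin idea concrete, here is a minimal sketch in Python. The only parts that come from Nagios are the conventions: a plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The script name, the thresholds and the pattern-counting approach are made up for the example.

```python
import re
import sys

# Exit codes a Nagios plugin is expected to return.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_log(path, pattern, warn=1, crit=5):
    """Count lines matching pattern and map the count to a plugin status."""
    try:
        with open(path, errors="replace") as f:
            hits = sum(1 for line in f if re.search(pattern, line))
    except OSError as exc:
        return UNKNOWN, f"cannot read {path}: {exc}"
    if hits >= crit:
        code = CRITICAL
    elif hits >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{hits} line(s) matching {pattern!r} in {path}"


def main(argv):
    """Print one status line and return the exit code Nagios expects."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_log.py LOGFILE PATTERN")
        return UNKNOWN
    code, text = check_log(argv[1], argv[2])
    print(f"{LABELS[code]} - {text}")
    return code

# To use this as an actual plugin, end the file with:
#     sys.exit(main(sys.argv))
```

Note that this sketch rereads the whole file on every run, so an old alert keeps counting until the log is rotated; a real plugin would remember how far it had already read.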
Orca (Score:3, Informative)
I use Orca [orcaware.com] (but then I'm its author :) ) to monitor Solaris and Linux boxes. I used it at Yahoo!/GeoCities to monitor 200 boxes and it was easy to see when systems were doing odd stuff.
Sample Solaris [orcaware.com] and Linux [orcaware.com] plots. The Solaris version shows a whole ton of web server stats.
Lots of stuff (Score:4, Informative)
monit will monitor running daemons and can restart them if they crash, use too much CPU/RAM, etc., mailing you about anything interesting.
tripwire or lire are nice for monitoring filesystem integrity, but these tools aren't easy to use. The database they use must be located in a safe place, which can make them impractical.
I think the best thing would be doing all logging to a safe computer that only runs the logging daemon, so that you can be sure you're not missing anything.
Re:User of the word boxen (Score:2, Funny)
i like "boxen" for the same intriguing subcultural contexts as i like "w00t" and "geek".
remain happily hypocritical
Re:User of the word boxen (Score:3, Informative)
Re:User of the word boxen (Score:2, Funny)
Whenever someone uses that word to me i turn around and stop listening to them, it really makes me question there inteligence both in the IT field and in general inteligence.
Perhaps we should turn around and stop listening now....
Josh
Re:User of the word boxen (Score:1)
Red Hat Comes with Logwatch (Score:2, Informative)
Adminux (Score:4, Informative)
I rolled my own (Score:3, Interesting)
I rolled my own, mostly in Ruby (and ran it in parallel with the previous solution for several months). The main reason? I wanted to know about the things I wanted to know about, and not have to dig the information out of a lot of other cruft. So I do a lot of filtering to suppress details that fall within what I define as "normal" for my setup, and only report the exceptions.
The main benefit of this turned out to be that I learned a lot about a configuration that I thought I knew inside and out. Yes, it was more work than dropping in a ready-made package, but in retrospect it was well worth it.
-- MarkusQ
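The "suppress what is normal, report the rest" approach can be sketched like this. This is Python rather than the Ruby mentioned above, and it is not the poster's code; the two patterns are hypothetical examples of what someone might consider normal.

```python
import re

# Hypothetical patterns describing "normal" log lines for one setup.
NORMAL = [
    r"session (opened|closed) for user \w+",
    r"CRON\[\d+\]",
]


def exceptions(lines, normal_patterns=NORMAL):
    """Return the non-blank lines that match none of the 'normal' patterns."""
    compiled = [re.compile(p) for p in normal_patterns]
    return [
        ln.rstrip("\n")
        for ln in lines
        if ln.strip() and not any(rx.search(ln) for rx in compiled)
    ]
```

The useful work is in growing the pattern list: every time the report contains something boring, it becomes a new pattern, and what is left over is what deserves a look.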
LogMon (Score:2)
I wrote an app called LogMon [edespot.com] that allows the user to sorta have multiple 'tail -f' sessions in one terminal (does a 'split-screen' effect). Also does syntax coloring in a user configurable file...
Tabs on Servers? (Score:1)
Well, Office Depot has an excellent selection of tabs, I prefer the plain clear ones, but they also have packs of the colored ones................
When I first ran across the problem of monitoring servers, I downloaded every one I could find, got free trials of all the commercial ones. I ended up with netsaint, not because it was better, but because it did exactly what I wanted it to do and nothing more. I wrote a couple of little modules for some
monitoring, no one size fits all (Score:5, Insightful)
Here are some aspects along which you can compare what you find:
aspects of monitoring:
-availability
-uptime(subtly different from availability)
-performance
-security
-capacity
-log or otherwise event-based monitoring
nature of tools:
-web based
-daemon with web based front end
-daemon without web based front end
-other
language tool is written in, license and source
-closed source, nuff said, available in licensed per cpu, licensed per target/service, etc...
-open source, but with paid-for license that includes support(shameless plug... I do support for this kinda thing)
-open source, roll your own support
-perl
-php
-java
-python
-c/c++
integration with other products
-by snmp traps
-by snmp agent extensibility(smux/agentx/proxysnmp,etc...)
-by proprietary methods
-by sharing a RDBMS with another monitoring tool(usually used for things like remedy ARS)
measure of performance/capacity/throughput/usage
-by the exec family of functions
-by the language of choice's own internal library conventions
-by snmp
-by proprietary methods to a Manager of Manager or NMS system
-by ciscoflow/other hardware vendor's protocol
-by parsing logs
-by exec-over-ssh-connexion
One example that doesn't fit neatly into any category that comes to mind is monitoring of backups (were they performed, how much, which files were skipped, location in the jukebox of which tape for which file, etc.).
Hope this helps you at least draw the lines for evaluating which product meets YOUR needs.
Palantir (Score:3, Informative)
my 2 lines of perl... (Score:3, Informative)
It's based on RRD [ee.ethz.ch], the successor of MRTG (not much developed anymore, but still a good tool). Thanks Tobi btw.
OpenNMS [opennms.org] is a really powerful realtime monitoring tool
Nagios [nagios.org] also...
Don't forget snort [snort.org] for your IDS needs and add acidlab [cmu.edu] for good visualization of snort's results.
Cacti (Score:2, Informative)
deja intermapper (Score:2)
One size fits none (Score:2)
As an example, we use the following:
Nagios: Notifications and real-time monitoring.
Logcheck: Daily syslog reports.
cfengine: Configuration and limited problem correction.
SAR: Performance data. Well, it was free with the OS. Unfortunately, we d
gkrellm (Score:2, Informative)
It's skinnable, configurable and supports plugins. I've seen it working on Solaris and Linux, YMMV. It's here [wt.net] (with screenshots).
Zabbix (Score:2)
FWIW: Not what you're looking for, but... (Score:1)
Sitescope takes the cake! (Score:1)
The best part about SiteScope is that it does not require any sort of client on the servers that it monitors. It uses SSH/telnet/rlogin/etc to make a connection and use normal system utilities to parse out the data that it needs. You can even moni