On Maintaining httpd Logs
A nameless submitter dropped this in my in-bin: "I help run a site that's rapidly gaining popularity. However, I wonder how other people out there handle the large amount of logs that are generated on a busy system. How long do you keep all those Apache logs? What about the messages logs, etc.? They take a lot of space and quite often I just don't think they're worth keeping around. Thoughts?" My thoughts on this are simple: If you are serious about your site, then the logs are worth keeping. You don't have to keep them online (tape backups work well here), but the statistics within can give you valuable information for the future handling of your site. Any other thoughts?
Re:On that note, Web Log Parsers? (Score:2)
(Don't remember the URL now, but I'm sure that you'll find it easily in any search).
Re:On that note, Web Log Parsers? (Score:1)
http://apps.freshmeat.net/homepage/890390921/ [slashdot.org]
Another good web log analyzer is "Webalizer":
http://apps.freshmeat.net/homepage/884569634/ [freshmeat.net]
Hope this helps.
Re:Logs and stuff (Score:1)
Re:On that note, Web Log Parsers? (Score:1)
I use Webalizer [mrunix.net], too.
Logfile treatment (Score:1)
We build weekly statistics from the (new) logs before we archive them on CD-Rs (2 CDs full of compressed logs per week, *sigh*). The weekly statistics are published on our intranet server for reference.
The stats are built with analog and some highly optimized, specialized programs (dumb but fast: ~10MB/second throughput). I could publish them if you are interested.
Cronolog is a useful utility (Score:1)
Do a search on Google [google.com] or somewhere for it.
Ade_
What to do with logs? (Score:3)
In my view, the logs themselves aren't as important as the information they contain. So use a comprehensive analysis tool (a commercial one, a free one written in Perl, or one you write yourself), extract the relevant information, and then remove your logs.
Tape backups do indeed work well here, but not all log entries are created equal. If your site is very image-heavy, you probably don't want to keep the thousands of entries for each inline JPEG; you want the records of the page views.
Sites running Apache/mod_perl (or sites where the administrator is not afraid of Apache and a C compiler) can modify Apache so that it logs only what you want. A PerlLogHandler under mod_perl with return DONE if $r->content_type =~ /image/ at the top will save you hundreds, if not thousands, of (possibly useless) entries in your log files. On the other hand, a 30 GB tape will hold years' worth of bzipped logfiles...
darren
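As a rough illustration of "extract the relevant information, then remove your logs", here is a minimal Python sketch that tallies page views per day from Common or Combined Log Format lines. The names summarize and IMAGE_SUFFIXES, the list of image extensions, and the decision to count only status 200 responses are all assumptions made for the example, not features of any particular analysis tool.

```python
import re
from collections import Counter

# Pulls the day, request path and status out of a Common/Combined Log Format line.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# Assumed list of extensions to treat as inline images; adjust for your site.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def summarize(lines):
    """Return a Counter of successful non-image requests keyed by (day, path)."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None:
            continue  # skip lines that do not parse
        path = m.group("path").split("?", 1)[0]  # drop any query string
        if m.group("status") != "200":
            continue  # count only successful responses (an assumption)
        if path.lower().endswith(IMAGE_SUFFIXES):
            continue  # inline images are not page views
        views[(m.group("day"), path)] += 1
    return views
```

Feeding it an open log file, e.g. summarize(open("access_log")), gives a table small enough to keep online long after the raw logs have gone to tape.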
Learning experience (Score:2)
Save the logs forever! (Score:1)
We used to archive separate logs for access, error, and referer, but now Apache's combined logs have made life much easier. (Analog also is a nice bonus -- talk about quick stats!)
We typically download all the httpd logs for a quarter, burn them onto a CD-R, and store them. This is on top of daily incremental backups with a weekly full backup. That way, if we want to analyze the data later, we have the logs ready to go, rather than having to track them down on a tape. (I myself have used old logs in this manner several times.)
I've heard that you can gzip the logs on the fly directly from Apache, but thought that might lead to unwanted CPU overhead due to its constant utilization on busy web sites; anyone got any anecdotal evidence on this one?
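For reference, on-the-fly compression is usually done through Apache's piped logs: a CustomLog target that starts with "|" sends each log line to the standard input of an external program. Below is a minimal Python sketch of such a program. The function name compress_stream and the paths in the comment are hypothetical, and it says nothing about how much CPU this costs on a busy site. The compression does at least run in a separate process from the httpd children.

```python
import gzip
import sys


def compress_stream(lines, path):
    """Append each incoming log line to the gzip file at `path`; return the count."""
    count = 0
    # Mode "at" appends text; each run adds a new gzip member, which
    # gunzip and gzip.open read back as one continuous stream.
    with gzip.open(path, "at", encoding="utf-8") as out:
        for line in lines:
            out.write(line)
            count += 1
    return count

# Hypothetical wiring, assuming a wrapper script that calls
#   compress_stream(sys.stdin, sys.argv[1])
# and an httpd.conf line along the lines of
#   CustomLog "|/usr/local/bin/gziplog.py /var/log/httpd/access_log.gz" combined
```

Because the output is buffered, the last few lines may not reach disk until the pipe closes, which is worth keeping in mind if you tail your logs live.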
Compression - Remove Image HTTP Requests (Score:1)
Another way to chop log files down to size is to remove image requests. There may be circumstances where you wouldn't want to do this, but for the average web site it cuts log size dramatically. In my experience, by at _least_ 2/3. And that's just for sites that use small numbers of graphics per page... if you've got more, you'll see further shrinkage.
You can do this after the fact with some kind of script/program (I've used Perl, and also once suffered through doing it in C), or you could change your site so that it simply serves the images from another domain/server so the logs are kept separately.
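The after-the-fact approach might look something like the following Python sketch (not the Perl or C versions mentioned above). It assumes Common or Combined Log Format, and the list of image extensions and the name strip_image_requests are made up for the example.

```python
import re

# Matches a request line whose path (before any query string) ends in an
# image extension. The extension list is an assumption; extend as needed.
IMAGE_REQUEST = re.compile(
    r'"\S+ [^\s?]+?\.(?:gif|jpe?g|png)(?:\?\S*)? HTTP/[\d.]+"',
    re.IGNORECASE,
)


def strip_image_requests(lines):
    """Yield only the log lines whose request is not for an image file."""
    for line in lines:
        if not IMAGE_REQUEST.search(line):
            yield line
```

Typical use would be to stream one file into another, for example out.writelines(strip_image_requests(src)) with src and out being the open original and trimmed logs, before compressing and archiving the result.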