On Maintaining httpd Logs...

A nameless submitter dropped this in my in-bin: "I help run a site that's rapidly gaining popularity. However, I wonder how other people out there handle the large amount of logs that are generated on a busy system. How long do you keep all those Apache logs? What about the messages logs, etc.? They take a lot of space, and quite often I just don't think they're worth keeping around. Thoughts?" My thoughts on this are simple: if you are serious about your site, then the logs are worth keeping. You don't have to keep them online (tape backups work well here), but the statistics within can give you valuable information on the future handling of your site. Any other thoughts?
  • Analog?

    (Don't remember the URL now, but I'm sure that you'll find it easily in any search).

  • Here is the analog web site:

    http://apps.freshmeat.net/homepage/890390921/ [freshmeat.net]

    Another good web log analyzer is "Webalizer":

    http://apps.freshmeat.net/homepage/884569634/ [freshmeat.net]

    Hope this helps.

    Apache on Debian automatically backs up all site logs, including all vhosts and such: it maintains two weeks of uncompressed logs, and the rest are kept compressed. Webalizer will support this with its database. I'm not sure exactly what it does, but I figure the information is worthwhile anyhow.
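
    For reference, a rotation policy in that spirit expressed as a logrotate stanza might look like this (an illustrative sketch, not Debian's exact shipped config):

        /var/log/apache/*.log {
            weekly
            rotate 52          # keep a year of rotations
            compress           # gzip rotated logs...
            delaycompress      # ...but leave the newest rotation uncompressed
            missingok
            notifempty
            postrotate
                /etc/init.d/apache reload > /dev/null
            endscript
        }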

    -----
  • Here is Analog [cam.ac.uk]

    I use Webalizer [mrunix.net], too.
  • Log files (especially httpd or firewall logs) are extremely compressible; expect them to shrink by a factor of 20 or better.

    We build weekly statistics from the (new) logs before we archive them on CD-Rs (2 CDs full of compressed logs per week, *sigh*). The weekly statistics are published on our intranet server for reference.

    The stats are built with analog and some highly optimized, specialized programs (dumb but fast: ~10MB/second throughput). I could publish them if you are interested.
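
    The compress-and-stage step before burning can be as simple as this Perl sketch (the paths and the rotated-log naming are illustrative assumptions):

        #!/usr/bin/perl -w
        # Compress last week's rotated logs and stage them for CD-R burning.
        use strict;
        use File::Copy qw(move);

        my $logdir  = '/var/log/httpd';        # illustrative path
        my $staging = '/archive/cdr-staging';  # illustrative path

        for my $log (glob "$logdir/*.log.1") { # last rotation's logs
            system('gzip', '-9', $log) == 0 or die "gzip $log failed: $?";
            move("$log.gz", "$staging/") or die "move $log.gz: $!";
        }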
  • Cronolog can be used on the end of a pipe from Apache (or presumably anything that generates logs similarly) and will automatically write logs to paths keyed on date. E.g. if you want to collect each month's logs in separate dirs, cronolog will write to 1999/Oct, 1999/Nov, etc. It's an extremely useful way of splitting up your log files chronologically without writing scripts to restart Apache and move the old logs.
    Do a search on Google [google.com] or somewhere for it.
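
    For example, a piped log directive along these lines (the cronolog path here is an illustrative assumption) produces exactly that layout:

        # httpd.conf: pipe the access log through cronolog;
        # %Y/%b expand via strftime() to e.g. 1999/Oct, 1999/Nov
        CustomLog "|/usr/local/sbin/cronolog /var/log/httpd/%Y/%b/access.log" combined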

    Ade_
  • by dlc ( 41988 ) <dlc@noSPaM.sevenroot.org> on Friday November 26, 1999 @04:24AM (#1504588) Homepage

    In my view, the logs themselves aren't as important as the information they contain. So use a comprehensive analysis tool, whether a commercial package, a free one written in Perl, or one you write yourself; extract the relevant information, and then remove your logs.

    Tape backups do indeed work well here, but not all log entries are created equal. If your site is very image-heavy, you probably don't want to keep the thousands of entries for each inline JPEG; you want the records of the page views.

    Sites running Apache/mod_perl (or sites where the administrator is not afraid of Apache and their C compiler) can modify Apache so that it logs only what you want. A PerlLogHandler under mod_perl with return DONE if $r->content_type =~ /image/ at the top will save you hundreds, if not thousands, of (possibly useless) entries in your log files; a minimal sketch follows. On the other hand, a 30 Gig tape will hold years' worth of bzipped logfiles...
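
    A minimal sketch of such a handler, assuming mod_perl 1.x (the package name is illustrative):

        # httpd.conf:  PerlLogHandler My::NoImageLog
        package My::NoImageLog;
        use strict;
        use Apache::Constants qw(OK DONE);

        sub handler {
            my $r = shift;
            # DONE short-circuits the logging phase, so mod_log_config
            # never records this (image) request
            return DONE if $r->content_type =~ /image/;
            return OK;   # everything else falls through to normal logging
        }
        1;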

    darren

  • I'm kinda surprised that nobody's mentioned it yet: building a log statistics and storage system is an excellent way for somebody to pick up Perl or Python knowledge, or to enhance what they have. Use Apache's CustomLog directive to get referer and user-agent info in your logfiles, and let your imagination run wild as to what kind of data can be mined out of them; user tracking from page to page doesn't require cookies. As compressible as log data is, there's no real excuse not to save it. If you've got enough traffic that logs are taking up disk space you want, you've already got a tape drive or something (right? you'd better...)
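
    As a starting point, a few lines of Perl will tally user-agents out of an access log (the regex assumes Apache's stock "combined" LogFormat):

        #!/usr/bin/perl -w
        # Count user-agents in an Apache "combined" access log.
        use strict;

        my %agents;
        while (<>) {
            # host ident user [date] "request" status bytes "referer" "agent"
            next unless /^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "([^"]*)"/;
            $agents{$1}++;
        }
        for my $ua (sort { $agents{$b} <=> $agents{$a} } keys %agents) {
            printf "%6d  %s\n", $agents{$ua}, $ua;
        }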

  • The way we do it here is that we basically want to save the httpd logs forever. You never know when those logs will come in handy, particularly if you run any commerce sites -- it's nice to be able to track down IP subnets to add more empirical evidence. Plus, new statistical techniques may be developed down the road. I'm not as big a fan of archiving the messages log; however, those get backed up daily themselves with the rest of the servers.

    We used to archive separate logs for access, error, and referer, but now Apache's combined logs have made life much easier. (Analog also is a nice bonus -- talk about quick stats!)

    We typically download all the httpd logs for a quarter, burn them onto a CD-R, and store them. This is on top of daily incremental backups with weekly fulls. That way, if we want to analyze the data later, we have the logs ready to go, rather than having to pull them off a tape. (I myself have used old logs in this manner several times.)

    I've heard that you can gzip the logs on the fly directly from Apache, but I'd worry that might add unwanted CPU overhead on a busy web site, since the compressor runs constantly; anyone got any anecdotal evidence on this one?
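
    For reference, the on-the-fly approach is just a piped log, along these lines (an untested sketch; the path is illustrative):

        # httpd.conf: Apache hands piped log commands to the shell, so the
        # append-redirect works; one long-lived gzip runs per log
        CustomLog "|/bin/gzip -c >> /var/log/httpd/access_log.gz" combined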

  • As has been mentioned, log files do tend to compress well; at least that's my experience with gzip.

    Another way to chop log files down to size is to remove image requests. There may be circumstances where you wouldn't want to do this, but for the average web site it cuts log size dramatically; in my experience by at _least_ 2/3. And that's just for sites that use small numbers of graphics per page... if you've got more, you'll see further shrinkage.

    You can do this after the fact with some kind of script/program (I've used Perl, and also once suffered through doing it in C), or you could change your site so that it simply serves the images from another domain/server, so those logs are kept separately.
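
    The after-the-fact filter really is only a few lines of Perl (the extension list is an illustrative assumption; adjust for your site):

        #!/usr/bin/perl -w
        # Drop image requests from an access log; everything else passes through.
        # Usage: strip_images.pl access_log > access_log.pages
        use strict;
        while (<>) {
            print unless m{"(?:GET|HEAD) \S+\.(?:gif|jpe?g|png)(?:\?\S*)? HTTP/}i;
        }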

"Can you program?" "Well, I'm literate, if that's what you mean!"

Working...