Apache Software

Web Log Analyzers?

sammy.lost-angel.com asks: "What's the best web log analyzer out there today? It's time to upgrade our horribly out of date one and I'm not sure what's good out there at this time. Our site receives about 50,000 hits a day, so things like remembering what's already been analyzed can save a lot of time." What about log analyzers that can work on more than one type of web server? An analyzer that could parse access data for, say, IIS and Apache would be a nice tool!
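The "remembering what's already been analyzed" requirement is usually handled by checkpointing: record how far into the logfile the last run got, and start from there next time. A minimal Python sketch of the idea (the file names and state layout here are hypothetical, not taken from any particular analyzer):

```python
import os

STATE_FILE = "access_log.offset"  # hypothetical state-file name
LOG_FILE = "access_log"

def read_new_lines(log_path=LOG_FILE, state_path=STATE_FILE):
    """Yield only the log lines added since the last run, by remembering
    the byte offset reached last time (starts over if the log shrank,
    i.e. was rotated or truncated)."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    if os.path.getsize(log_path) < offset:  # rotated/truncated log
        offset = 0
    with open(log_path) as f:
        f.seek(offset)
        for line in f:
            yield line
        offset = f.tell()
    # checkpoint for the next run
    with open(state_path, "w") as f:
        f.write(str(offset))
```

At 50,000 hits a day this turns each incremental run into a scan of only the new tail of the log rather than the whole file.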
  • Webalizer (Score:1, Informative)

    by Anonymous Coward
    http://www.mrunix.net/webalizer/

    I've had good luck running several million hits through Webalizer. It works pretty well.
  • Well, the two web log analyzers I worked with at my old job were
    WebTrends Professional
    and WebSphere Site Analyzer.

    Bottom line with WebTrends is, it's junk. It costs a bundle, is more expensive for the Unix version, and you need one base license for the first machine whose logs you want to analyze, plus one supplemental license for each additional machine. If your site spans four boxes, you need a base + 3 additional. PRICEY! To boot, it is not very configurable, and it has a hell of a time counting user sessions by custom cookies.

    WebSphere Site Analyzer, on the other hand, is a behemoth of a program. It requires far too many resources to run, takes forever to properly configure, and needs a tweaked version of DB2. On the plus side, it's highly configurable, and comes "free" with WebSphere server AFAIK. You can count anything on anything if you really want to, and you don't need to get a special version to run your own queries against the data. All the data is in DB2, so you are free to probe it all on your lonesome. With WebTrends you need a special version to get access to the database, and then the access is only through their proprietary libs. Of course, the other big plus for Site Analyzer is that it has a client-server model, and both client and server can run on Linux, Solaris, HP-UX, Windows, etc.

    To be honest, those are the two biggies for commercial site analysis software, and neither is that good. Check out some of the open-source offerings; perhaps one of them will work for you :)

  • by rakerman ( 409507 ) on Monday October 22, 2001 @08:58PM (#2463549) Homepage Journal
    http://www.analog.cx/

    http://www.webalizer.com/
    • Yeah. And analog is *fast*.
    • by Anonymous Coward
      We use Webalizer and it's very cool. Nice graphs and breakdowns. Check it out for sure!
    • Yeah, I have been using Analog for about 5 years. It's fast as blazes and produces pretty useful reports (especially for my hosting customers who are not rocket scientists), but it's not perfect. Although it will cache DNS lookups, it doesn't checkpoint, so if you want your report to reflect the last 6 months of activity, you have to crunch the last 6 months' worth of logs. The configuration syntax is also a bear to work with. Too many directives that sound alike, and it can be hard to dig through the documentation.

      I don't like the reports generated by anything else, including Webalizer, so I stick with Analog.
      • The fact is, crunching 6 months of logs isn't that big a problem. You probably only want to run reports covering:
        1. The total year (run once a year)
        2. The total month (run once a month)
        3. The last N days/weeks/months (I'd have said the last 30 days would be good enough; run this daily or weekly, depending on your tastes)
        Only the 3rd option there should cause any trouble; the others are run infrequently enough that you don't care if they take a while to run. If you're worried about affecting your web server while crunching the data, remember that you don't have to run the analyzer on the web server!

        Personally, I've used Analog as listed above and found it to be pretty good, once you get the configuration working (which is a once-off thing). It will also work on compressed logs, IIRC, so you can even save some disk space (at the expense of more CPU time at analysis).
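Reading compressed logs transparently is also easy enough to bolt on yourself if your analyzer doesn't support it; a small Python sketch (gzip-only, with extension-based detection, both of which are my assumptions):

```python
import gzip

def open_log(path):
    """Open a logfile for text reading, transparently decompressing
    gzip files based on the filename extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

With this, the same line-by-line analysis loop works on `access_log` and `access_log.gz` alike.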

        • [Analog] will also work on compressed logs, IIRC, so you can even save some disk space (at the expense of more CPU time at analysis).
          ... but less elapsed time. Analog is primarily limited by disk speed, so you will get the results sooner from compressed logs than uncompressed ones. Strange but true.
          • I guess it depends on the disks and the CPU; you could pair a SCSI-2/Fibre hard drive (not to mention striping/mirroring, etc.) with a fairly crummy CPU and you'd probably find the CPU is the bottleneck. On the other hand, an old hard drive on a P4/Athlon would probably do better with compressed logs.

            The point is well made, though; hard disks tend to be the bottleneck on today's systems with clock speeds in the gigahertz.

      • by Stephen ( 20676 )
        ...it can be hard to dig through the [analog] documentation.
        I (the author) have some sympathy with this; but the main problem is that it's so configurable that there just are a lot of commands.

        I have done some work recently on presenting the documentation in different ways. As well as the main topic-based documentation, there's now a page with only the most basic commands for beginners; a comprehensive index; all the commands on a single page with a BNF-type grammar; and two sample configuration files with all the commands in, one in topic order and one in report order. There's also the beginnings of a collection of third-party HOWTOs (for which I need more volunteers, HINT HINT!).

        I do take a lot of time and trouble over documentation, I suspect much more than most open source projects. My rule is that no change can be committed until it's fully documented. So you will never find the documentation lagging behind the reality, or options missed out of the documentation. I also spend a lot of time rephrasing the existing documentation.

  • Last I checked, both IIS and Apache generate (or can be set to generate) W3C standard format logfiles. Part of the reason for having/using that standard is so that you don't get locked into a proprietary tool.
    • Last I checked, both IIS and Apache generate (or can be set to generate) W3C standard format logfiles. Part of the reason for having/using that standard is so that you don't get locked into a proprietary tool.
      You might think so, but IIS breaks the standard in several ways. And it's not even really a standard, just an early working draft that was never finished.

      In my opinion, a good logfile analysis tool should be able to recognise and analyse all commonly-used formats, and provide a means to specify custom formats. In other words, it should work with what the server has already produced, rather than force the server administrator to reconfigure the server and ignore old logfiles. My program analog [analog.cx] does all this, but most programs don't.

  • webalizer & awstats (Score:2, Informative)

    by EvilStein ( 414640 )
    awstats (awstats.sourceforge.net) for the IIS logs, but it's kind of funky to set up...

    and webalizer (www.mrunix.net/webalizer) for the Apache logs.

    awstats is Perl, too...

    I've used them both and since I have only Apache to log, I've stuck with webalizer. Plus, you can easily customize it for each user/domain with its own webalizer.conf file.
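The per-user/domain setup mentioned above is mostly a matter of pointing each config file at its own log and output directory; a minimal webalizer.conf sketch (paths and hostname are placeholders, and directive names are as I recall them, so check the stock config that ships with Webalizer):

```conf
# /etc/webalizer/example.com.conf -- placeholder paths and hostname
LogFile     /var/log/apache/example.com-access.log
OutputDir   /var/www/example.com/stats
HostName    example.com
# Incremental mode remembers where the previous run stopped
Incremental yes
```

One such file per vhost, each run with `webalizer -c <file>`, keeps every domain's stats separate.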
  • AWstats rocks! (Score:3, Informative)

    by OctaneZ ( 73357 ) <ben-slashdot2@NOsPaM.uma.litech.org> on Tuesday October 23, 2001 @02:27AM (#2464558) Journal
    I have been running AWStats [sourceforge.net] since July, and I absolutely love it. It does not provide the fine-grained detail that many people need, and which can be provided by Analog [analog.cx]. But it does provide exactly what 90% of us need, in an easy-to-view package. It creates an easy-to-understand page about many aspects of your site, including users, page hits, countries, languages, OS, browser, spiders/robots, and access times; it's great! It is also a GPLed Perl script! The development team is over at SourceForge [sourceforge.net] and is actively releasing new code all the time. It also has the added benefit of allowing CGI updating through a web page; simply putting the script in your /www/cgi-bin/ directory and adding appropriate permissions allows you to get up-to-the-second information about your site without having to dig up a terminal! Definitely check this package out!
    -OctaneZ
    • Re:AWstats rocks! (Score:2, Insightful)

      by damiam ( 409504 )
      "I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones."

      If you're going to quote someone, at least give them proper credit:
      "I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones." - Albert Einstein

      • Yes, it is Einstein; however, I hit the 120-character limit and didn't notice, as it gives you no indication of that fact. I am sorry for offending you, as it obviously did.
        -OZ
  • Analog (Score:5, Informative)

    by Stephen ( 20676 ) on Tuesday October 23, 2001 @06:47AM (#2464930) Homepage
    I'd like to plug analog [analog.cx]. I'm the author, so read my comments in that light. :-)

    First, as others have commented, the commercial programs suck [slashdot.org], especially Webtrends [slashdot.org].

    Analog is over six years old, but it's still actively developed, and I think it's still the leading free log analyser. The main contender is the Webalizer. To some extent it depends what you want (why not try out both?). The Webalizer's biggest advantage is that it produces prettier pictures. Some of analog's advantages are that it is more configurable; that it runs on any OS (the Webalizer is Unix only); and that it can analyse logfiles from any web server.

    Besides, analog's author reads Slashdot.

    • Webalizer's biggest advantage is that it produces prettier pictures.
      I should add that you can make analog's output prettier for your PHB[?] [everything2.com] if you use Report Magic [reportmagic.org] with analog.
    • Re:Analog (Score:4, Interesting)

      by frankie ( 91710 ) on Tuesday October 23, 2001 @09:49AM (#2465462) Journal
      I use Analog exclusively (well, after DNSTran for name lookups and Perl to sort out sub-logs) and I have found little reason to complain. As Stephen mentioned, you can use ReportMagic to prettify the output. I don't bother.

      My only complaint is Stephen's dogmatic insistence [analog.cx] on not performing any form of speculative analysis. For example, he refuses to even attempt visitor counting, path tracking, etc. The sort of stuff that bosses like to see, whether or not it's strictly accurate.

      Stephen could put WebTrends out of business with a couple hours of coding, but he has his principles.
      • by cdh ( 6170 )
        Agreed. Personally, I don't care; however, I run Analog and ReportMagic for all of our virtual-host sites and dedicated-server customers. There are a few that really, really want this info. I point them to the documentation and point out their folly, yet they still "need" this info. I agree with Stephen on this, but it would be nice if there were an option, even if it were named something like TOTALLYNONFACTUALPATHANALYSIS, just so I could keep my customers happy.

        I love Analog; it's pretty much set-up-and-forget. I tried (at a customer's insistence) NetTracker and found it to be a nightmare. If it works, it's OK; however, if it messes up, which it had a tendency to do, then it was terrible. It would overwrite its "database" periodically, which would mean it would have to start from scratch again. This particular site gets approx. 500K "hits" a day, and NetTracker would take literally 28 hours to do one day's logs. Analog can do it in minutes. Gotta like that.
      • I agree.

        I have customers who want to see something that guesses at number of unique visitors, guesses at paths through the site, etc. They don't want to study, and don't care about the details of why it's unknowable information... they're used to seeing it from other packages, and complain that it's missing.

        Any kind of wild guess, with a bunch of caveats on the output, would be much more useful than the explanation of why this analysis is not done.
      • I use Analog and ReportMagic for my reports. Analog is super-detailed, extremely fast, reliable, and generally very cool. If you are a nerdy webmaster type that needs stats to track server usage and such, you can stop right here.

        Since I want web reports to show to VHost clients, I added ReportMagic, which builds great-looking, graph-laden, generally beautiful HTML stats. Indeed, I think the ReportMagic graphs are considerably more readable than WebTrends'. As a bonus, RM is written in Perl and it's very hackable.

        I have beautiful stats run on over 100 web sites, every single night, and archived every month. I am very happy with the results, and I keep finding cool new features in the docs to add in all the time.

        However... WebTrends is the de facto web report standard. Having used WebTrends, I know what a painful piece of shit it is to use. I often use it as an example of what is wrong with winduhs programming. Every feature you could ever want, none of it works right, moronic interface, breaks constantly, worthless tech support, frequent payware upgrades to keep it running. I just can't adequately express how much I hate it.

        However... WebTrends is the de facto web report standard. And after about 400 times explaining how fallacious reports like "paths through site" and "visitors" are, and pointing clients here: http://www.analog.cx/docs/webworks.html, I had to give up. The boss is suffering through doing monthly WebTrends reports for the dozen or so customers who whine.

        I have actually had this conversation:

        <Digger> ...so you can see how those reports are BS, and what looks like a "trend" might just be a change in the way AOL handles its proxying scheme...

        <Luser> Right. So can we get you to do a WebTrends report too? 'Cause we base our business plan on them.

        Seriously. I might as well be telling people to stop using winduhs as their desktop OS. I might as well be tilting at windmills. It's nigh hopeless. I would rather have a userbase that knows not to double-click on any old attachment they get in the mail, but since I am stuck in the real world, I have added scanning software to the mail server.

        Analog stats are certainly more than sufficient for me. Adding ReportMagic should make the output sufficient for anyone else, but it doesn't. Stephen has stated quite clearly that he has no intention of adding reports that offend his statistical sensibilities. I certainly understand the position he has taken, and since he is doing all the hard work, it is his right to take it. It is just a shame that most of the world is stuck having to run WebTrends, or some similar crapola commercial product as a result.

        Maybe Stephen would like to respond, or point to one of the relevant mailing list posts. What about cookieing solutions? I have custom-built a few of these, and cookie-enabled browsers have been about 99% or so. Is 1% still significant?
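A cookie-based count along those lines boils down to counting distinct values of a visitor-ID cookie while also tracking how many requests carried no such cookie (the ~1%). A Python sketch (the `visitor_id` cookie name and the record shape are hypothetical, not from any real analyzer):

```python
def count_unique_visitors(records):
    """Count distinct visitor-ID cookies in parsed log records.
    Each record is a dict whose 'cookie' field holds the raw Cookie
    header; requests without the ID cookie are tallied separately so
    the margin of error is visible in the report."""
    seen = set()
    no_cookie = 0
    for rec in records:
        cookie = rec.get("cookie") or ""
        ids = [p.split("=", 1)[1]
               for p in cookie.split("; ")
               if p.startswith("visitor_id=")]
        if ids:
            seen.add(ids[0])
        else:
            no_cookie += 1
    return len(seen), no_cookie
```

Reporting the no-cookie tally alongside the visitor count is one way to keep such a report honest about its 1% blind spot.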

  • Wusage (Score:2, Interesting)

    Our company uses Wusage [wusage.com] and it's quite a nice package IMHO.
    It doesn't generate very pretty reports by default, but it is highly customizable and provides a truck load of data.

    Note: I am not affiliated with the makers of Wusage in any way.
  • Sawmill is great. (Score:2, Interesting)

    by tdyson ( 530675 )
    Sawmill, by Flowerfire [flowerfire.com], is pretty cool. It understands virtually every log format you can imagine. It'll run as a CGI, via the CLI, or as a standalone web server. There are versions for many different platforms. With the web interface, the Marketing group can do their own drill-downs and queries, so I can do some real work. Performance is good. I think of it as the program that WebTrends wishes it was. Get the eval version and take it for a spin.
  • We are putting our money down to replace our WebTrends Log analyzer system with Summary Pro(http://www.summary.net/ [summary.net]).

    It has a rich set of features, analyzes and summarizes logs across multiple servers, has a multitude of pre-defined reports and costs thousands less than Webtrends or IBM.
    (Total SW price between $59 - $695)

    Robert Merrill
    (With a large, American passenger-train corporation)

  • In case your pointy clicky windows boss or marketing people want to use something, I can suggest Open Web Scope [openwebscope.com], a shareware type app.

    In the meantime, use analog or webalizer to get the full skinny on your traffic.

  • Wusage is the best stats package out there. It does everything the expensive packages do and it doesn't cost a fortune. It rotates my logs, archives them and gives me a bundle of valuable stats for marketing, sales and R&D.
