Linux Software

Statistical Analyzers for HTTP Logs?

krishnaD asks: "I have been using webalizer to generate access log reports for the site, but lately my customers have been asking for statistics like the average amount of time visitors spend on the site; the probability that a visitor who reaches page X goes from there to page Y; from which link people exited the site; and so on. Basically, they are asking for a detailed flow analysis of visitor usage patterns. Are there any tools that will do this kind of analysis? I'd love to know what kind of tools other sysadmins use to generate reports for their clients."

  • Karma whore link (Score:2, Informative)

    by Anonymous Coward
    Freshmeat Internet::Log Analysis [freshmeat.net]
    • Re:Karma whore link (Score:2, Informative)

      by balamw ( 552275 )
      Of that bunch, I must say that I really like Webalizer [mrunix.net]. It produces really nice-looking reports with pie and bar charts, and the level of detail can be customized to almost any need. It's also nice that it'll work on web server logs as well as squid logs...

      Analog [analog.cx] may be the most popular, but I found it rather difficult to set up and to get useful data into and out of.

      Balam

  • by proxybyproxy ( 561395 ) on Tuesday April 23, 2002 @11:57AM (#3395113)
    I have tried webalizer and webtrends [webtrends.com], but without a doubt, number one is Urchin [urchin.com]. It really is the cream of the crop, but it costs, too. You can check out a sample here [urchin.com].

    If you get an account with Verio [verio.net], you will get your stats in Urchin for free.
    • I completely agree, urchin rules.

      When I worked for a .gone I was in charge of the stats analysis. I looked into most of the major programs mentioned in this thread, and nothing comes close to Urchin. As far as I see it, it only has two weaknesses:

      - cost
      - not open (there were a few features I would have loved to add/alter)

      One really beautiful feature it has is that you can incorporate your sales stats into the program. I haven't tried it yet, but from what I remember, it allows you to directly check your sales-to-visitors ratio, where your purchasers came from, how long they stayed, etc.

      • Did you ever try netgenesis [netgen.com]? It ain't cheap, but it does a lot of ad hoc reporting rather than the static reports that things like webtrends produce (which is what urchin appears to do as well, although I'm not really familiar with it).

        They also have an API that you can use to build custom functionality and/or match data against other systems (like a customer database).
    • Urchin is fast and awesome.

      But it doesn't have as much detail as other vendors like Webtrends. You can't really do campaign analysis.

      Review of these two [nwc.com]
  • WebTrends (Score:2, Insightful)

    by krangomatik ( 535373 )
    WebTrends [webtrends.com] offers software like this. We outsource a lot of our web stuff, and one of our providers runs WebTrends; our people who like looking at pretty pictures really seem to like it. I have never installed or configured their software, so I can't speak to ease of use, but the end-user reports are easy to navigate. IIRC you can download a demo from their site and play with it. They also have a demo report [webtrends.com] you can look at to see if it meets your needs.
    • Re:WebTrends (Score:5, Informative)

      by jslag ( 21657 ) on Tuesday April 23, 2002 @12:14PM (#3395239)

      I've used WebTrends for about a year, and couldn't be less impressed. Randomly chokes on logs that webalizer handles without trouble. Hard-to-use interface. Reports a number of things that you really can't tell from web logs.


      On the plus side, the PHBs love it.

      • I've used WebTrends for about a year, and couldn't be less impressed. [...] On the plus side, the PHBs love it.
        I have to pretty much agree with both points here, having used it a couple of years back. It actually seemed to get worse with new versions, and it was pretty costly for what it actually did.
        • Webtrends does the same as a lot of the free alternatives out there, but looks a lot prettier.

          Note: By default, many Mac users cannot properly navigate through WebTrends Reports (Log Analyzer) due to a Java issue, which is a problem where I work, since over half of the boxes are Macs.
      • I've been running WebTrends reports for the management types for about 2 years now, and it's painfully slow, taking about an hour to generate the reports for our company intranet site. Just recently I set up with WebTrendsLive [webtrendslive.com], and I have to say it's a big improvement. It generates almost the same reports with the same pretty pictures and graphs, but makes them all up on the fly. Which means less work for me, woohoo!
    • Re:WebTrends (Score:2, Interesting)

      by Anonymous Coward
      I used WebTrends in a large corporate environment in which we were getting like 1+ million hits per week. I can tell you that, at least at that time (1999), WebTrends was complete garbage. The numbers were way off (+/- 25% or more) from what the actual logs said. It actually told me once that I had 900,000 hits that week -- and that seven million of them occurred on Saturday! In a support call, I actually got one of their engineers to admit that the product sucked and that we couldn't count on the numbers to be accurate.

      I wrote some perl scripts and used the GD modules to simulate something close to WebTrends output until I came up with something better.
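
      In outline, that sort of stopgap is only a screenful of Perl. A minimal, untested sketch using GD::Graph (the log filename, output filename and the hits-per-day report are made up for illustration):

        #!/usr/bin/perl
        # Count hits per day from a common/combined-format access log and
        # draw a bar chart with GD::Graph (part of the GD module family).
        use strict;
        use warnings;
        use GD::Graph::bars;

        my %hits;
        open my $log, '<', 'access_log' or die "access_log: $!";
        while (<$log>) {
            # pull the "23/Apr/2002" part of the timestamp
            $hits{$1}++ if m{\[(\d{2}/\w{3}/\d{4})};
        }
        close $log;

        # a lexical sort is fine as long as the log covers a single month
        my @days  = sort keys %hits;
        my $graph = GD::Graph::bars->new(600, 300);
        $graph->set(
            x_label => 'Day',
            y_label => 'Hits',
            title   => 'Hits per day',
        ) or die $graph->error;

        my $gd = $graph->plot([ \@days, [ @hits{@days} ] ]) or die $graph->error;
        open my $out, '>', 'hits.png' or die "hits.png: $!";
        binmode $out;
        print {$out} $gd->png;
        close $out;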

      Soon I found analog. The charts were not nearly as pretty as WebTrends, but the numbers were accurate and it ran about 15-20 times faster.

      Finally, I found the ReportMagic add-on for analog, and I started creating accurate -- and attractive -- reports again.
  • It's not very difficult to implement one from scratch. On the pages you want to track, just call a function that sends the HTTP server variables (and other desired information) to a database. You can then use the IP address as an identifier and track a reader's history through the site. Trace the IPs, and you can get even more information. I've implemented a system on my site that basically tells me that folks from Los Angeles spend less time on a certain page than folks from Newark.
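
    For illustration, a minimal sketch of that roll-your-own approach in Perl with DBI (the database name, credentials, and the "hits" table with its columns are all made up; a real setup would reuse one connection rather than connecting on every hit):

      # log_hit() is meant to be called from whatever serves the page
      # (mod_perl handler, CGI script, etc.)
      use strict;
      use warnings;
      use DBI;

      sub log_hit {
          my $dbh = DBI->connect('dbi:mysql:database=stats', 'statsuser', 'secret',
                                 { RaiseError => 1 });
          $dbh->do(
              'INSERT INTO hits (ip, page, referer, user_agent, hit_time)
               VALUES (?, ?, ?, ?, NOW())',
              undef,
              $ENV{REMOTE_ADDR},
              $ENV{REQUEST_URI},
              $ENV{HTTP_REFERER}    || '',
              $ENV{HTTP_USER_AGENT} || '',
          );
          $dbh->disconnect;
      }

    Time on site, page-to-page transition counts and the like then come down to GROUP BY queries over that table.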
    • Re:Not difficult (Score:5, Informative)

      by fdragon ( 138768 ) on Tuesday April 23, 2002 @12:15PM (#3395249)
      But you cannot tie a particular IP address to a user. You have the problem of AOL users (each request can come from a different IP address) and of corporations (and now many homes) using NAT or PAT devices, which make one or more users appear to share the same IP address.

      The best way to get around this is setting a session cookie via Apache. Then you key off that.
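
      With Apache 1.3 and mod_usertrack, that takes only a few lines of configuration. A sketch (the cookie name and log path are made up; the format is the usual combined log plus the cookie note that mod_usertrack sets):

        CookieTracking on
        CookieName     siteuser
        CookieExpires  "2 years"
        LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %{cookie}n" usertrack
        CustomLog /var/log/apache/clickstream_log usertrack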
      • The best way to get around this is setting a session cookie via Apache. Then you key off that.

        That's fine for any new logs, but you also want something that works with your old data, even if it's not as easy a solution or requires a couple of different approaches. I don't think any manager type would be pleased without a retrospective view. (And if they didn't ask for it, adding it anyway can only help the next time you ask them for a pay rise.)

      • The best way to get around this is setting a session cookie via Apache. Then you key off that.

        Then you run into people like me who routinely deny cookies unless the site has a valid reason for issuing them. This has become easier than ever for the average user with IE6's cookie management.

  • W3Perl (Score:4, Informative)

    by Jon Peterson ( 1443 ) <jonNO@SPAMsnowdrift.org> on Tuesday April 23, 2002 @12:14PM (#3395243) Homepage
    Well, if you are looking for free stuff...

    I'd recommend W3Perl (http://www.w3perl.com/softs/index.html), which is a kind of mess of perl scripts, but is surprisingly fast (much faster than other perl-only stats packages) and is the most full-featured free package I've ever come across.

    Setup is kind of a pain - it's rather complex, owing to the vast array of configurable thingies - but it works pretty well once it's put together.

    There are some genuinely innovative features, such as a tree view of your website weighted by the popularity of each branch from /index.html.

    Worth a look if you are on a feature hunt. It requires some arcane image generation program to make the pretty graphs.

    Oh, and if you were hoping to explore the code - be aware that the guy who wrote it is French :-)
  • Sawmill (Score:2, Interesting)

    by esme ( 17526 )

    A couple of years ago, I did some research on webstats packages for our websites, and came up with a package that I haven't seen mentioned yet: Sawmill [sawmill.net]. It is the best tool for the kinds of questions you mentioned -- it can run as a CGI program (or as its own daemon) and does on-the-fly limiting, different reports, etc. So if they want to know what kind of browsers people were using in the Support section at 3am, they can get that.

    I put together a Perl CGI to handle combining logs from all of our different servers, and then feed the combined log to Sawmill (or FunnelWeb, the other package we wound up using).

    -Esme

  • Assumptions (Score:2, Interesting)

    by heikkile ( 111814 )
    From which link people exited the site etc

    Do not assume that people browse with just one browser window. I can't speak for others, but normally, when I leave a site, I close the browser window that site was in. It is not often that I follow a link out. If there are interesting links, I open them in new windows. It is not uncommon for me to have 16-32 windows open, often across 2-4 desktops.

    Yes, I know there are tricks to discourage this sort of browsing. Those also discourage me from visiting the sites, if I can find friendlier alternatives.

    • How does browsing as you described affect the "exit page" feature of these programs?

      I always figured they were using some kind of best-guess algorithm -- i.e., the first page of a session would be the one without a local referer, and the last page of the session would be the last page visited with a local referer since the session started. Pulling links over to another window, I'm pretty sure, still sends the referer. It might screw with the "visit path" features, but not with session time or exit page.

      Can anyone with more experience shed some light on how it is done?
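
      For what it's worth, that best-guess approach is simple enough to sketch in Perl. The following is untested and makes the usual assumptions: a combined-format log, visitors keyed on IP plus user agent, and a session ending after 30 idle minutes; it tallies exit pages only.

        #!/usr/bin/perl
        # Group hits into sessions and count the last page of each session
        # as its exit page. Time zones are ignored for brevity.
        use strict;
        use warnings;
        use Time::Local;

        my %mon = (Jan=>0,Feb=>1,Mar=>2,Apr=>3,May=>4,Jun=>5,
                   Jul=>6,Aug=>7,Sep=>8,Oct=>9,Nov=>10,Dec=>11);
        my $timeout = 30 * 60;    # 30 minutes of silence ends a session
        my (%last_seen, %last_page, %exit_count);

        open my $log, '<', 'access_log' or die "access_log: $!";
        while (my $line = <$log>) {
            next unless $line =~ m{^(\S+) \S+ \S+ \[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+) [^\]]*\] "\w+ (\S+)[^"]*" \S+ \S+ "[^"]*" "([^"]*)"};
            my ($host, $d, $m, $y, $H, $M, $S, $page, $ua) =
                ($1, $2, $3, $4, $5, $6, $7, $8, $9);
            my $t   = timegm($S, $M, $H, $d, $mon{$m}, $y);
            my $who = "$host|$ua";

            if (exists $last_seen{$who} && $t - $last_seen{$who} > $timeout) {
                # the previous session timed out; its last page was the exit page
                $exit_count{ $last_page{$who} }++;
            }
            $last_page{$who} = $page;
            $last_seen{$who} = $t;
        }
        close $log;

        # sessions still open at the end of the log also ended somewhere
        $exit_count{$_}++ for values %last_page;

        printf "%6d  %s\n", $exit_count{$_}, $_
            for sort { $exit_count{$b} <=> $exit_count{$a} } keys %exit_count;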
  • Webalizer itself is very configurable, but its default configuration leaves a lot to be desired. I maintain a list of search engines and sites that should be added to the configuration of Webalizer [ostermiller.org] to make it a lot more powerful. I also have a log sorting tool there to prevent webalizer from croaking on logs that are just a little bit out of order.

    Even if you don't find stats packages that do what you want, you can make webalizer a lot better.
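
    To give a flavour of the kind of additions meant above, a few illustrative lines in webalizer.conf syntax (the directive names are real Webalizer directives; the particular entries are only examples):

      # tell Webalizer which query-string field holds the search terms
      SearchEngine    google.com      q=
      SearchEngine    yahoo.com       p=
      SearchEngine    altavista.com   q=
      # lump the big dial-up/proxy farms together instead of listing every host
      GroupSite       *.aol.com
      GroupSite       *.compuserve.com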

  • One word: Excel (Score:2, Interesting)

    by YE ( 23647 )
    ...is the statistics tool that is probably sufficient for the 99% of us outside the elite statistician circle.
    • If you have 700MB of logs per day per server across 5 servers, Excel falls down hard. WebTrends falls down hard. The last two companies I worked for both overcame these difficulties with web-bug pixel GIFs in the pages, logging user agents, referrers and such to a database where we could analyze them.

      The good thing about doing the raw logs is that they give a better idea of how much traffic has actually passed through the site. This is usually more use to smaller sites that have to pay by bandwidth used, or more specifically, more use to their providers.

      If you're just looking to track specific data, there's no easier way than to have all that data written to a database: you can tweak your webbugs to save off exactly what you want, you get all your data with no extraneous crap, and you can track whatever you like.

    • Yucko. Spreadsheets are fine for certain types of dataset, but a real pain in the ass in terms of usability for all but the most rudimentary statistical analysis. Give me a Stata command line any day, even if I'm just calculating simple means and standard deviations. For calculating statistics on subsets of data, a spreadsheet is going to be an exercise in torture.
  • ModLogAn (Score:4, Interesting)

    by chrysalis ( 50680 ) on Tuesday April 23, 2002 @02:51PM (#3396479) Homepage
    ModLogAn [kneschke.de] is the successor of Webalizer.

    It produces similar reports, but it works with a lot of servers, including FTP servers, firewalls, a bunch of web servers, RealServer, SHOUTcast, squid, etc.

  • We use Sawmill [sawmill.net] where I work. It is very thorough, and it supports every type of web log file format I can think of (Apache and IIS are supported, of course). Its reports, graphs, and statistics are equally thorough, with ways to customize them.

    It's not free, but it is very nice.

    Jeremy

  • phpOpenTracker [phpopentracker.de] does not rely on logfiles, but it seems like it would address your needs.
  • If you don't mind paying for such a program, I would recommend Sawmill [sawmill.net] for this task.
  • FunnelWeb is quite good; you can even download a demo.

    http://www.quest.com/funnel_web/analyzer/
  • At my former company, we used ILux [ilux.com]. It started as a simple log analyzer with a Java front end, and then evolved into a campaign-analysis/trip-through-the-site-tracking/customize-email-marketing-according-to-their-path-through-the-site behemoth. It costs, and (when I used it) it was cookie-based to enable the site tracking. We also didn't have much luck with it as it evolved, probably because we were using underpowered hardware (we really only wanted the log analyzer it started as) and it was a first release of the expanded product, but you may want to check into an eval.
  • WebTrends Log Analyzer is a great program, and it's easy to find a crack for older versions.
  • I was looking into this. I work with a data-mining group, and we were going to do a POP project for them. But then things fell through ... so I assume that they still went ahead with this project. ~N~
