
Webtrends - Reporting Site Usage and Other Stats? 28

Posted by Cliff
from the how-well-does-it-work dept.
gammoth asks: "My company has a successful web site which gets roughly 1,800,000 hits from 45,000 sessions a day. A few years ago, our web stats software, HitList, broke when we crossed its capacity threshold (~1,000,000 hits). I replaced it with a tailored version of Webalizer supported by an array of perl scripts and a Suitespot server plugin. My reporting system runs with little intervention, managing log files from 4 hosts, and competently reports on hits, popular pages, referrers, etc. But it's not perfect and I'm the first to admit it doesn't provide the kind of info the marketing department would find really useful. I have plans for a comprehensive system using a DB and a report engine, but I've not had the time to implement it. (We're interested in info on marketing campaign success, path through site, etc). Meanwhile, marketing is tired of waiting and the otherwise exceptionally supportive IT management (truly) is considering contracting out some of our site usage reporting. Webtrends is being looked at seriously. I was wondering if any readers out there have had any experience with Webtrends or other software packages or service providers. Are there any OS packages that provide features well beyond Webalizer?"
This discussion has been archived. No new comments can be posted.

  • http://urchin.com/ - seriously, we've used it for a while now, and it looks great and can report on just about anything.
  • "supported by an array of perl scripts... doesn't provide the kind of info the marketing department would find really useful"

    I think that a LART, applied tactfully, is in order. Obviously, the marketing department needs a crash course in the elegance of Perl. =)
  • by judd (3212) on Sunday June 30, 2002 @03:48PM (#3796580) Homepage
    I much prefer less flashy but more capable tools such as Analog (http://www.analog.cx).

    WebTrends annoys me greatly, because it is poorly documented, has a sucky interface, and misleads naive users into thinking they are getting reports on "visitors" and "sessions" when in fact they are simply getting stats on a window of visits from an IP number.
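A minimal sketch of what "a window of visits from an IP number" amounts to, assuming a 30-minute inactivity timeout (a common convention, not a documented WebTrends setting). The function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta

def count_sessions(hits, timeout=timedelta(minutes=30)):
    """Count "sessions" as runs of hits from one IP with gaps no longer than timeout.

    hits: iterable of (ip, datetime) pairs, in any order.
    """
    last_seen = {}  # ip -> time of that IP's most recent hit
    sessions = 0
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        previous = last_seen.get(ip)
        # A first hit, or a hit after a long silence, starts a new "session".
        if previous is None or ts - previous > timeout:
            sessions += 1
        last_seen[ip] = ts
    return sessions

t0 = datetime(2002, 6, 30, 12, 0)
hits = [
    ("10.0.0.1", t0),
    ("10.0.0.1", t0 + timedelta(minutes=5)),
    ("10.0.0.1", t0 + timedelta(minutes=50)),  # 45-minute gap: new session
    ("10.0.0.2", t0 + timedelta(minutes=1)),
]
print(count_sessions(hits))  # 3
```

Note what the sketch cannot see: two people behind the same proxy within the window count as one "session", and one person whose IP changes counts as two.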

    Read this document Why web usage statistics are worse than meaningless [goldmark.org] and memorise it.

    Also, remind your marketing folks that quantitative data from your logfiles can only be interpreted with qualitative data from interviews/focus groups/usability studies. If people stay for less time in your site than before, is it because your design sucks, or because they found what they wanted and left quickly? Only qualitative research can tell you.

    Whenever marketing people spot trend variations, they will ask you why. You will need to know the above in order to respond properly.

    • NOT meaningless (Score:4, Interesting)

      by legLess (127550) on Sunday June 30, 2002 @05:10PM (#3796835) Journal
      I've read that document before, and I suggest that perhaps you need to re-read it with a more jaundiced eye towards your prejudices.

      The document now contains several disclaimers admitting that the author's original conclusions have been undermined somewhat by his own hyperbole, ignorance and by new technology (the original was written in 1995 - in web terms, it may as well be written in hieroglyphics on decaying papyrus) (ok, so that's a little exaggeration of my own... :P ). It's still worth reading, but only after you filter it a little.

      In particular, he doesn't account for cookies, which are great for web tracking (personally, I block nearly all cookies, but I don't think that session tracking is a malicious use). Cookies can give you very accurate data on visitor use, and proper reporting can turn that into very useful information.

      Also, the points he (or she) and you make about IP addresses vs. sessions vs. users are valid, but overblown. Very few people access the same site from different IP addresses in a given session. You wouldn't want to bet your life savings on these numbers, but they're accurate over 90% of the time, and that's more than enough to get good information (as someone else once said, "Don't believe me? Next time you have a blood test, tell them to take it all to make sure they get an accurate reading.").

      We've used WebTrends for months, and I like them quite a lot. For some things they are excellent; for others, not so. A word about methodology: WebTrends tracking code consists of a primary method and a fallback. The primary method uses JavaScript to compute a compressed string of data including much client information and appends this to an HTML image tag - this data is slurped into a database at WebTrends. If JavaScript is disabled, the hit still gets recorded, but without all the fancy extra info. They try to place a unique, persistent cookie with each image load (once per page).
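To illustrate the general "data appended to an image tag" idea, here is a rough sketch of packing client details into an image URL's query string and unpacking them on the collecting side. The host name, parameter names, and helper functions are all hypothetical; this is not WebTrends' actual format, and the real client side would be JavaScript rather than Python.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_beacon_url(base, client_info):
    """Append client details to an image URL as a query string."""
    return base + "?" + urlencode(client_info)

def parse_beacon_url(url):
    """Recover the client details from a requested beacon URL."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; this sketch keeps the first value only.
    return {key: values[0] for key, values in parse_qs(query).items()}

# Hypothetical fields a page script might report.
info = {"page": "/products/index.html", "res": "1024x768", "js": "1"}
url = build_beacon_url("http://stats.example.com/hit.gif", info)
print(url)
print(parse_beacon_url(url))
```

A non-JavaScript fallback would request the same image with no query string, which is why those hits arrive without the extra detail.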

      According to WebTrends, over 95% of our visitors have both cookies and JavaScript enabled.

      Their reporting tools are very good and comprehensive, containing everything I've seen from the best log analysis software and some things that software can't get (average screen resolution and window size, for instance - I love this). You can customize content groups to your heart's content by modifying some variables in their JS. Their site itself is well made and smart: their help system pops up a content-sensitive window with information for each specific page; if you click to a new page, the help window is updated. Yes, this is relatively easy to implement, but how many sites do it? Too few.

      Now, not all is Madam George and roses (to coin a phrase). I've found that WebTrends reports at best 95% of our traffic. Periodically I run a couple of home-brew Perl scripts on our logs and they always count more hits than WebTrends shows (not an issue with my Perl-fu, BTW). Their tech support is decent, but not wonderful - if you have a real issue, you might get run around a little. A couple of times they've flat-out dropped large chunks of our traffic (e.g. 40% for a day), never to be seen again.
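A cross-check of this kind can be sketched in a few lines. The version below is Python rather than Perl, assumes Common Log Format timestamps such as `[30/Jun/2002:15:48:00 -0700]`, and uses made-up sample lines and counts.

```python
import re
from collections import Counter

# Captures the date part of a Common Log Format timestamp.
CLF_DATE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')

def hits_per_day(lines):
    """Count log lines per calendar day; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = CLF_DATE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def coverage(log_count, reported_count):
    """Fraction of logged hits that the hosted service also reported."""
    return reported_count / log_count

sample = [
    '10.0.0.1 - - [30/Jun/2002:15:48:00 -0700] "GET / HTTP/1.0" 200 1234',
    '10.0.0.2 - - [30/Jun/2002:17:10:12 -0700] "GET /a.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [01/Jul/2002:00:02:41 -0700] "GET / HTTP/1.0" 304 -',
]
daily = hits_per_day(sample)
print(daily["30/Jun/2002"], daily["01/Jul/2002"])
print(coverage(1000, 950))
```

Running `coverage` per day makes a dropped chunk of traffic stand out as a sudden dip instead of disappearing into a monthly total.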

      Finally, we get about 10% the traffic the original poster does, so I can't tell you how well they scale. They'll charge a pretty penny for that amount of traffic, too.

      To summarize (whew): (a) WebTrends is pretty decent, and excellent for some things; (b) IP-based assumptions and cookie tracking can get you very accurate statistics as long as you can live with the limitations.
      • Re:NOT meaningless (Score:3, Insightful)

        by judd (3212)
        The author of the article I linked did say it was a rant.

        Cookies only work their magic if you have full control of the hosting environment; ie if you can set a unique cookie in the first place, and record it in the logs. (Yes, I know, you ought to be able to do this everywhere, but it's not a perfect world). In their absence, I don't know how you measure the effect of proxies in conflating IP numbers.

        I think there's WebTrends, and WebTrends. The versions that I have seen (WebTrends Enterprise) did not operate in the manner you described - it was a product that you ran on your own boxes that did pure log file analysis. It did not have a server component that could "tag" a page, and WT themselves were not involved. The feature you describe is very neat - although whether it justifies WT's very high price tag is a good question.

        I still feel that if there is a particular metric that is important to you, you are better off coding it yourself or using Analog.
        • Yes. At my last job, we used the self-run version of WebTrends. There were good things about it, and there were things that weren't so good.

          Personally, as the monkey that had to run it every week, I didn't like it very much -- the version we had could only do log analysis on NT, but all our servers were Linux, so once a week I had to download the weekly reports to my computer (which meant that I wasn't able to run Linux even though all the work I did other than WT had to run on Linux; the download wasn't *that* bad, we were a small site and all the logs would "only" amount to a couple hundred megs -- usually it would finish by the time lunch was over on Mondays... :/), the actual analysis would take quite a long time to complete, and at the end of it I'd have to upload the summary reports back to the web site so that everyone that needed access would be able to see it.

          Now to be fair, counter to what I just wrote, the catch is that the copy of Web Trends we had was old when I started the job, and it's even older now -- I'm thinking it's at least three or four years old now. Presumably there are better versions of it now that can run on the same platform as the server OS. Better still, I could have saved a lot of bother by just leaving an NT box at the co-lo facility and having it automatically access the Apache log files via FTP or Samba as needed. But I didn't think of that then, and the product documentation certainly didn't make such a useful suggestion.

          I think the earlier poster hit the nail on the head with his comment about quantitative (log based data) vs. qualitative (questionnaire / survey / observation type data). You can whine all you want about how web gathered data is imperfect as a data collection for marketing analysis, but hey, just look at what *every other* form of advertising has to offer: surveys only. Hey, the web can do surveys too, and the web can also gather all this technical information that isn't available to TV, radio, or print ads. The issue isn't that this web data is imperfect -- so what? -- the issue is that you at least have something to work with. As long as you keep the imperfections in the back of your mind when analyzing that data, you can still draw conclusions that are at least as solid as those gatherable by any other form of media.

          Also, playing into another point raised by that same poster, I have heard of companies hitting the same problems with internal web data numbers being consistently lower than those obtained by third parties. There are a lot of reasons that this will happen, and not all of them are ones you want to try to defeat -- caching for example allows many people to see your content without running up your bandwidth costs, but the tradeoff is that you never know for sure how many people got to see that content. The best you can hope for is consistency -- to always have numbers that are 1.2x that of the third party counts, or 1.6x, or 3x or if you're lucky 0.75x. Whatever. The point being, getting the numbers to agree is difficult because everyone has a different counting strategy; as long as you can account for the differences & accurately predict how the third party numbers will agree (or disagree) with yours, that's enough to work with.
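One way to check for that kind of consistency is to look at how much the daily ratio between the two counts varies. This is only a sketch: the 5% tolerance, the function names, and the sample figures are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def count_ratios(ours, theirs):
    """Daily ratio of our counts to the third party's counts."""
    return [o / t for o, t in zip(ours, theirs)]

def is_consistent(ratios, tolerance=0.05):
    """True if the ratios vary by no more than `tolerance` relative to their mean."""
    return pstdev(ratios) / mean(ratios) <= tolerance

# Made-up daily totals: ours run a steady 1.2x the third party's.
steady = count_ratios([1200, 1320, 1080], [1000, 1100, 900])
# Made-up totals where the relationship jumps around from day to day.
erratic = count_ratios([1200, 3000, 700], [1000, 1000, 1000])

print(is_consistent(steady))
print(is_consistent(erratic))
```

If the ratio is steady, either set of numbers can be scaled to predict the other; if it is erratic, the two systems are probably counting different things on different days.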

  • NetMining [netmining.com] is located in our office building. They might have some products that interest you and/or your marketing department.
    Sorry for the shameless plug...
  • by lunenburg (37393) on Sunday June 30, 2002 @03:50PM (#3796586) Homepage
    It's closed-source, commercial software, but I've been a big fan of NetTracker [sane.com] from Sane Solutions [sane.com] for a few years now.

    I use it in an ISP environment, running with Apache logs on FreeBSD, and haven't had a problem with it yet. Plus, their support is outstanding.

    It's one of the few pieces of closed-source software I have recommended. They have a demo version, so you can try it out on your logfiles and see if it works for you. But I highly recommend it.

    Disclaimer: I have no relationship with Sane aside from being a happy customer
    • Same here.

      We've been using NetTracker for over 3 years now and our customers love it.

      Great support and it runs on most platforms. We use it on Solaris now but ran it under Linux for several years with no problems.
  • Rings a bell... (Score:2, Informative)

    by twodiddyliddy (574077)
    This story looks a lot like this ask slashdot [slashdot.org].
  • WebTrends sucks (Score:4, Informative)

    by austad (22163) on Sunday June 30, 2002 @04:43PM (#3796754) Homepage
    I used WebTrends for several sites with about the same traffic you're looking at analyzing. In short, WebTrends sucks bigtime. It would crash for no reason almost every day, and their DNS resolver code is sloooooooow. I had to write a custom DNS resolver that would replace all of the IPs with the hostnames in the logfiles before running them through WebTrends. I've used both the Windows version and the Linux WebTrends server. The Windows version actually worked better, but it still sucked bigtime. Their customer support sucks too. A new version came out a week after I spent $2000 on their software, which was filled with bugs. The new version fixed most of the bugs, but they were going to make me buy it again to get the upgrade. Analog with Report Magic [reportmagic.com] did the same things WebTrends did, but it was free, and it worked much better.
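A pre-resolving pass of that sort might look like the sketch below. It assumes the client IP is the first field on each log line, caches lookups so each address is resolved only once, and takes the lookup function as a parameter so it can be exercised without a network. All names are hypothetical, and this is not the poster's actual resolver.

```python
import re
import socket

# An IPv4 address at the very start of a log line, followed by whitespace.
IP_AT_START = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)')

def reverse_lookup(ip):
    """Return the hostname for an IP, or the IP itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip

def resolve_log(lines, lookup=reverse_lookup):
    """Yield log lines with the leading IP replaced by its hostname."""
    cache = {}  # ip -> hostname, so repeat visitors cost one lookup
    for line in lines:
        match = IP_AT_START.match(line)
        if match:
            ip = match.group(1)
            if ip not in cache:
                cache[ip] = lookup(ip)
            line = cache[ip] + line[match.end():]
        yield line

# Demonstration with a stand-in lookup, so nothing touches the network.
def fake_lookup(ip):
    return "host-" + ip.replace(".", "-") + ".example.com"

sample = ['10.0.0.1 - - [30/Jun/2002:16:43:00 -0700] "GET / HTTP/1.0" 200 1234']
for resolved in resolve_log(sample, lookup=fake_lookup):
    print(resolved)
```

On a busy site most of the win comes from the cache; a serious version would also resolve addresses concurrently and time out slow lookups.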

    Another package I've used is Accrue [accrue.com]. I think this is by the same people that make HitList, but it's much better. It's not without its problems, but it would work great for a site with the amount of traffic that you are analyzing. We didn't run into problems until trying to analyze more than 150 million hits/day. It has a sniffer that sits on your network and watches web traffic. It generates its own logs which are more comprehensive than your webserver logs. Every hour, it uploads its data to the "warehouse" box which analyzes it at the end of the day. It requires beefy hardware, big expensive Sun Enterprise systems. It has some nice marketing stats stuff, like path analysis, and other crap. Very expensive though, expect to spend 5 to 6 figures on the software, and another 5 to 6 figures on hardware. They purchased another company that did nearly the same thing about a year ago, and they have a new version based on the technology from the other company, version 6.0/6.1. I haven't used the new version, but supposedly it's much better. The price is still insane though, so unless this is something you really really need, I'd stay away. It also requires a good DBA who knows RedBrick or Oracle (you can use either for a database).

    Another option is a managed log service like Digimine [digimine.com]. They work well, but it's a recurring fee since it's a service, not software. And you have to upload your logs to them every day.

    There's a company that's been hitting me up lately, I forget their name now. But they have a Linux-based version which has clustering capability. The database is stored compressed in chunks across the entire cluster. It scales linearly, so you can add machines as you need them. They've been taking business away from Digimine and Accrue. They are based in Minneapolis I think, but like I said, I forget their name now. Their software can correlate different logs together too, and get you stats on email campaigns, video streaming, and your webservers. If you're into spending money, this would likely be your best bet.

    I would stay far far away from WebTrends if I were you. Webtrends is a sucky product, and you can get the same info with Analog and ReportMagic, for free, and with better performance. 1.8 million hits isn't really that much, so a product like Accrue would likely be overkill. And most companies balk at services since they can't depreciate the expenditure over time, it's an operating cost not a capital expense.
    • Re:WebTrends sucks (Score:4, Informative)

      by Wanker (17907) on Sunday June 30, 2002 @05:51PM (#3796973)
      My biggest gripe with WebTrends is how they try to "dumb it down" so that any bozo who can spell HTML can use it. This in itself is not all bad, but there is absolutely no facility to have it reveal how it arrived at the numbers it did.

      You have to have blind faith in the product.

      Try feeding WebTrends a custom log that isn't in its predefined types. It will not error out, it will not complain a bit, it just parses the log incorrectly and produces completely meaningless output.

      How can you tell this completely meaningless garbage output from a properly parsed logfile?

      You can't.
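For contrast, a log parser can make this failure visible by counting the lines it could not parse and refusing to continue when too many fail. The sketch below assumes Common Log Format and a 1% failure threshold; both are illustrative choices, and the names are hypothetical.

```python
import re

# Common Log Format: host ident user [time] "request" status size
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_strict(lines, max_bad_fraction=0.01):
    """Parse log lines, raising ValueError if too many do not match the format."""
    parsed, bad, total = [], 0, 0
    for line in lines:
        total += 1
        match = CLF.match(line)
        if match:
            parsed.append(match.groupdict())
        else:
            bad += 1
    if total and bad / total > max_bad_fraction:
        raise ValueError(f"{bad} of {total} lines did not match the expected format")
    return parsed, bad

good = ['10.0.0.1 - - [30/Jun/2002:17:51:00 -0700] "GET / HTTP/1.0" 200 1234']
custom = ['2002-06-30 17:51:00 10.0.0.1 GET / 200']  # some other format

records, skipped = parse_strict(good)
print(records[0]["status"], skipped)

try:
    parse_strict(custom)
except ValueError as err:
    print("rejected:", err)
```

Reporting the skipped-line count alongside every report is a cheap way to let a reader judge whether the numbers can be trusted.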
  • Analog is great, and free. I think it's from Cambridge University.
  • The first thing you need to decide is whether you want to use a hosted service, where the service keeps track of your hits on their servers, or a local app that runs on your server and analyzes your logs. I think that Webtrends has both versions, hence the very contradictory comments so far.

    I have used the Webtrends hosted service, WebTrendsLive, and have no complaints.

    • It's easy to implement--just insert some javascript into each page you want to track and set a few variables to customize it.
    • As a hosted service, they keep track of all data and crunch the numbers, so there is no extra load on your servers.
    • The web interface is nice and provides all the info marketing wants and more. You can set up the service to email reports to your marketeers as often as they want.
    • However, with your level of hits, it will probably cost you big bucks.

    Applications that run locally are much less expensive, but they put a bigger load on your servers (I don't have a lot of experience with them, though).

  • PowerPhlogger: http://www.phpee.com
    AwStats: http://awstats.sourceforge.net
    AXS: http://www.xav.com/scripts/axs
  • Howdy,

    You might like to try Funnel Web Analyzer Standard (free), which we pit against WebTrends's standard Log Analyzer. We used to sell this for $399, but it's now free.

    We have an Enterprise version that delves deeper into stats with Clickstreams, etc, but the free version might be sufficient for you:

    http://www.funnelwebcentral.com

    Cheers,
    Suren
  • If you want to be able to analyze weblogs without having to worry about cookies and such, there really are only three options that I considered. I was hired by a medium sized company as a 'Summer Associate' (read: Intern) to find a tool that will help analyze our logs. The criterion was that it could monitor the number of clicks on a particular ad or link, so that marketing could track how successful the banners on the front page were.

    The requirement from the IT department was that it had to be able to do a two-pass analysis. The first pass to read all the raw data into a raw database, and the second one to filter through all of that (i.e., hits from within the company and from search bots were discarded) and to generate the reports. The reason for that was that we didn't have room on our servers to store 20 megs of log files a day, and if we suddenly discovered that a certain IP address that had been registering all kinds of hits was actually a searchbot, we'd want to be able to rebuild the database without having to go back to the original log files. At any rate, I spent a solid week on nothing but this, and here is what I found:
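The two-pass idea can be sketched with an in-memory SQLite database: the first pass stores every hit unfiltered, and the second pass builds the report against an exclusion list that can change later without touching the raw table or the log files. Table names, column names, and sample data are hypothetical.

```python
import sqlite3

def load_raw(conn, hits):
    """Pass 1: store every hit as-is, with no filtering."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_hits (ip TEXT, url TEXT)")
    conn.executemany("INSERT INTO raw_hits VALUES (?, ?)", hits)

def build_report(conn, excluded_ips):
    """Pass 2: count hits per URL, skipping internal and bot addresses."""
    conn.execute("CREATE TABLE IF NOT EXISTS excluded (ip TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM excluded")
    conn.executemany(
        "INSERT OR IGNORE INTO excluded VALUES (?)",
        [(ip,) for ip in excluded_ips],
    )
    rows = conn.execute(
        "SELECT url, COUNT(*) FROM raw_hits "
        "WHERE ip NOT IN (SELECT ip FROM excluded) "
        "GROUP BY url ORDER BY url"
    )
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
load_raw(conn, [
    ("192.168.1.5", "/ad/banner1"),   # internal address
    ("66.0.0.9", "/ad/banner1"),      # later found to be a search bot
    ("66.0.0.9", "/ad/banner1"),
    ("12.1.1.1", "/ad/banner1"),
    ("12.1.1.1", "/index.html"),
])

first = build_report(conn, ["192.168.1.5"])
# The bot is discovered afterwards: rerun pass 2 only.
second = build_report(conn, ["192.168.1.5", "66.0.0.9"])
print(first)
print(second)
```

With a one-pass design the bot's clicks would already be baked into the banner totals, which is the cleanup problem described for Webtrends below.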

    1) Webtrends - We already use this one. We don't like it as much because it doesn't track the clicks through the JSP post commands as well as we would like it to. If your company uses HTML pages, then it has a great ability to track users through your site, such as what percentage of people who were on the main page clicked on a given link, etc. It only uses a one-pass database, so whenever we discover that a certain IP is a searchbot or we need to put on some other filter, we have to have someone go through hordes of data and clean it up a bit. It also has a web interface, so you can just dedicate an NT box (Mod: -1, Suggested Using Microsoft) to hosting the server and analyzing the data, and not have to dedicate anything else to it.

    2) Nettracker by sane solutions: [sane.com] This is the best that I was able to find. It also has the web interface, and I was able to run the MySQL server, the nettracker server, and the web browser. It has a one-pass system also, but because it uses a simpler database structure than webtrends, it's easier to maintain the data. You can use an Oracle database, an SQL database, or its own internal database. It also has the ability to track users through your website. It can export the reports through Microsoft Word or Excel (marketing people love that). It also has the ability to create custom reports easily, so that we don't have to custom make them for the marketing people.

    3) The last one is sawmill [sawmill.net]. This has all the basic features that nettracker had, but can only use its own database, and as far as I could tell couldn't export the graphs. I will say, though, that it costs several orders of magnitude less than nettracker or the full version of webtrends does.

    This is my analysis of web traffic analysis tools. Most of it is more than a month old, and comes from the demos that I could get at the time. If you think that I got something wrong, please post. Hope this helps a bit.

  • Had to use it on 2 customers' sites running Lotus Domino and IIS. IIS results were barely reasonable, however if you have a Domino site, avoid Webtrends like the plague.

    A year after implementation, and we STILL can't get reports out of it. I think the biggest problem so far is that Webtrends doesn't use MIME types for determining the file type, it uses the .ext filename extension. This is really lame, and a fatal flaw for Lotus Notes based content as Notes databases can contain anything. Needless to say, it's also standards non-compliant.
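The difference between the two approaches can be sketched as follows, assuming the web server is configured to log the response Content-Type (plain Common Log Format does not record it). The function name and the Domino-style URL are hypothetical examples.

```python
import mimetypes

def classify(url_path, content_type=None):
    """Prefer the logged Content-Type; fall back to guessing from the extension."""
    if content_type:
        # Drop parameters such as "; charset=ISO-8859-1".
        return content_type.split(";")[0].strip().lower()
    guessed, _encoding = mimetypes.guess_type(url_path)
    return guessed or "unknown"

# A Domino-style URL has no useful extension, so only the header helps.
print(classify("/sales.nsf/0/ABC123", "text/html; charset=ISO-8859-1"))
print(classify("/sales.nsf/0/ABC123"))
print(classify("/images/logo.gif"))
```

An extension-only tool is stuck with the "unknown" case for every document served out of a Notes database, which would explain reports that never come out right.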

    As the poor tech who had to live with this crud, I advise you to please stay the hell away. At least if you're in a Notes/Domino environment.
  • http://ask.slashdot.org/article.pl?sid=02/04/23/0247226&mode=thread&tid=106
