
Calculating Number of Users Based on Amount of Unique IPs?

Posted by Cliff
from the bean-counting dept.
pjdepasq asks: "I run a small but growing web site. Currently the site has optional registration (for the message boards), though we know we have a larger number of anonymous users. Is there an industry standard for calculating number of unique users based on the unique IP addresses over a period of time (1 week? 1 month?) We'd like to get a handle on the number of users we have. Sure, I know about dynamic IP addresses and ISPs like AOL which can dilute or confuse the numbers, but surely there's some benchmark calculation we can use."

  • by Jamie Zawinski (775) <jwz@jwz.org> on Thursday May 10, 2001 @10:06AM (#233231) Homepage

    "These problems are not really new to the web -- they are present just as much in print media too. For example, you only know how many magazines you've sold, not how many people have read them. In print media we have learnt to live with these issues, using the data which are available, and it would be better if we did on the web too, rather than making up spurious numbers."

    But I'm told by people in the magazine business that the industry standard there is to assume that the number of readers is 5x the number of issues sold. Of course that will vary widely by magazine: but that's the ratio they all use when making readership claims in their rate cards.

    This is exactly the question the original poster was asking, but for the web: everybody knows that getting an exact answer is impossible, he's just looking for a rule of thumb.

  • Q: "How do you map IPs to users?" A: "Use cookies!"

    They mean well, but they don't live in the real world. I on the other hand mean well AND live in the real world, so here are two reasonable ways to handle it. They both should give similar -- but not identical -- numbers. Either of which is good enough for anyone with reasonable expectations.

    1) Count the number of unique IP addresses you see every half hour. Simple, fast, easy. And reasonably accurate.

    2) A series of hits from a single IP can be considered a single user if there are no gaps more than five minutes between hits. Count up the number of these bunches of hits you see and you get the number of people. Hard and slow, but reasonably accurate.

    Neither of these will really give you the people that came to your site, but they definitely give you a good guess. They can't see stuff hidden behind proxies (but neither can anyone else) and they don't deal with IP addresses that change during a single session. But compare even THIS data to what TV advertisers get from Nielsen and you will NEVER feel like you need to drop a cookie on every one of your users.

    I use the first version to report traffic for a 4-million pageview a day website and it works just fine. And if your boss doesn't like it, beat some reasonable expectations into him or her.
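
    A minimal sketch of the two counting methods described above, assuming the log has already been parsed into (timestamp in seconds, IP) pairs. The function names and sample hits are hypothetical, and summing the per-half-hour counts is only one reading of method 1.

```python
from collections import defaultdict

HALF_HOUR = 30 * 60   # bucket width for method 1, in seconds
MAX_GAP = 5 * 60      # longest pause allowed inside one visit for method 2


def unique_ips_per_half_hour(hits):
    """Method 1: add up the distinct IPs seen in each half-hour bucket."""
    buckets = defaultdict(set)
    for timestamp, ip in hits:
        buckets[timestamp // HALF_HOUR].add(ip)
    return sum(len(ips) for ips in buckets.values())


def count_visits_by_gap(hits):
    """Method 2: an IP starts a new visit after more than MAX_GAP of silence."""
    last_seen = {}
    visits = 0
    for timestamp, ip in sorted(hits):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous > MAX_GAP:
            visits += 1
        last_seen[ip] = timestamp
    return visits


# Hypothetical sample: (seconds since start of log, client IP)
hits = [
    (0, "10.0.0.1"),
    (120, "10.0.0.1"),
    (200, "10.0.0.2"),
    (1700, "10.0.0.2"),
    (2000, "10.0.0.1"),
    (2100, "10.0.0.1"),
]

print(unique_ips_per_half_hour(hits))  # 3
print(count_visits_by_gap(hits))       # 4
```

    On this made-up sample the two methods land close together but not on the same number, which is the behaviour the comment predicts.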

  • you offer no reasoning whatsoever as to why your "method" is better than using cookies.

    That's because I wasn't trying to say IP counting is better than cookies. I was saying that for most sites, counting IPs is more than good enough. Cookie tracking and IP counting are both reasonably accurate.

    I'll gladly submit to better ideas, if only you can show me the flaws in my own arguments and convince me of yours.

    I wasn't trying to change anyone's mind. I was just trying to answer the original question. Cookies are a great way to track (and count) users, but they have nothing to do with the original question.

  • not the last-modified date, but rather an Expires: header will (hopefully) do the trick.

    Cache-control: private is probably the best solution, as it lets the browser cache the page but tells the proxy not to. Not sure if this always works or not, though.
    --
    // mlc, user 16290
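
    A rough sketch of the headers discussed here, built as a plain dictionary so it is independent of any particular web server. The helper name is hypothetical, and whether a given proxy honours these headers is not guaranteed.

```python
from email.utils import formatdate


def uncached_by_proxy_headers(now):
    """Build response headers asking shared caches not to store the page.

    `now` is a Unix timestamp in seconds. The helper name is hypothetical.
    """
    return {
        # HTTP/1.1: the browser may cache the page, shared proxies should not.
        "Cache-Control": "private",
        # For older caches: an Expires date equal to "now" marks the page stale.
        "Expires": formatdate(now, usegmt=True),
    }


headers = uncached_by_proxy_headers(0)
print(headers["Expires"])  # Thu, 01 Jan 1970 00:00:00 GMT
```

    Note that an Expires value of "now" also makes the browser's own copy stale, so a site that wants browser caching might send only the Cache-Control line.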

  • You're much better off setting a cookie on the user's machine, and counting users that don't have cookies already.
    Of course, even this doesn't work when users refuse cookies, or clear them out periodically.
  • by Stephen (20676) on Thursday May 10, 2001 @01:13AM (#233236) Homepage
    I am almost certain that there is no industry standard.

    One problem is that it would depend very much on the type of website and thus the type of users you had. If you have a B2B website, and most of your visitors are from companies, your (unique user):(unique IP) ratio will look very different to a site with mostly home visitors coming through large ISPs.

    The industry seems to be more concerned with developing more and more reliable versions of the half-hour timeout metric. Of course, they're chasing the wind. (And furthermore, all the different versions of their metric are then not comparable -- see this study from Xerox PARC [xerox.com] (PDF, 228kb).)

    I leave you with this thought from my essay How the Web Works [analog.cx]:

    "These problems are not really new to the web -- they are present just as much in print media too. For example, you only know how many magazines you've sold, not how many people have read them. In print media we have learnt to live with these issues, using the data which are available, and it would be better if we did on the web too, rather than making up spurious numbers."
  • Your first statement was valid and made a lot of sense.

    THEN you said:

    "Second, there are man[y] servers which have a ton of virtual hosts on them, each with their own IP. A server could have 20 or 30 or more IP's assigned to it, there's no way to know. Furthermore, a server could have multiple NIC's, assigning different virtual hosts to different NIC's, making it even harder to figure out."

    How is this at all relevant to the question?

    He's trying to count visitors to his site, not sites on the web; and most visitors don't surf from their vhost accounts (most visitors don't HAVE vhost accounts to surf from.)

    To pick a nit, load balancers may move IP's on Vhosts around, but I don't believe having multiple NIC's would affect the IP's that a server sits on - they would remain static.

  • That's what cookies are for. If you know the problems with IP numbers, why try to use them for something that's clearly inappropriate and fraught with error?

    There are circumstances where this is impossible. Like, say, for last year's logs.

    I'll grant that this method is fraught with error and that using IP addresses to count noses is the work of the devil. Setting that aside, could a few folks who are running unique cookies on a large site count 'em and count IP addresses in the same period and give us their ratio?

    My rough rule of thumb is that the ratio is around 1:1 but it has been several years since I verified this.

  • > could a few folks who are running unique
    > cookies on a large site count 'em and count IP
    > addresses in the same period and give us their
    > ratio?

    Certainly. We were counting users by IP address/browser type combination for about a year on three sites getting between 5,000 and 50,000 users per month based on that calculation. We then decided to use cookies (expiring at 5 years). We saw an increase in user numbers of about 20% for each site.

    Which was nice, seeing as we then went for ABCe audits and got lots of advertising money!

    See - cookies DO work!

    G
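
    The comparison described above could be sketched like this, assuming each log record carries an IP, a browser string, and a cookie ID. The records are invented for illustration and are not the poster's data.

```python
def compare_counts(records):
    """Return (unique IP/browser pairs, unique cookie IDs, cookie-to-pair ratio)."""
    ip_browser_pairs = {(ip, agent) for ip, agent, _ in records}
    cookie_ids = {cookie for _, _, cookie in records if cookie}
    if not ip_browser_pairs:
        return 0, len(cookie_ids), None
    ratio = len(cookie_ids) / len(ip_browser_pairs)
    return len(ip_browser_pairs), len(cookie_ids), ratio


# Hypothetical records: (client IP, browser string, cookie ID)
records = [
    ("10.0.0.1", "Mozilla/4.7", "a1"),
    ("10.0.0.1", "Mozilla/4.7", "b2"),  # second person behind the same proxy
    ("10.0.0.2", "MSIE 5.5", "c3"),
    ("10.0.0.3", "MSIE 5.5", "c3"),     # same person on a new dial-up address
    ("10.0.0.4", "Lynx/2.8", "d4"),
    ("10.0.0.5", "Lynx/2.8", "e5"),
    ("10.0.0.5", "Lynx/2.8", "f6"),
]

pairs, cookies, ratio = compare_counts(records)
print(pairs, cookies, ratio)  # 5 6 1.2
```

    Shared proxies push the cookie count above the IP/browser count, while changing addresses push it below, so the ratio can go either way depending on the audience.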

  • > In today's world, the reality is that very few users don't accept cookies.

    Exactly. All the people posting here going "cookies don't work because people turn them off" are on planet Slashdot.

    HOWEVER - one thing that's not been mentioned yet is that if you use mod_usertrack to cookie your logs and you get a user who does not accept a cookie, it creates a stream of unique IDs - one for every request that user makes.

    So - those people who turn their cookies off and go to Apache servers could be looking like hundreds or even thousands of users! Hooray!

    G
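
    One possible way to damp this inflation, sketched under the assumption that the tracking ID from each request has already been extracted from the log: only count IDs that show up on at least two requests. The threshold and function name are hypothetical, and the filter also drops genuine one-request visitors, so it undercounts.

```python
from collections import Counter


def returning_cookie_ids(logged_ids, min_requests=2):
    """Keep only tracking IDs seen on at least `min_requests` requests.

    A client that rejects cookies gets a fresh ID on every request, so each
    of its IDs appears exactly once and is filtered out here.
    """
    counts = Counter(logged_ids)
    return {cookie_id for cookie_id, n in counts.items() if n >= min_requests}


# Hypothetical log: u1 and u2 accept cookies; x7, x8, x9 are one client refusing them.
logged_ids = ["u1", "u1", "u1", "x7", "x8", "x9", "u2", "u2"]

print(sorted(returning_cookie_ids(logged_ids)))  # ['u1', 'u2']
```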

  • by Cire (96846)
    You can't do it.

    First, there are too many corporations using NAT, and it's impossible to know how many people are NAT'ed. A company may have 100 employees, but only have 3 static IPs.

    Second, there are many servers which have a ton of virtual hosts on them, each with their own IP. A server could have 20 or 30 or more IP's assigned to it, there's no way to know. Furthermore, a server could have multiple NIC's, assigning different virtual hosts to different NIC's, making it even harder to figure out.

    -Cire
  • plus [AOL's] caching means some users will never make it to your page

    You can beat the caching by placing a Pragma: no-cache in the HTTP response and/or setting the last modified date to now.

  • You Will Never Know How Many Users You had.

    IP based ratios won't work because AOL will fuck you over. I've seen the same users come from different IP's in their proxy space, plus their caching means some users will never make it to your page.

    Some users have plug ins that request the page from a second IP (NBCI's quick click anyone?) that will skew your numbers. Exact numbers will not happen, and ratios will vary widely based on your clientele.

    Set cookies and go on your way.

  • I didn't say I was using them, I was wondering if there was some industry calculation based upon the IPs.

    I've not dealt with cookies before and had not considered using them.
  • Good luck getting your rough estimate based on IP addresses. I think it's great that you're eschewing cookies; go for cache friendliness. [webtechniques.com]
  • me:
    redirecting from one page to another where the user must have a cookie if (sorry, forgot this word) it was accepted
    You don't have to be as obtrusive as that, forcing the user to accept a cookie.
    My point wasn't to force someone to accept a cookie, simply checking if he has accepted one, and continue processing with or without the cookie.
    The process would be something like this (same as your suggestion):
    Have some page (/ ?) set a cookie and automatically redirect to a "test" page. This page simply verifies if a cookie is returned by the user. If not, he is rejecting them.
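
    The set-then-redirect check described above might look like the following sketch. It is written as a pure function rather than against a real server, and the cookie name `probe`, the `/test` path, and the response bodies are all made up for illustration.

```python
def handle(path, cookie_header):
    """Return (status, headers, body) for a hypothetical two-step cookie check."""
    if path == "/":
        # Step 1: set a probe cookie and send the browser to the test page.
        headers = {"Set-Cookie": "probe=1; Path=/", "Location": "/test"}
        return 302, headers, ""
    if path == "/test":
        # Step 2: the probe comes back only if the browser accepted it.
        accepted = "probe=1" in (cookie_header or "")
        body = "cookie accepted" if accepted else "cookie rejected"
        return 200, {}, body
    return 404, {}, "not found"


print(handle("/", None)[0])            # 302
print(handle("/test", "probe=1")[2])   # cookie accepted
print(handle("/test", None)[2])        # cookie rejected
```

    Either way the visitor gets a page; the site just learns which group the request belongs to.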
  • You can always check if the user has accepted the cookie or not: redirecting from one page to another where the user must have a cookie it was accepted. You can then estimate the proportion of non-cookie users.
    And if you base your stats on a short period of time, say 2 or 3 weeks, users clearing cookies will be a minority.
  • You're much better off setting a cookie on the user's machine, and counting users that don't have cookies already.

    IP's are misleading... every user behind a proxy server shows up as the same IP address. There could be thousands of users behind the same proxy.

    MadCow.

  • >> redirecting from one page to another where the user must have a cookie it was accepted

    You don't have to be as obtrusive as that, forcing the user to accept a cookie. I'd simply have a small (i.e. 1x1 pixel) frame on the page that requested/set the cookie, and checked if it was accepted. From this data, you could extrapolate:

    • The number of unique users that accept cookies
    • The number of visits per unique user that accepts cookies
    • The number of visits by users that don't accept cookies
    • extrapolate the number of unique users that don't accept cookies based on average visits/user

    In today's world, the reality is that very few users don't accept cookies. They're convenient. They're enabled by default in most circumstances. Most people probably don't know they exist. I'm talking about "Joe User", not "Joe Slashdot".

    I'd hazard a guess that the above would provide a 98%+ accurate indication of your user base, as well as additional info such as visits/user and user frequency.

    MadCow.
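
    The extrapolation in the list above comes down to a little arithmetic. This sketch assumes cookie refusers visit about as often as cookie accepters do; the function name and the sample numbers are invented.

```python
def estimate_unique_users(cookie_users, cookie_visits, no_cookie_visits):
    """Estimate total unique users from cookie stats plus cookieless visits.

    Assumes users who refuse cookies make the same average number of visits
    as users who accept them.
    """
    visits_per_user = cookie_visits / cookie_users
    estimated_no_cookie_users = no_cookie_visits / visits_per_user
    return cookie_users + estimated_no_cookie_users


# Hypothetical month: 980 cookied users made 2940 visits; 60 visits had no cookie.
estimate = estimate_unique_users(cookie_users=980, cookie_visits=2940, no_cookie_visits=60)
print(estimate)  # 1000.0
```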

  • >> They mean well, but they don't live in the real world. I on the other hand mean well AND live in the real world, so here are two reasonable ways to handle it.

    I have no problem with people refuting my claims/advice, but you offer no reasoning whatsoever as to why your "method" is better than using cookies. I certainly DO live in the "real world", and the methods suggested using cookies would definitely provide simple, accurate, and objective measurements of unique users.

    Do you not have experience using cookies? Do you not know what they do or how they work? Do you understand the problems with using IP addressing, as indicated in the cookie discussions above? Do you have any actual "substance" with which to argue those points?

    I'll gladly submit to better ideas, if only you can show me the flaws in my own arguments and convince me of yours. Unfortunately, your post lacked any (possibly quite correct) details to support your claims.

    MadCow.

  • The Cache-Control header was invented in the HTTP/1.1 spec (RFCs 2068 [rfc-editor.org], 2616 [rfc-editor.org]) as a response to the prior invention and use of web caches. Sort of like the XHTML standards and validation were created in response to the mess of HTML that web browsers allow and WYSIWYG editors create.

    Thus, there are as many non-HTTP-compliant web caches as there are non-HTTP/HTML-compliant browsers.

  • That's what cookies are for. If you know the problems with IP numbers, why try to use them for something that's clearly inappropriate and fraught with error?
  • There are too many confounding factors that can mess with your conclusions if you use IP numbers.

    Depending on your target audience, the bias can correlate with geography (people in areas without vibrant independent ISP culture are more likely to use services like AOL that run users through proxies, and some countries - such as New Zealand - have almost everyone behind them), it can correlate with specific large institutions (if half your market is Time-Life, you'll only get one unique IP in the logs), it can correlate with almost anything.

    Cookies are easy and painless and the small number of crackpots who are afraid of them are more likely to cut evenly across various demographics than are the entanglements created by assumptions about IP-human correspondence.

  • What? I have to store crap on my computer because you want to know if I've been there before? What do I get out of it?

    No, you don't have to, but the cost to you is nil in terms of time and disk space and any other resources, so in the general case there's no particular reason not to unless you get intrinsic reward from being a curmudgeon.

    The question says "we know we have a larger number of anonymous users" [than people who register], a majority of the people who turn off cookies are going to be in this group.

    But you can count the number of cookie-refusers and account for them by extrapolation. This way the confounding effects of IP-based tracking only affect the minority of your users who won't accept your ginger snaps.

  • What? I have to store crap on my computer because you want to know if I've been there before? What do I get out of it?

    The question says "we know we have a larger number of anonymous users" [than people who register], a majority of the people who turn off cookies are going to be in this group. And there's no incentive to keep the cookie even if you got people to accept it.

    However, there is an easy way to associate IP address with "real human". Put the IP address in the cookie. Everyone loves that.
