Re: [R] Popularity of R, SAS, SPSS, Stata...

From: Ted Harding <>
Date: Mon, 21 Jun 2010 02:01:03 +0100 (BST)

On 20-Jun-10 19:49:43, Hadley Wickham wrote:

>> I've given thought in the past to the question of estimating the R
>> user base, and came to the conclusion that it is impossible to get
>> an estimate of the number of users that one could trust (or even
>> put anything like a margin of error to).

> I find it hard to believe that it should be harder to estimate the
> number of whales than the number of R users. Sure there's a
> definitional problem of exactly what an R user is, but there must be
> some way to come up with some useful estimates. What about snowball
> sampling with R-help as an initial frame?
> Hadley

Whales are a different kettle of fish! They are much more directly observable, in principle, than are R-users. For one thing, a whale has to come to the surface to breathe every so often, and if you are in a ship nearby you can see it happen.

There have been many research ships out in the oceans in known whale areas looking out for just that, and planning their transects so as to be able to scale up their observed data into population estimates. In many cases individual whales can be recognised (by markings or by notches on the fins), enabling a kind of passive mark-recapture.

Also, active mark-recapture is carried out, with tags being planted into the animals and recovered later (though this was a sounder method prior to the moratorium on commercial whaling). In addition, catch per unit effort (or observations per unit effort) data can be used to estimate abundance. Data have been available on sex and age. These days, responder beacons can be planted as tags, and their numbers within visually observed whale groups determined.
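
For readers who have not met mark-recapture before, the basic arithmetic can be sketched as follows. This is only a minimal illustration using the textbook Lincoln-Petersen estimator and Chapman's bias-corrected variant, not a description of what any whale survey actually computes. The function names and the counts are invented for the example.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Textbook Lincoln-Petersen estimate: N_hat = M * C / R.

    marked_first  (M): animals marked and released in the first sample
    caught_second (C): animals caught (or sighted) in the second sample
    recaptured    (R): animals in the second sample that carry a mark
    """
    if recaptured <= 0:
        raise ValueError("need at least one recapture to form the estimate")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's variant, which is less biased when recaptures are few."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


# Hypothetical counts, for illustration only (not real survey data).
M, C, R = 200, 150, 30
print(lincoln_petersen(M, C, R))  # 1000.0
print(chapman(M, C, R))
```

Both estimators rest on the assumption that marked and unmarked animals are equally likely to turn up in the second sample, and it is that assumption which needs a population that can be observed in the first place.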

Data from such sources, and others, can be combined with analysis of population-dynamics models, thus improving the quality of the estimates.

R-users are not so easy to study! For one thing, they don't all come up to breathe: they can do that in the darkest depths and not be seen. Their population dynamics is obscure. The big problem with any sort of survey or "sample" of R users is that the target population is only partially visible, and seeking responses to any kind of survey is subject to non-response (including failure to target) bias from an intangible and therefore unknown number of users.

The idea of a "snowball sample" came up when this same topic was discussed back in 2000. Go to

and find the thread (and the various side-threads) which starts with a message

"[R] # of users of R, and biological examples of the use of R" from Ramon Diaz-Uriarte (Tue Jun 20 10:21:37 CEST 2000).

Searching that month (June 2000) of archives for the word "users" in the Subject will find them all (and nothing else).

The snowball was proposed by John Logsdon in "[R] # of users of R" (Wed Jun 21 11:59:34 CEST 2000).

John and I discussed the snowball idea at some length off-list, and that is when I came to the conclusion (for reasons such as the above) that although it had some mileage, and could provide information supplementary to other methods, the extent of its potential reach into the unknown was, well, unknowable ... [with acknowledgement to Donald Rumsfeld].

In response to the question from Bob Muenchen as to "How did you get the R-help figure?" (of email addresses subscribed to R-help): since I am one of the list moderators, I can log in and access the subscribers' list.

As of today, the numbers are:

 4629 Non-digested Members of R-help
 5560 Digested Members of R-help
 (190 private members not shown)


(A few more than the number I picked up some days ago).


E-Mail: (Ted Harding) <> Fax-to-email: +44 (0)870 094 0861
Date: 21-Jun-10                                       Time: 02:00:45
------------------------------ XFMail ------------------------------

