Re: [R] Popularity of R, SAS, SPSS, Stata...

From: Ted Harding <>
Date: Sun, 20 Jun 2010 20:41:46 +0100 (BST)

On 20-Jun-10 19:07:21, Muenchen, Robert A (Bob) wrote:
>>I wonder if there are any capture-recapture type methodologies for
>>estimating open-source software usage? Another idea would be to
>>combine with some other known numbers, e.g. book sales, conference
>>attendance etc. You'd need personal information to link the data sets

> This totally cracked me up! I'm envisioning going into one of our
> computer labs, tossing a net over an unsuspecting student, and then
> tagging their ear with a code that represents which stat package
> they're using. Then release and later recapture. What percent did
> we get? That's what the profs I deal with do with animals to estimate
> populations.

I've given thought in the past to the question of estimating the R user base, and have come to the conclusion that it is impossible to get an estimate of the number of users that one could trust (or even attach anything like a margin of error to).

I think one could get a number that represents a moderately informative lower bound: just count the number of different email addresses that have ever posted to the R-help list. This will of course include people who post (or have posted) from more than one email address, and people who tried R for a while and then dropped it. My feeling, though, is that these are likely to be outweighed by the number of people who have used R but have never posted (for example, students who get their R help from their instructors, or people using R in a corporate context who are discouraged from posting to public lists).

The number of subscribers to R-help (currently about 10200) is a definite lower bound for the number of R users, but many users post to R-help without being subscribed.

I would expect that the total number of different email addresses that have posted to R-help would be considerably larger than 10200.
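
As a rough illustration of the address-counting idea, here is a minimal Python sketch. It assumes one already has the "From:" headers of the list archive as plain strings; the sample headers and the function name count_distinct_posters are made up for the example, and lower-casing whole addresses is a simplifying assumption.

```python
from email.utils import parseaddr


def count_distinct_posters(from_headers):
    """Count distinct email addresses among a list of From: header values.

    Addresses are lower-cased before comparison (a simplifying assumption).
    Headers that contain no parseable address are skipped.
    """
    addresses = set()
    for header in from_headers:
        _name, addr = parseaddr(header)
        if addr:
            addresses.add(addr.lower())
    return len(addresses)


# Hypothetical sample data, not taken from the real R-help archive.
sample_headers = [
    "Alice Example <alice@example.org>",
    "alice@example.org",
    "ALICE@EXAMPLE.ORG",
    "Bob Example <bob@example.com>",
    "Alice Example <alice@work.example.net>",
]

if __name__ == "__main__":
    print(count_distinct_posters(sample_headers))
```

Note that the same person posting from two addresses is counted twice here, which is exactly the over-counting mentioned above; the sketch makes no attempt to correct for it.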

I don't think a "Mark-Recapture" approach is feasible.
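
For reference, the textbook two-sample capture-recapture calculation that the joke above alludes to can be sketched as follows. This is only the standard Lincoln-Petersen estimate (with Chapman's small-sample variant), shown with invented numbers; it assumes a closed population and equal catchability, which is precisely what is doubtful for software users.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Lincoln-Petersen estimate of population size: N = M * C / R.

    marked_first  (M): individuals marked in the first sample
    caught_second (C): individuals caught in the second sample
    recaptured    (R): marked individuals found in the second sample
    """
    if recaptured <= 0:
        raise ValueError("need at least one recapture to form the estimate")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's variant, which is less biased for small samples."""
    if recaptured < 0 or recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures must lie between 0 and the smaller sample")
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Invented figures, purely for illustration.
    print(lincoln_petersen(200, 150, 30))
    print(round(chapman(200, 150, 30), 1))
```

With 200 marked, 150 caught in the second sample and 30 recaptures, the first estimate is 200 * 150 / 30 = 1000.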

Further, I don't know how one might take account of the fact that some installations of R (e.g. on a corporate or institutional or departmental server) may each be used by several users.


E-Mail: (Ted Harding) <> Fax-to-email: +44 (0)870 094 0861
Date: 20-Jun-10                                       Time: 20:41:43
------------------------------ XFMail ------------------------------

Received on Sun 20 Jun 2010 - 19:45:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 20 Jun 2010 - 22:40:32 GMT.
