Re: [Rd] CRAN Server download statistics (Was: R Usage Statistics)

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Mon, 23 Nov 2009 09:43:23 -0500

Knowing what percentage of downloads comes from each OS is of interest to package developers, and the proposal to massage the data would obscure it. I would prefer to see the raw figures as they are.
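The "proposal to massage the data" refers to Hadley's suggestion, in the quoted message below, of a shrunken estimate for % Windows. The sketch below shows one common form of such an estimate. It assumes beta-binomial (pseudo-count) shrinkage toward the pooled Windows share across all packages. The `prior_strength` of 50 pseudo-downloads and the package counts are hypothetical, and this is not how the cranstats site computes anything.

```python
from typing import Dict, Tuple


def overall_share(counts: Dict[str, Tuple[int, int]]) -> float:
    """Pooled Windows share across all packages: sum(windows) / sum(total)."""
    windows = sum(w for w, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return windows / total


def shrunken_share(windows: int, total: int, prior_share: float,
                   prior_strength: float = 50.0) -> float:
    """Beta-binomial (pseudo-count) estimate of one package's Windows share.

    Adds prior_strength pseudo-downloads split according to prior_share,
    so packages with few downloads are pulled toward the pooled share while
    heavily downloaded packages keep roughly their raw share.
    """
    if not 0 <= windows <= total:
        raise ValueError("need 0 <= windows <= total")
    alpha = prior_share * prior_strength          # pseudo Windows downloads
    beta = (1.0 - prior_share) * prior_strength   # pseudo non-Windows downloads
    return (windows + alpha) / (total + alpha + beta)


if __name__ == "__main__":
    # Made-up (Windows downloads, total downloads) per package.
    counts = {"pkgA": (3, 3), "pkgB": (600, 1000), "pkgC": (0, 2)}
    p0 = overall_share(counts)
    for pkg, (w, t) in sorted(counts.items()):
        print(pkg, "raw:", round(w / t, 3),
              "shrunken:", round(shrunken_share(w, t, p0), 3))
```

With only three downloads, pkgA's raw 100% is pulled close to the pooled share, while pkgB barely moves. A page could sort by the shrunken column and still display the raw column, so both concerns in this thread can be met.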

Also, the number of unique IPs is important and, in my opinion, should not be removed, since (1) it is a measure of clustering: if a package is mainly used in the courses of a few universities, where the students really have no choice, that seems very different from its being used by a variety of people around the world, and only the IPs would give any clue to that; and (2) it helps to diagnose intentional distortion of the figures by repeated downloads to the same machine.
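One simple way to use the IP counts is to compare downloads with unique IPs per package. The sketch below assumes the logs have already been reduced to hypothetical `(package, ip)` pairs. A high downloads-per-IP ratio flags either clustering (for example, many users behind one university address, as noted in the quoted reply below) or repeated downloads to one machine. The ratio alone cannot tell these apart.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ip_summary(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, float]]:
    """Map each package to (downloads, unique IPs, downloads per unique IP)."""
    downloads: Dict[str, int] = defaultdict(int)
    ips: Dict[str, Set[str]] = defaultdict(set)
    for package, ip in records:
        downloads[package] += 1
        ips[package].add(ip)
    return {pkg: (n, len(ips[pkg]), n / len(ips[pkg]))
            for pkg, n in downloads.items()}


if __name__ == "__main__":
    # Hypothetical records using documentation-only address ranges.
    records = [("pkgA", "192.0.2.1")] * 20
    records += [("pkgB", f"198.51.100.{i}") for i in range(20)]
    for pkg, (n, uniq, ratio) in sorted(ip_summary(records).items()):
        print(pkg, n, uniq, ratio)
```

Both made-up packages have 20 downloads, but pkgA's all come from one address while pkgB's come from 20. The raw download count alone would hide that difference.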

The one problem with sparkline graphs is that they would make the page take a lot longer to load. There already is a time series if you click on the package name.

I suggest providing a link to each package's CRAN page.

On Mon, Nov 23, 2009 at 9:12 AM, hadley wickham <h.wickham_at_gmail.com> wrote:
> Hi Ian,
>
> I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's
> concerned about the privacy implications of making the Apache access
> logs public.  A compromise that he mentioned was having a script run
> on each CRAN mirror that processed the log files and output summary
> statistics.  Then a central process could aggregate these and produce
> a single overall summary.
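The compromise described above splits the work into a per-mirror reduction and a central merge. Below is a minimal sketch of that split, assuming each mirror has already extracted the package name of every download from its own logs. The function names and mirror data are hypothetical, and this does not reflect any script CRAN actually runs.

```python
from collections import Counter
from typing import Iterable, List


def mirror_summary(packages: Iterable[str]) -> Counter:
    """Run on each mirror: reduce parsed download records to per-package
    counts, so only aggregate numbers (no IPs, no raw log lines) leave it."""
    return Counter(packages)


def aggregate(summaries: List[Counter]) -> Counter:
    """Run centrally: add up the per-package counts from every mirror."""
    total: Counter = Counter()
    for summary in summaries:
        total.update(summary)
    return total


if __name__ == "__main__":
    mirror_a = mirror_summary(["zoo", "zoo", "lattice"])  # hypothetical mirror
    mirror_b = mirror_summary(["lattice", "MASS"])        # hypothetical mirror
    print(aggregate([mirror_a, mirror_b]).most_common())
```

Plain download counts add up cleanly across mirrors, but unique-IP counts do not, because one address can download from several mirrors. Keeping a unique-IP figure would therefore require each mirror to export more than a plain count.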
>
> A few comments on your current site:
>
>  * Are you just including packages downloaded interactively from within R?
>
>  * I don't think the continent from which the package was downloaded is
> of much interest.  There's definitely no need to include it on the
> main page.
>
>  * I'd be far more interested in changes over time.  Sparklines of the
> last month's worth of data would be a neat addition to the main page.
>
>  * More vertical whitespace or subtle zebra striping would make it
> much easier to read across rows.
>
>  * I'm also not sure about displaying the number of unique IPs. R is
> used a lot in the university setting and until IPv6 comes along, many
> university downloads will appear to be coming from a single IP
> address.
>
>  * It's not very useful to sort by % Windows because the variance
> increases as the sample size decreases, so the packages with the
> highest and lowest % Windows are just the packages that aren't
> downloaded very often.  Maybe a shrunken estimate?
>
>  * Have you thought at all about how to take package dependencies into account?
>
> Hadley
>
> On Sun, Nov 22, 2009 at 6:18 PM, Fellows, Ian <ifellows_at_ucsd.edu> wrote:
>> Hi All,
>>
>> It seems that the question of how many people use (or download) R and its packages comes up fairly regularly in a variety of forums (there was also a recent thread on the subject on Stack Overflow). A couple of students at UCLA (including myself) wanted to address the issue, so we set up a system that fetches and parses the cran.stat.ucla.edu Apache logs every night and displays some basic statistics. Right now, we have a working sketch of a site based on one week of observations.
>>
>> http://neolab.stat.ucla.edu/cranstats/
>>
>> We would very much like to incorporate data from all CRAN mirrors, including cran.r-project.org. We would also like to set this up in a way that is minimally invasive for the site administrators. Internally, our administrator has set up a protected directory with the last couple of days of CRAN activity. We then pull that down using curl.
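The nightly parsing step described above could look roughly like the sketch below. It assumes Apache's common/combined log format and a CRAN directory layout with source tarballs under `src/contrib`, Windows `.zip` binaries under `bin/windows/contrib/<R version>`, and Mac `.tgz` binaries under `bin/macosx/.../contrib/<R version>`. The regular expressions, the `Download` record, and the choice to count only successful (200) GET requests are all assumptions, not the cranstats site's actual code. Inferring the OS from the file type is also only approximate, since "source" covers Linux users as well as anyone on any OS who builds from source.

```python
import re
from typing import NamedTuple, Optional

# Prefix of Apache common/combined log lines:
# host ident user [time] "METHOD path PROTOCOL" status bytes ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed CRAN layout; index files such as PACKAGES.gz and Archive/ paths
# do not match because the file name must be <pkg>_<version>.<ext>.
PKG_RE = re.compile(
    r'/(?:src/contrib|bin/windows/contrib/[^/]+|bin/macosx/(?:[^/]+/)?contrib/[^/]+)'
    r'/(?P<pkg>[A-Za-z0-9.]+)_(?P<version>[^/_]+)\.(?P<ext>tar\.gz|zip|tgz)$'
)

PLATFORM = {"tar.gz": "source", "zip": "windows", "tgz": "macosx"}


class Download(NamedTuple):
    ip: str
    time: str
    package: str
    version: str
    platform: str


def parse_line(line: str) -> Optional[Download]:
    """Return a Download for a successful package fetch, else None."""
    m = LOG_RE.match(line)
    if not m or m.group("method") != "GET" or m.group("status") != "200":
        return None
    p = PKG_RE.search(m.group("path"))
    if not p:
        return None
    return Download(m.group("ip"), m.group("time"), p.group("pkg"),
                    p.group("version"), PLATFORM[p.group("ext")])


if __name__ == "__main__":
    line = ('192.0.2.1 - - [22/Nov/2009:18:00:00 -0800] '
            '"GET /bin/windows/contrib/2.10/aroma.affymetrix_1.3.0.zip HTTP/1.1" '
            '200 12345 "-" "R (2.10.0)"')
    print(parse_line(line))
```

Records like these are what the per-package and per-IP summaries discussed in this thread would be computed from.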
>>
>> What would be the best and easiest way for the CRAN mirrors to share their data? Is the contact information for the administrators available anywhere?
>>
>>
>> Thank you,
>> Ian Fellows
>>
>>
>>
>> ________________________________________
>> From: r-devel-bounces_at_r-project.org [r-devel-bounces_at_r-project.org] On Behalf Of Steven McKinney [smckinney_at_bccrc.ca]
>> Sent: Thursday, November 19, 2009 2:21 PM
>> To: Kevin R. Coombes; r-devel_at_r-project.org
>> Subject: Re: [Rd] R Usage Statistics
>>
>> Hi Kevin,
>>
>> What a surprising comment from a reviewer for BMC Bioinformatics.
>>
>> I just did a PubMed search for "limma" and "aroma.affymetrix",
>> just two methods for which I use R software regularly.
>> "limma" yields 28 hits, several of which are published
>> in BMC Bioinformatics.  Bengtsson's aroma.affymetrix paper
>> "Estimation and assessment of raw copy numbers at the single locus level."
>> is already cited by 6 others.
>>
>> It almost seems too easy to work up lists of usage of R packages.
>>
>> Spotfire is an application built around S-Plus that has widespread use
>> in the biopharmaceutical industry at a minimum.  Vivek Ranadive's
>> TIBCO company just purchased Insightful, the S-Plus company.
>> (They bought Spotfire previously.)
>> Mr. Ranadive does not spend money on environments that are
>> not appropriate for deploying applications.
>>
>> You could easily cull a list of corporation names from the
>> various R email listservs as well.
>>
>> Press back with the reviewer.  Reviewers can learn new things
>> and will respond to arguments with good evidence behind them.
>> Good luck!
>>
>>
>> Steven McKinney
>>
>>
>> ________________________________________
>> From: r-devel-bounces_at_r-project.org [r-devel-bounces_at_r-project.org] On Behalf Of Kevin R. Coombes [krcoombes_at_mdacc.tmc.edu]
>> Sent: November 19, 2009 10:47 AM
>> To: r-devel_at_r-project.org
>> Subject: [Rd] R Usage Statistics
>>
>> Hi,
>>
>> I got the following comment from the reviewer of a paper (describing an
>> algorithm implemented in R) that I submitted to BMC Bioinformatics:
>>
>> "Finally, which useful for exploratory work and some prototyping,
>> neither R nor S-Plus are appropriate environments for deploying user
>> applications that would receive much use."
>>
>> I can certainly respond by pointing out that CRAN contains more than
>> 2000 packages and Bioconductor contains more than 350. However, does
>> anyone have statistics on how often R (and possibly some R packages) are
>> downloaded, or on how many people actually use R?
>>
>> Thanks,
>>    Kevin
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> http://had.co.nz/
>



Received on Mon 23 Nov 2009 - 14:47:03 GMT

This archive was generated by hypermail 2.2.0 : Mon 23 Nov 2009 - 16:30:40 GMT