Re: [R] R usage survey

From: Greg Snow <>
Date: Fri, 04 Mar 2011 15:21:08 -0700

Thanks Hadley,

Your response here and some I received offline made me look back at what I and others said, and I could have phrased things better. My phrase "voluntary response" was a bit vague (but it is what is used in the course materials I teach from, so that is what came to mind).

I specifically meant surveys without random selection where the respondents go to some effort to select themselves into the sample. I feel that this survey fits that category, though it is definitely not as bad as those where people have to call a phone number and pay a fee to respond. Harsh's survey still looks like it is going to suffer from undercoverage, and there could be serious bias from that. There are methods for adjusting for undercoverage, but I don't see how Harsh will have the information needed to do those kinds of corrections (however, I am still learning and would be interested if that type of information and those methods are available for his survey).

Also, looking back, I mistakenly assumed that he was planning on doing inference, but I don't see anywhere in his posts that he stated that. That was my fault, and I came on a bit strong based on that mistake; I apologize for that. Offline he told me that he is planning on just doing descriptives, and as long as he is up front about the limitations of the data and limits himself to descriptives, the survey could be reasonable. Statements like "at least 3 people from region X use R" don't require probability samples (just the assumption that those 3 people were honest about being from region X); inference about how many others in region X use R would need more or better information.

Hopefully that clarifies my position and people can start liking me again,

Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare

> -----Original Message-----
> From: [] On Behalf Of
> Hadley Wickham
> Sent: Friday, March 04, 2011 1:28 PM
> To: Greg Snow
> Cc: Harsh;
> Subject: Re: [R] R usage survey
> > Ok, I am very interested in what methods you plan to use that would
> > be fit under the description "suitably analyzed" for voluntary response
> > data.  From my training and experience the only suitable thing to do
> > with voluntary response data is to put it through the shredder, into
> > the recycle bin, or use as an example of what not to do in introductory
> > textbooks.  Treating voluntary response data (especially given the
> > responses to your post you have seen so far) as if it came from a
> > proper random probability sample does not fit the idea of suitable
> > analysis.
>
> Come on, that's a bit strong. In real life, it's not always possible
> to take a perfectly random sample and assume (at best) that missing
> responses are completely at random. Even descriptive analysis on a
> flawed sample is better than nothing at all. Of course you need to be
> extremely careful about making inferences about the wider population,
> but it's not true that the only thing you can do with survey data is
> to throw it in the trash.
> Hadley
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
Received on Fri 04 Mar 2011 - 22:26:42 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.