Re: [R] sample size estimation for count (poisson?) data?

From: David Winsemius
Date: Thu, 13 Nov 2008 23:17:06 -0500

The notion that you can just add or subtract 0.03 from the estimate is obviously incorrect: 3% of a mean near 5 is about 0.15, not 0.03.

Presuming you meant to call your lower bound q05 and the upper bound q95, the numbers I get from your 10,000-iteration loop are 4.97 and 5.18 (around a mean of 5.08). So roughly a 0.1 swing on each side of the mean, or a 2% "margin of error", assuming you mean a 90% confidence limit. That would be a reasonable 90% CI for an estimate under the assumption that it is Poisson, which would require some checking: at a minimum the variance should be near the mean (as it obviously would have been in your simulation). Traditionally this sort of estimate would be a 95% CI, and this simulation's estimate for that would be

 > q.025=quantile(d, 0.025)
 > q.975=quantile(d,0.975)
 > q.025; q.975


Which is more like the 3% that you were initially talking about.
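
For anyone who wants to rerun the comparison above without the earlier session, here is a minimal self-contained sketch. The seed, the name `pool`, and the helper `rel.half.width` are my own assumptions for illustration and are not in the original post; the resampling scheme (168 drawn without replacement from a fixed 200) is the one in the quoted code.

```r
# Sketch: repeat the original resampling scheme and express the 90% and 95%
# quantile intervals as a fraction of the mean.
set.seed(1)                      # assumed seed, only for reproducibility
pool <- rpois(200, lambda = 5)   # the fixed "population" of 200 counts
sample.size <- 168

# 10,000 sample means, each from 168 values drawn without replacement
d <- replicate(10000, mean(sample(pool, sample.size, replace = FALSE)))

ci90 <- quantile(d, c(0.05, 0.95))
ci95 <- quantile(d, c(0.025, 0.975))

# Hypothetical helper: half-width of an interval relative to a centre value
rel.half.width <- function(ci, centre) unname(diff(ci)) / 2 / centre

rel90 <- rel.half.width(ci90, mean(pool))
rel95 <- rel.half.width(ci95, mean(pool))
round(100 * c(rel90 = rel90, rel95 = rel95), 1)   # percent of the mean
```

Because 168 of the 200 values are used in every draw, the sample means cannot move far from the pool mean, which is why these intervals come out so narrow.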

But I would have thought that it would be more appropriate to draw new samples rather than to resample from the same relatively small sample, and the code I would substitute is
 > for (i in 1:10000) {
 + samp = rpois(sample.size, lambda = 5)
 + d[i] = mean(samp)
 + }
 > q.025 = quantile(d, 0.025)
 > q.975 = quantile(d, 0.975)
 > q.025; q.975
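
A self-contained variant of the loop above, as a sketch only: it draws a fresh Poisson sample each time and compares the simulated 95% half-width with the usual normal approximation, in which the mean of n Poisson counts has variance lambda / n. The seed and the variable names `sim.rel`, `approx.half` and `approx.rel` are assumptions of mine.

```r
# Sketch: fresh Poisson samples each iteration, then a check against
# the normal approximation for the mean.
set.seed(2)          # assumed seed, only for reproducibility
lambda <- 5
sample.size <- 168

d <- replicate(10000, mean(rpois(sample.size, lambda = lambda)))
ci95 <- quantile(d, c(0.025, 0.975))

# Simulated half-width as a fraction of the true mean
sim.rel <- unname(diff(ci95)) / 2 / lambda

# Normal approximation: half-width = z * sqrt(lambda / n)
approx.half <- qnorm(0.975) * sqrt(lambda / sample.size)
approx.rel  <- approx.half / lambda

round(c(simulated = sim.rel, approximate = approx.rel), 3)
```

With lambda = 5 and n = 168 the approximation gives a half-width of about 0.34, a little under 7% of the mean.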



So a 6-7% swing on either side with that size sample. I would think that 5 would be a fairly meager observation count. I would ask the questions:
- where did the number 5 come from? (the variance of a Poisson variable is set when you know the mean, since it is a one-parameter distribution.)
- the notion of "margin of error" is getting mixed up with the 95% confidence interval. Which one do you really want? Do you want the standard error of the mean to be less than a specific amount, or to be a specific fraction of the estimate?
- have you considered that there may be extra-Poisson variation due to heterogeneity? Some section of the hiker population may be more observant, in which case the variance of the sample will exceed the mean.
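
If what is wanted is a 95% CI whose half-width is a fixed fraction of the mean, the simulation can be replaced by a closed-form sketch under the normal approximation: solve z * sqrt(phi * mean / n) = f * mean for n. The function names `n.for.rel.moe` and `dispersion.ratio` are hypothetical (they are not from any package), and `phi` is an assumed dispersion factor with variance = phi * mean, so phi = 1 is the pure Poisson case.

```r
# Hypothetical helper: smallest n for which the normal-approximation CI
# half-width is at most rel.moe times the mean.
# phi is an assumed dispersion factor: variance = phi * mean (1 = Poisson).
n.for.rel.moe <- function(mean.count, rel.moe, conf = 0.95, phi = 1) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling(z^2 * phi / (rel.moe^2 * mean.count))
}

n.poisson  <- n.for.rel.moe(5, 0.03)            # pure Poisson, mean 5, +/- 3%
n.overdisp <- n.for.rel.moe(5, 0.03, phi = 2)   # assumed: variance twice the mean

# Crude check on pilot counts: a ratio near 1 is consistent with Poisson,
# a ratio well above 1 suggests extra-Poisson variation.
dispersion.ratio <- function(x) var(x) / mean(x)

c(n.poisson = n.poisson, n.overdisp = n.overdisp)
```

Under these assumptions a mean of 5 and a 3% relative margin need roughly 854 interviews, and the requirement scales in direct proportion to phi, so doubling the variance doubles the sample size.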

David Winsemius

On Nov 13, 2008, at 4:43 PM, Shawn Morrison wrote:

> Thanks. I did the search before I posted and found those threads.
> However, it does not seem to do what I want. All I want to do is
> estimate the sample size for a point estimate, not do a GLM. I just
> want the mean within a margin of error, and to a given CI.
> I've tried writing some code to do a simulation (below). Will this
> do the job?
> #Generate data from Poisson distribution, with lambda = 5
> data = rpois(200, lambda = 5)
> mean(data); var(data)
> #Parameter Estimates
> moe = 0.03 # margin of error = +/- 3%
> sample.size = 168 # number of hunters to sample
> #Draw sample size from population, calc mean. Run 10,000 iterations
> d = numeric(10000)
> for (i in 1:10000) {
> samp = (sample(data, sample.size, replace = FALSE))
> d[i] = mean(samp)
> }
> #What are the bounds on the values that correspond to the margin of error?
> lower=mean(data)-moe
> upper=mean(data)+moe
> #values from 'd' based on 90% confidence intervals
> q25=quantile(d, 0.05)
> q95=quantile(d,0.95)
> #top row = bounds on the mean from the margin of error, second row =
> #bounds based on simulated data and sample size, third row = 1 =
> #true, 0 = false in terms of the sample size being adequate to meet
> #requirements of the margin of error.
> output=rbind(cbind(lower,upper), cbind(q25,q95), cbind(q25>lower,
> q95<upper))
> row.names(output) = c("known", "estimated","True/False")
> output
> On 12-Nov-08, at 4:41 PM, David Winsemius wrote:
>> The first hit for search on "sample size" and "poisson" on Baron's
>> search engine web interface appears on target:
>> Getting the same result from your console window requires a couple
>> of extra back-slashes:
>> > RSiteSearch(""sample size" poisson")
>> Error: syntax error
>> > RSiteSearch("\"sample size\" poisson")
>> A search query has been submitted to
>> The results page should open in your browser shortly.
>> --
>> David Winsemius
>> Heritage Labs
>> On Nov 12, 2008, at 2:46 PM, Shawn Morrison wrote:
>>> Is there a function in R that will allow me to estimate the sample
>>> size required from count data (poisson data?), given the known
>>> variance and desired margin of error and confidence interval?
>>> My specific data set will be based on a survey of hikers that will be
>>> asked about the number of animals of species 'x' they observed during
>>> a given period. I need to know the number of hikers to interview. ie,
>>> I would like to calculate the mean number of species 'x' +/- margin of
>>> error with 95% confidence.
>>> This is a simple exercise for normally distributed continuous data,
>>> but I'm running into roadblocks for count data.
>>> Sincerely,
>>> Shawn Morrison
>>> [[alternative HTML version deleted]]
>>> ______________________________________________
>>> mailing list
>>> PLEASE do read the posting guide
>>> and provide commented, minimal, self-contained, reproducible code.
Received on Fri 14 Nov 2008 - 04:24:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 14 Nov 2008 - 04:30:26 GMT.
