Re: [R] memory and bootstrapping

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 05 May 2011 09:01:46 +0100 (BST)

The only reason the boot package will take more memory for 2000 replications than 10 is that it needs to store the results. That is not to say that on a 32-bit OS memory fragmentation will not get worse, but that is unlikely to be a significant factor.
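
To put a number on "it needs to store the results": a boot object keeps the replicate statistics in a matrix t with one row per replicate, while the data, t0 and the RNG seed it also stores do not grow with R. The sketch below, which assumes the statistic returns a single number per replicate, sizes that matrix for 10 and 2000 replicates using only base R.

```r
# Size of boot()'s stored results: an R x n_stat numeric matrix t.
# Assumption: the statistic returns one number per replicate.
n_stat <- 1
for (R in c(10, 2000)) {
  t_mat <- matrix(0, nrow = R, ncol = n_stat)
  cat(R, "replicates:", format(object.size(t_mat), units = "Kb"), "\n")
}
```

This counts only what is kept. The memory used by the glm() fit and predict() calls inside each replicate is reclaimed by the garbage collector rather than accumulating across replicates.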

As for the methodology: 'boot' is support software for a book (Davison and Hinkley, 'Bootstrap Methods and Their Application', 1997), so please consult it (and not secondary sources). From your brief description it looks to me as if you should be using studentized CIs.
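
A minimal sketch of what a studentized interval needs from boot(): the statistic must return a variance estimate alongside the estimate, and boot.ci(type = "stud") takes it from the second element by default. The data frame d and the function diff_with_var are hypothetical. For a difference in means the variance has a closed form; for a difference in glm predicted probabilities one would need something like a delta-method or nested-bootstrap variance, which is an assumption not covered in this thread.

```r
library(boot)
set.seed(1)
# Hypothetical data: numeric outcome y in two groups a and b
d <- data.frame(y = c(rnorm(200, 0.40, 0.1), rnorm(200, 0.45, 0.1)),
                g = rep(c("a", "b"), each = 200))

# Hypothetical statistic: returns c(estimate, variance of estimate),
# the layout boot.ci(type = "stud") expects by default.
diff_with_var <- function(data, i) {
  x <- data[i, ]
  ya <- x$y[x$g == "a"]
  yb <- x$y[x$g == "b"]
  c(mean(yb) - mean(ya),
    var(ya) / length(ya) + var(yb) / length(yb))
}

# Resample within groups so each replicate keeps both group sizes
b <- boot(d, diff_with_var, R = 999, strata = factor(d$g))
ci <- boot.ci(b, type = c("norm", "stud"))
ci
```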

130,000 cases is a lot, and running the experiment on a 1% sample may well show that asymptotic CIs are good enough.
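
One way to run that check, sketched under loud assumptions: the data frame big is simulated as a stand-in for the real 130,000-row data, and the slope of a one-predictor logistic regression (coef_x, hypothetical) stands in for the actual statistic. The sketch draws a 1% simple random sample and compares the asymptotic (Wald) interval from confint.default() with a bootstrap percentile interval.

```r
library(boot)
set.seed(2)
# Hypothetical stand-in for the 130,000-row data frame
n <- 130000
big <- data.frame(x = rnorm(n))
big$yes <- rbinom(n, 1, plogis(-0.5 + 0.8 * big$x))

# 1% simple random sample
small <- big[sample(nrow(big), round(0.01 * nrow(big))), ]

# Hypothetical statistic: the slope of a logistic regression
coef_x <- function(data, i)
  coef(glm(yes ~ x, family = binomial, data = data[i, ]))["x"]

fit <- glm(yes ~ x, family = binomial, data = small)
wald <- confint.default(fit)["x", ]              # asymptotic (Wald) CI
b <- boot(small, coef_x, R = 499)
perc <- boot.ci(b, type = "perc")$percent[4:5]   # bootstrap percentile CI
rbind(wald = wald, percentile = perc)
```

If the two intervals agree closely on the subsample, that is some evidence the asymptotic interval will also be adequate on the full data.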

On Thu, 5 May 2011, E Hofstadler wrote:

> hello,
>
> the following questions will without doubt reveal some fundamental
> ignorance, but hopefully you can still help me out.
>
> I'd like to bootstrap a statistic derived from the coefficients of a
> logistic regression model (the mean difference in the predicted
> probabilities between two groups, where each predict() call uses as
> its newdata argument a data frame of the same size as the original
> data frame). I've got 130,000 rows and 7 columns in my data frame.
> The glm model uses all variables (as well as two 2-way
> interactions).
>
> System:
> - R-version: 2.12.2
> - OS: Windows XP Pro, 32-bit
> - 3.16 GHz Intel dual-core processor, 2.9 GB RAM
>
> I'm using the boot package to get standard errors for this
> difference, but even with only 10 replications this takes quite a
> long time: 216 seconds (perhaps partly because the statistic
> function I pass to boot() is inefficiently programmed; I'm also
> looking into that).
>
> I wanted to try calculating a BCa bootstrap confidence interval,
> which, as I understand it, requires many more replications than
> normal-theory intervals. Drawing on John Fox's appendix to his "An R
> Companion to Applied Regression", I was thinking of trying 2000
> replications, but this will take several hours to compute on my
> system (which isn't in itself a major issue).
>
> My Questions:
> - Let's say I try bootstrapping with 2000 replications. Can I be
> certain that the memory available to R will be sufficient for this
> operation?
> - (This relates to statistics more generally.) Is it a good idea, in
> your opinion, to try BCa bootstrapping, or can I assume that a
> normal-theory confidence interval will be a sufficiently good
> approximation (letting me get away with, say, 500 replications)?
>
>
> Best,
> Esther
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu 05 May 2011 - 08:09:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 May 2011 - 05:20:05 GMT.
