Re: [R] loop over large dataset

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Tue 05 Jul 2005 - 00:15:27 EST

Federico Calboli <f.calboli@imperial.ac.uk> writes:

> > behaviour, e.g. because gc() is called more frequently. And of
> > course, gc() needs some time, hence you get the expected increase
> > in runtime. This answers your other question as well.
>
> Is it then internal gc() calls that increase the runtime from 5 minutes
> to more than 24 hours for a 27x increase in data (given that the code
> is exactly the same)?

Your original code got lost in the threading, but that order of magnitude suggests that you have N^2/2 behaviour somewhere. The typical culprit is code like

x <- numeric(0)
for (i in 1:N){
  newx <- <<....>>
  x <- c(x, newx)
}

in which the extension of x causes the whole thing to be reallocated and copied. Same thing with cbind and rbind constructs of course.
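To see where the N^2/2 comes from: if each pass appends a single element, then pass i has to copy the i-1 elements already in x, so the total over the loop is N(N-1)/2 element copies. The usual remedy is to allocate the result once and fill it by index. Below is a minimal sketch of both variants, assuming one numeric value per iteration; compute_one() is a made-up stand-in for the elided computation and not anything from the original code.

```r
# Hypothetical per-iteration computation; stands in for the elided step.
compute_one <- function(i) i^2

# Growing pattern: every c() call allocates a new vector and copies the old one.
grow <- function(N) {
  x <- numeric(0)
  for (i in seq_len(N)) {
    x <- c(x, compute_one(i))
  }
  x
}

# Preallocated pattern: allocate once, then assign by index.
prealloc <- function(N) {
  x <- numeric(N)
  for (i in seq_len(N)) {
    x[i] <- compute_one(i)
  }
  x
}

N <- 10000

# Both give the same result; only the cost differs.
stopifnot(identical(grow(N), prealloc(N)))

# Timings depend on machine and R version, so none are quoted here.
print(system.time(grow(N)))
print(system.time(prealloc(N)))
```

When the pieces are rows or columns rather than single numbers, the analogous approach is to collect them in a list inside the loop and call do.call(rbind, pieces) or do.call(cbind, pieces) once afterwards.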

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

Received on Tue Jul 05 00:18:47 2005