Re: [R] How can I optimize this piece of code ?

From: ONKELINX, Thierry <Thierry.ONKELINX_at_inbo.be>
Date: Mon, 07 Jul 2008 11:30:21 +0200

Try aggregate. It takes only 8 seconds for the 800000 rows in the example below.

m <- as.data.frame(matrix(rnorm(16000000), ncol = 20))
m$ID <- rbinom(nrow(m), 10, prob = 0.5)
system.time(
    aggregate(m[, 5:20], list(ID = m$ID), sd)
)

   user  system elapsed
   6.14    1.37    7.55
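As a sanity check, the sketch below compares the aggregate() result with a group-by-group computation on a much smaller data set. The row count, the seed, and the names agg, ids and loop are assumptions made for illustration; they are not from the original post. Recent R versions no longer accept a data frame in sd(), so the per-group version applies sd column by column.

```r
# Small, reproducible data set (size and seed are illustrative assumptions)
set.seed(1)
m <- as.data.frame(matrix(rnorm(2000 * 20), ncol = 20))
m$ID <- rbinom(nrow(m), 10, prob = 0.5)

# Grouped standard deviation of columns 5 to 20, one row per ID
agg <- aggregate(m[, 5:20], list(ID = m$ID), sd)

# Same quantity computed one ID at a time, in the spirit of the original question
ids <- sort(unique(m$ID))
loop <- t(sapply(ids, function(id) {
  sapply(m[m$ID == id, 5:20], sd)  # sd of each column within this group
}))

# Both approaches should agree (an ID with a single row gives NA in both)
stopifnot(isTRUE(all.equal(unname(as.matrix(agg[, -1])), unname(loop))))
```

aggregate() splits the rows by ID once, whereas the per-group version rescans the whole ID column for every ID. The timing above suggests that, or the subset() calls, is where the original code loses its time, although this was not profiled.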

HTH, Thierry




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx_at_inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On behalf of Daren Tan
Sent: Monday 7 July 2008 11:18
To: r-help_at_stat.math.ethz.ch
Subject: [R] How can I optimize this piece of code ?

Currently it needs 50+ minutes to run on 800000 rows, and I need to run it hundreds of times :P

t(apply(unique_ids, 1, function(x) { sd(subset(m[, 5:20], m[,"ID"] == x)) } ))





R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 07 Jul 2008 - 09:32:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 07 Jul 2008 - 10:31:16 GMT.
