Re: [R] Efficient computation of trimmed stats?

From: Dimitris Rizopoulos <dimitris.rizopoulos_at_med.kuleuven.be>
Date: Tue, 15 May 2007 14:30:31 +0200

The following seems a bit faster; it uses split() + lapply() instead of tapply():

set.seed(1)
nc <- 30
nr <- 25000
x <- matrix(rnorm(nc*nr), ncol = nc)
g <- matrix(sample(1:3, nr*nc, replace = TRUE), ncol = nc)

#################################

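## original approach: tapply() on a factor with fixed levels
trimmedMeanByGroup1 <- function(y, grp, trim = .05) {
   tapply(y, factor(grp, levels = 1:3), mean, trim = trim)
}

## alternative: split() + lapply(); fixing the factor levels keeps an
## empty group (as NaN) instead of dropping it, so the result always
## has length 3 and fits into out2[i, ]
trimmedMeanByGroup2 <- function(y, grp, trim = .05) {
   unlist(lapply(split(y, factor(grp, levels = 1:3)), mean, trim = trim))
}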

out1 <- out2 <- matrix(0, nr, 3)
system.time(for(i in 1:nr) out1[i, ] <- trimmedMeanByGroup1(x[i, ], g[i, ]))
system.time(for(i in 1:nr) out2[i, ] <- trimmedMeanByGroup2(x[i, ], g[i, ]))

all.equal(out1, out2)
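Both versions above still call an R function once per row. If that loop is the bottleneck, one alternative is to compute all row-by-group trimmed means at once. This is my own minimal sketch, not an existing package function, and I have not timed it: give every (row, group) cell an integer id, sort once by id and value with order(), drop the floor(n * trim) smallest and largest values inside each cell (the same observations mean(v, trim = trim) discards for 0 <= trim < 0.5), and sum the rest with rowsum(). The names trimmedMeanMatrix and K are hypothetical; it assumes groups coded 1..K and no NA values in x.

```r
## Hypothetical helper (not from any package): trimmed mean of every
## (row, group) cell of x at once, groups in g coded 1..K, no NAs in x.
## Drops floor(n * trim) values from each end of each cell, which is
## what mean(v, trim = trim) does for 0 <= trim < 0.5.
trimmedMeanMatrix <- function(x, g, trim = 0.05, K = 3L) {
  K <- as.integer(K)
  stopifnot(identical(dim(x), dim(g)), trim >= 0, trim < 0.5,
            all(g >= 1 & g <= K))
  nr <- nrow(x)
  ## one integer id per (row, group) cell, column-major like x
  cell <- as.vector((row(x) - 1L) * K + as.integer(g))
  xv <- as.vector(x)
  o  <- order(cell, xv)                  # sort by cell, then by value
  cs <- cell[o]
  xs <- xv[o]
  n  <- tabulate(cell, nbins = nr * K)   # cell sizes (0 for empty groups)
  k  <- floor(n * trim)                  # values dropped at each end
  ## 1-based position of each sorted value within its cell
  r <- seq_along(cs) - (cumsum(n) - n)[cs]
  keep <- r > k[cs] & r <= (n - k)[cs]
  ## sum kept values per cell; every non-empty cell keeps at least one
  agg  <- rowsum(xs[keep], cs[keep])
  sums <- numeric(nr * K)
  sums[as.integer(rownames(agg))] <- agg[, 1]
  ## id (i - 1) * K + j goes to result[i, j]; empty cells give 0/0 = NaN
  matrix(sums / (n - 2 * k), nrow = nr, ncol = K, byrow = TRUE)
}

out3 <- trimmedMeanMatrix(x, g, trim = .05)
all.equal(out1, out3)
```

Empty groups come out as NaN rather than the NA that tapply() returns; all.equal() treats the two alike. The price is memory: several vectors of length nrow(x) * ncol(x) are created, which may matter for the 250000 x 300 case.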

I hope it helps.

Best,
Dimitris



Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium

Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

> Hi everyone,
>
> I was wondering if there is anything already implemented for
> efficient ("row-wise") computation of group-specific trimmed stats
> (mean and sd on the trimmed vector) on large matrices.
>
> For example:
>
> set.seed(1)
> nc = 300
> nr = 250000
> x = matrix(rnorm(nc*nr), ncol=nc)
> g = matrix(sample(1:3, nr*nc, replace=TRUE), ncol=nc)
>
> trimmedMeanByGroup <- function(y, grp, trim=.05)
> tapply(y, factor(grp, levels=1:3), mean, trim=trim)
>
> sapply(1:10, function(i) trimmedMeanByGroup(x[i,], g[i,]))
>
> works fine... but:
>
> > system.time(sapply(1:nr, function(i) trimmedMeanByGroup(x[i,], g[i,])))
>    user  system elapsed
> 399.928   0.019 399.988
>
> is too slow for my purposes.
>
> Maybe some package has some implementation of the above?
>
> Thank you very much,
> -b
>
> --
> Benilton Carvalho
> PhD Candidate
> Department of Biostatistics
> Bloomberg School of Public Health
> Johns Hopkins University
> bcarvalh_at_jhsph.edu
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
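The question also asks for the trimmed sd. sd() has no trim argument, so here is a minimal sketch that trims explicitly, assuming "trimmed sd" means sd() of the values left after dropping floor(n * trim) observations from each end (the same observations mean(v, trim = trim) averages over when trim < 0.5). trimmedStats and trimmedStatsByGroup are hypothetical names; the per-group split is the same split() idea as trimmedMeanByGroup2 above.

```r
## Hypothetical helper: trimmed mean and trimmed sd of one vector.
## Assumption: "trimmed sd" = sd() of the values kept after dropping
## floor(n * trim) observations from each end of the sorted vector.
trimmedStats <- function(v, trim = 0.05) {
  stopifnot(trim >= 0, trim < 0.5)
  n <- length(v)
  if (n == 0) return(c(mean = NA_real_, sd = NA_real_))  # empty group
  k <- floor(n * trim)                  # observations dropped at each end
  v <- sort(v)[(k + 1):(n - k)]
  c(mean = mean(v), sd = sd(v))         # sd() is NA if one value is left
}

## Hypothetical per-row wrapper: one column per group level
trimmedStatsByGroup <- function(y, grp, trim = 0.05, levels = 1:3) {
  s <- split(y, factor(grp, levels = levels))   # keeps empty groups
  ## 2 x length(levels) matrix with rows "mean" and "sd"
  vapply(s, trimmedStats, c(mean = 0, sd = 0), trim = trim)
}
```

For example, trimmedStatsByGroup(x[1, ], g[1, ]) returns a 2 x 3 matrix with rows "mean" and "sd"; it still runs once per row, so it speeds up nothing by itself.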

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



Received on Tue 15 May 2007 - 12:37:08 GMT
