Re: [R] how to get the group mean deviation data ?

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon 25 Jul 2005 - 16:57:47 EST

> if n is quite large, say n=1000 and t=3, it requires too much time, so I
> want to know of any more efficient way to do it?

Why is about 0.4 seconds (which is what it takes on my system) too long?

Given that you want to operate on 3000 cells, a second does not look unreasonable.
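
For anyone wanting to reproduce such a timing, here is a minimal sketch using system.time on the sapply/ave approach from the original post. The data are simulated exactly as in the post; the seed and the names df, dev and elapsed are illustrative assumptions, and the elapsed time will differ from system to system.

```r
# Simulate the data as in the original post, but with n = 1000 groups.
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t),
           y = rnorm(n * t), x = rnorm(n * t), z = rnorm(n * t))
df <- data.frame(d)

# Time the group-mean deviations computed with ave(), one variable at a time.
elapsed <- system.time(
  dev <- sapply(df[, 2:4], function(x) x - ave(x, df$id))
)
print(elapsed)                   # user / system / elapsed seconds
```

The result dev is a numeric matrix with n*t rows and one column per variable; within each id its entries sum to zero up to rounding error.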

This is a toy problem, and it is unclear what the real problem is (if any). Since you have the same number of replications for each cell (group-variable combination), I would treat this as an n x 3 x t array (a simple call to dim and aperm). Then rowMeans will find the group means, and you can just subtract those to get the deviations from the means, making use of recycling.

E.g.

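D <- as.matrix(d[,-1])         # plain numeric matrix, even if d is a data frame
dim(D) <- c(t,n,3)             # replication x group x variable
D <- aperm(D, c(2,3,1))        # group x variable x replication
gmeans <- rowMeans(D, dims=2)  # n x 3 matrix of group means
d[,-1] - rep(gmeans, each=t)   # each=t: one copy of the mean per replication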

That takes under 10ms for n=1000.
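
As a sanity check, the array approach can be compared with the sapply/ave version from the original post. This is only a sketch under stated assumptions: the rows must be sorted by id with exactly t rows per group, d is kept as a numeric matrix, and the names k, dev_fast and dev_ref are illustrative. It writes each=t rather than a literal 3, so it also holds when t is not 3.

```r
# Simulated data as in the original post, kept as a numeric matrix.
set.seed(1)                            # arbitrary seed
n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t),
           y = rnorm(n * t), x = rnorm(n * t), z = rnorm(n * t))

k <- ncol(d) - 1                       # number of variables (3 here)
D <- d[, -1]
dim(D) <- c(t, n, k)                   # replication x group x variable
D <- aperm(D, c(2, 3, 1))              # group x variable x replication
gmeans <- rowMeans(D, dims = 2)        # n x k matrix of group means
dev_fast <- d[, -1] - rep(gmeans, each = t)

# Reference result: deviations from group means via ave().
dev_ref <- sapply(data.frame(d)[, 2:4], function(x) x - ave(x, d[, "id"]))

# The two results should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(unname(dev_fast), unname(dev_ref))))
```

The rep(gmeans, each = t) step works because gmeans is stored column by column, so repeating each entry t times reproduces the row layout of d[, -1], where the t rows of each group are adjacent.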

On Mon, 25 Jul 2005, ronggui wrote:

>> n=10;t=3
>> d<-cbind(id=rep(1:n,each=t),y=rnorm(n*t),x=rnorm(n*t),z=rnorm(n*t))
>> head(d)
>      id          y           x          z
> [1,]  1 -2.1725379  0.07629954 -0.3985258
> [2,]  1 -1.2383038 -2.49667038  0.6966127
> [3,]  1 -1.2642401 -0.50613307  0.4895856
> [4,]  2  0.2171246  0.86711864 -0.6660036
> [5,]  2  2.2765760 -0.48547142 -1.4496664
> [6,]  2  0.5985345 -1.06427035  2.1761071
>
> First, I want to get the group mean of each variable, for which I can use
>> d<-data.frame(d)
>> aggregate(d,list(d$id),mean)[,-1]
>    id           y          x           z
> 1   1 -1.55836060 -0.9755013  0.26255754
> 2   2  1.03074502 -0.2275410  0.02014565
> 3   3  0.20700121 -0.7159450  1.35890176
> 4   4  0.17839650  1.2575891  0.04135165
> 5   5 -0.20012508  0.4310221  0.55458899
> 6   6 -0.13084185 -0.2953392  0.28229068
> 7   7  0.20737288 -0.8863761 -0.50793880
> 8   8  0.07512612 -0.6591304 -0.21656533
> 9   9  0.94727796 -0.6108891  0.13529884
> 10 10 -0.04434875  0.1332086 -0.88229808
>
> Then I want the group mean deviation data, like
>> head(sapply(d[,2:4],function(x) x-ave(x,d$id)))
>               y          x          z
> [1,] -0.6141773  1.0518008 -0.6610833
> [2,]  0.3200568 -1.5211691  0.4340552
> [3,]  0.2941205  0.4693682  0.2270281
> [4,] -0.8136205  1.0946597 -0.6861493
> [5,]  1.2458310 -0.2579304 -1.4698121
> [6,] -0.4322105 -0.8367293  2.1559614
>
> Both of the above are what I want, and I can also get them with the function below. But if n is quite large, say n=1000 and t=3, it requires too much time, so I want to know of any more efficient way to do it?
>
> myfun <- function(x, id)
> {
>   x <- as.matrix(x)
>   id <- as.factor(id)
>   xm <- apply(x, 2, function(y, z) tapply(y, z, mean), z = id)
>   xdm <- x[] <- x - xm[id, ]
>   re <- list(xm = xm, xdm = xdm)
>   re
> }
>
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jul 25 17:01:32 2005
