# [R] zoo:rollapply by multiple grouping factors

From: Mark Novak <mnovak1_at_ucsc.edu>
Date: Sun, 03 Apr 2011 08:58:19 -0700

# The data I have are for the abundance dynamics of multiple species
# observed in multiple fixed plots at multiple sites. (In total I have 7 sites, ~3-5 plots/site, ~150 species/plot, for 60 time-steps each.) So my data look something like this:

dat<-data.frame(Site=rep(1), Plot=rep(c(rep(1,8),rep(2,8),rep(3,8)),1), Time=rep(c(1,1,2,2,3,3,4,4)), Sp=rep(1:2), Count=sample(24))
dat

# Let the function I want to apply over a right-aligned window of w=2
# time steps be:
cv<-function(x){sd(x)/mean(x)}
w<-2
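
To make the target quantity concrete, here is a minimal base-R sketch of what a right-aligned window of `w=2` computes for a single series. The vector `x` is made up for illustration and is not part of the data above; `manual` is a hypothetical name.

```r
cv <- function(x) sd(x) / mean(x)
w  <- 2

# Illustrative series (not from the real data).
x <- c(2, 4, 4, 8)

# For each position i, apply cv() to the w values ending at i.
# Positions with fewer than w values available are padded with NA.
manual <- sapply(seq_along(x), function(i) {
  if (i < w) NA else cv(x[(i - w + 1):i])
})

round(manual, 2)
# NA 0.47 0.00 0.47
```

The first value is `NA` because no full window ends at the first time step, which is why each species' series in the desired output starts with `NA`.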

# The final output I want would look something like this:
Out<-data.frame(dat,CV=round(c(NA,NA,runif(6,0,1), NA,NA,runif(6,0,1), NA,NA,runif(6,0,1)),2))

# I could reshape and apply zoo::rollapply() to a given plot at a given
# site, and reshape again as follows:

```
library(zoo)

a<-subset(dat,Site==1&Plot==1)
b<-reshape(a[-c(1,2)],v.names='Count',idvar='Time',timevar='Sp',direction='wide')
d<-zoo(b[,-1],b[,1])
d
out<-rollapply(d, w, cv, na.pad=T, align='right')
out
```

# I would thereby have to loop through all my sites and plots which,
# although it deals with all species at once, still seems exceedingly inefficient.

# So the question is: how do I use something like aggregate.zoo or
# tapply or even lapply to apply rollapply to each species' time series?

# The closest I've come is the following two approaches:

# First let:

datx<-list(Site=dat$Site,Plot=dat$Plot,Sp=dat$Sp)
daty<-dat$Count

# Method 1.

out1<-tapply(seq(along=daty),datx,function(i,x=daty){ rollapply(zoo(x[i]), w, cv, na.pad=T, align='right') })
out1
out1[,,1]

# Which "works" in that it gives me the right answers, but in a format
# from which I can't figure out how to get back into the format I want.
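
One possible way to stay in the original long format is base R's `ave()`, which applies a function within groups and writes the result back in row order. The following is only a sketch under stated assumptions: `roll_cv` is a hypothetical helper, `dat` is rebuilt explicitly with the same structure as above (with `set.seed()` added purely for reproducibility), and `fill=NA` is used as the current replacement for `na.pad=TRUE`.

```r
library(zoo)

cv <- function(x) sd(x) / mean(x)
w  <- 2

set.seed(1)  # assumption: only so the illustrative counts are reproducible
dat <- data.frame(Site  = 1,
                  Plot  = rep(1:3, each = 8),
                  Time  = rep(rep(1:4, each = 2), times = 3),
                  Sp    = rep(1:2, times = 12),
                  Count = sample(24))

# Sort so rows within each Site/Plot/Sp group are in time order.
dat <- dat[order(dat$Site, dat$Plot, dat$Sp, dat$Time), ]

# Hypothetical helper: right-aligned rolling CV of one group's counts,
# returned as a plain numeric vector of the same length as its input.
roll_cv <- function(x) {
  if (length(x) < w) return(rep(NA_real_, length(x)))
  as.numeric(coredata(rollapply(zoo(x), w, cv, fill = NA, align = "right")))
}

# ave() splits Count by the grouping variables, applies roll_cv to each
# piece, and puts the results back in the original row positions.
dat$CV <- ave(as.numeric(dat$Count), dat$Site, dat$Plot, dat$Sp, FUN = roll_cv)
dat
```

Because the result is just another column of `dat`, no reshaping back from a `tapply` array is needed. Whether this scales well to ~150 species per plot over 60 time steps has not been checked here.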

# Method 2.

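# (`fun` is not defined above; it is assumed here to be the same rolling CV used in Method 1.)
fun<-function(x){ rollapply(zoo(x), w, cv, na.pad=T, align='right') }
out2<-aggregate(daty,by=datx,fun)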
out2

# Which superficially "works" better, but again only in a format I can't
# figure out how to use because the output seems to be a mix of data.frame and lists.
out2[1,4]
out2[1,5]
is.data.frame(out2)
is.list(out2)
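
Both checks returning `TRUE` is expected, since every data frame is a list of columns. The harder part is that `aggregate()` can store a multi-value result per group as a single matrix (or list) column. The base-R sketch below illustrates this with a made-up data frame and `range()` standing in for the rolling function; `df`, `agg` and `flat` are hypothetical names.

```r
# Made-up data: two groups, two values each.
df <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 3, 10, 20))

# range() returns two values per group, so the result column `x`
# is stored as a 2-column matrix inside the data frame.
agg <- aggregate(df$v, by = list(g = df$g), FUN = range)

is.data.frame(agg)  # TRUE
is.list(agg)        # TRUE: a data frame is a list of columns
ncol(agg)           # 2: `g` plus the single matrix column `x`
is.matrix(agg$x)    # TRUE

# One common idiom to expand the matrix column into ordinary columns.
flat <- do.call(data.frame, agg)
ncol(flat)          # 3
```

This only flattens matrix columns. If the groups return results of different lengths, `aggregate()` keeps a list column instead, and this idiom does not apply directly.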

# The situation is made more problematic by the fact that the time point
# of first survey can differ between plots (e.g., site1-plot3 may only start at time-point 3). As in...
dat2<-dat
dat2<-dat2[-which(dat2$Plot==3 & dat2$Time<3),]
dat2

# I must therefore ensure that I'm keeping track of the true time
# associated with each value, not just the order of their occurrences. This information is (seemingly) lost by both methods.
datx<-list(Site=dat2$Site,Plot=dat2$Plot,Sp=dat2$Sp)
daty<-dat2$Count

# Method 1.

out3<-tapply(seq(along=daty),datx,function(i,x=daty){ rollapply(zoo(x[i]), w, cv, na.pad=T, align='right') })
out3
out3[1,3,1]
time(out3[[1,3,1]])

# Method 2

out4<-aggregate(daty,by=datx,fun)
out4
time(out4[3,4])
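
One way the true survey times might be kept is to index each group's zoo series by its `Time` column and to write the result back next to the rows it came from. This is a sketch, not a tested solution for the full data: `roll_group`, `groups` and `out` are hypothetical names, `dat` is rebuilt with the same structure as above, and each group is assumed to have no gaps between its first and last time step.

```r
library(zoo)

cv <- function(x) sd(x) / mean(x)
w  <- 2

set.seed(1)  # assumption: only so the illustrative counts are reproducible
dat <- data.frame(Site  = 1,
                  Plot  = rep(1:3, each = 8),
                  Time  = rep(rep(1:4, each = 2), times = 3),
                  Sp    = rep(1:2, times = 12),
                  Count = sample(24))

# Plot 3 is only surveyed from time-point 3 onwards.
dat2 <- dat[!(dat$Plot == 3 & dat$Time < 3), ]

# Hypothetical helper: rolling CV for one Site/Plot/Sp group, with the
# zoo index set to the true survey time rather than the row order.
roll_group <- function(g) {
  g <- g[order(g$Time), ]
  z <- zoo(g$Count, order.by = g$Time)
  r <- rollapply(z, w, cv, fill = NA, align = "right")
  g$CV <- as.numeric(coredata(r))
  g
}

groups <- split(dat2, list(dat2$Site, dat2$Plot, dat2$Sp), drop = TRUE)
out <- do.call(rbind, lapply(groups, roll_group))
rownames(out) <- NULL
out
```

Each row of `out` keeps its own `Time`, so for plot 3 the `NA` falls at time-point 3, its first survey, rather than at time-point 1. If a series had missing time steps in the middle, a window of two consecutive rows would span a longer interval, so gaps would need separate handling.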

# Is there something else I should try? Any thoughts and suggestions are much appreciated!

# Thanks!
# -mark

--

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--
Ecology & Evolutionary Biology
University of California, Santa Cruz
Long Marine Laboratory
Santa Cruz, CA 95060-5730
Ph: 773-256-8645
Fax: 831-459-3383
http://people.ucsc.edu/~mnovak1/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help