From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Mon 15 Aug 2005 - 01:11:44 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Mon Aug 15 01:19:05 2005

Date: Mon 15 Aug 2005 - 01:11:44 EST

On 8/14/05, Ajay Narottam Shah <ajayshah@mayin.org> wrote:

> I have written two functions which do useful things with panel data

*> a.k.a. longitudinal data, where one unit of observation (a firm or a
**> person or an animal) is observed on a uniform time grid:
**> - The first function makes lagged values of variables of your choice.
**> - The second function makes growth rates w.r.t. q observations ago,
**> for variables of your choice.
**>
**> These strike me as bread-and-butter tasks in dealing with panel
**> data. I couldn't find these functions in the standard R
**> libraries. They are presented in this email for two reasons. First,
**> it'll be great if R gurus can look at the code and propose
**> improvements. Second, it'll be great if some package-owner can adopt
**> these orphans :-) and make them available to the R community.
**>
**> The two functions follow:
**>
**> library(Hmisc) # Am using Lag() in this.
**>
**> # Task: For a supplied list of variables (the list `lagvars'),
**> # make new columns in a dataset denoting lagged values.
**> # You must supply `unitvar' which identifies the unit that's
**> # repeatedly observed.
**> # You must supply the name of the time variable `timevar'
**> # and you must tell a list of the lags that interest you (`lags')
**> # Example:
**> # paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:4)
**> paneldata.lags <- function(X, unitvar, timevar, lagvars, lags=1) {
**> stopifnot(length(lagvars)>=1)
**> X <- X[order(X[,timevar]),] # just in case it's not sorted.
**>
**> innertask <- function(Y, lagvars, lags) {
**> E <- labels <- NULL
**> for (v in lagvars) {
**> for (i in lags) {
**> E <- cbind(E, Lag(Y[,v], i))
**> }
**> labels <- c(labels, paste(v, ".l", lags, sep=""))
**> }
**> colnames(E) <- labels
**> cbind(Y, E)
**> }
**>
**> do.call("rbind", by(X, X[,unitvar], innertask, lagvars, lags))
**> }
**>
**> # Task: For a supplied list of variables (the list `gvars'),
**> # make new columns in a dataset denoting growth rates.
**> # You must supply `unitvar' which identifies the unit that's
**> # repeatedly observed.
**> # You must supply the name of the time variable `timevar'
**> # and you must tell the time periods Q (vector is ok) over which
**> # the growth rates are computed.
**> paneldata.growthrates <- function(X, unitvar, timevar, gvars, Q=1) {
**> stopifnot(length(gvars)>=1)
**> X <- X[order(X[,timevar]),]
**>
**> makegrowths <- function(x, q) {
**> new <- rep(NA, length(x))
**> for (t in (1+q):length(x)) {
**> new[t] <- 100*((x[t]/x[t-q])-1)
**> }
**> return(new)
**> }
**>
**> innertask <- function(Y, gvars, Q) {
**> E <- labels <- NULL
**> for (v in gvars) {
**> for (q in Q) {
**> E <- cbind(E, makegrowths(Y[,v], q))
**> }
**> labels <- c(labels, paste(v, ".g", Q, sep=""))
**> }
**> colnames(E) <- labels
**> cbind(Y, E)
**> }
**>
**> do.call("rbind", by(X, X[,unitvar], innertask, gvars, Q))
**> }
**>
**> Here's a demo of using them:
**>
**> # A simple panel dataset
**> A <- data.frame(year=rep(1980:1982,4),
**> person=factor(sort(rep(1:4,3))),
**> v1=round(rnorm(12),digits=2), v2=round(rnorm(12),digits=2))
**>
**> # Demo of creating lags for both variables v1 and v2 --
**> paneldata.lags(A, "person", "year", c("v1","v2"), lags=1:2)
**> # Demo of creating growth rates for v2 w.r.t. 1 & 2 years ago --
**> paneldata.growthrates(A, "person", "year", "v2", Q=1:2)
**>
**>
**>
**>
**> Finally, I have a question. In a previous posting on this subject,
**> Gabor showed me that my code:
**>
**> # Blast this function for all the values that A$person takes --
**> new <- NULL
**> for (f in levels(A$person)) {
**> new <- rbind(new,
**> make.additional.variables(subset(A, A$person==f),
**> nlags=2, Q=1))
**> }
**> A <- new; rm(new)
**>
**> can be replaced by one do.call() (as used above). It's awesome, but I
**> don't understand it! :-) Could someone please explain how and why this
**> works? I know by() and I see that when I do by(A,A$x), it gives me a
**> list containing as many entries as are levels of A$x. I couldn't think
**> of a way to force all this into one data frame; the best I could do
**> was to do for (f in levels (A$person)) {} as shown here. The two
**> functions above are using do.call() as Gabor used them, and they're
**> awesome, but I don't understand why they work! The man page ?do.call
**> was a bit too cryptic and I couldn't comprehend it. Where can I learn
**> this stuff?
**>
*

Don't know of a source, I just study code, but conceptually by just splits up the rows by the grouping argument giving a list of data frames and applies the function to each element of the list giving the result.

For example, if we write:

f <- function(x) colSums(x[,-5])

iris.by <- by(iris, iris$Species, f)

is the same as:

f <- function(x) colSums(x[,-5])

iris.split <- split(iris, iris$Species)

iris.lapply <- lapply(iris.split, f)

except that in the by case the result gets a class of "by".

In either of the above cases the result is a list of these three elements, i.e. these three data frames:

el1 <- iris.by[[1]]; el2 <- iris.by[[2]]; el3 <- iris.by[[3]]

Now, if g <- function(x,y)x+y then

g(1,2)

is the same as

do.call("g", list(1,2))

so going back the iris example, to rbind el1, el2 and el3 together we do this:

rbind(el1, el2, el3)

which is the same as

do.call("rbind", list(e1, e2, e3))

which is the same as

do.call("rbind", iris.by)

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Mon Aug 15 01:19:05 2005

*
This archive was generated by hypermail 2.1.8
: Sun 23 Oct 2005 - 15:19:26 EST
*