[R] Combining a list of similar dataframes into a single dataframe

From: Mike Nielsen <mr.blacksheep_at_gmail.com>
Date: Sun 09 Jul 2006 - 08:40:00 EST


I would be very grateful to anyone who could point to the error of my ways in the following.

I have a dataframe called net1, as such:

> str(net1)
`data.frame': 114192 obs. of 9 variables:
$ server : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ts :'POSIXct', format: chr "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" ...
$ instance : Factor w/ 22 levels "1","2","Compaq Ethernet_Fast Ethernet Adapter_Module",..: 4 4 4 4 4 4 4 4 4 4 ...
$ instanceno : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ perftime : num 3.16e+13 3.16e+13 3.16e+13 3.16e+13 3.16e+13 ...
$ perffreq : num 6.99e+08 6.99e+08 6.99e+08 6.99e+08 6.99e+08 ...
$ perftime100nsec: num 1.28e+17 1.28e+17 1.28e+17 1.28e+17 1.28e+17 ...
$ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 3 2 4 1 3 2 4 1 3 ...
$ countervalue : num 6.08e+07 6.64e+07 5.58e+06 1.00e+08 6.09e+07 ...
>

What I am trying to do is subset this thing down by server, instance, instanceno, countername and then apply a function to each subsetted dataframe. The function performs a calculation on countervalue, essentially "collapsing" instanceno and instance down to a single value.

Here is a snippet of my code:
t1 <- by(net1,
         list(net1$server,
              # get rid of unused levels of countername for this server
              factor(as.character(net1$countername))),
         function(x) {
           g <- by(x,
                   # get rid of unused levels of instance for this server
                   list(factor(as.character(x$instance)),
                        factor(as.character(x$instanceno))),  # same with instanceno
                   function(y) {
                     c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
                   })
           data.frame(server = x$server,
                      ts = x$ts,
                      countername = x$countername,
                      countervalue = apply(sapply(g[!sapply(g, is.null)], I), 1, sum))
         })

So t1 is then a list of dataframes, each with an identical set of columns:

> str(t1[[1]])
`data.frame':	149 obs. of  4 variables:
$ server : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ts :'POSIXct', format: chr "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
$ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
$ countervalue: num NA 938 816 4213 906 ...

What I'd dearly love to do, without looping or lapply-ing through t1 and rbinding (too much data for this to finish quickly enough -- this is about 10% of what I'm eventually going to have to manage), is convert t1 to one big dataframe.
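For reference, one commonly used idiom for this kind of task is a single do.call(rbind, ...) over the whole list, rather than growing a result inside a loop. The sketch below uses a small made-up list (the helper make_piece and the object pieces are hypothetical stand-ins for t1, not part of the original code), and it assumes every non-NULL element has the same columns. Whether this is fast enough at the data sizes described above is not something the sketch demonstrates.

```r
# Hypothetical stand-in for t1: a list of data frames with identical columns,
# plus a NULL entry such as by() leaves for empty factor combinations.
make_piece <- function(server, n) {
  data.frame(server = rep(server, n),
             ts = as.POSIXct("2006-06-30 12:31:44", tz = "UTC") + 60 * seq_len(n),
             countervalue = c(NA, seq_len(n - 1)),
             stringsAsFactors = FALSE)
}
pieces <- list(make_piece("AB93-99", 3), NULL, make_piece("AMP93-1", 2))

# Drop the NULL entries, then bind all remaining pieces in one rbind call.
pieces <- pieces[!sapply(pieces, is.null)]
combined <- do.call(rbind, pieces)

str(combined)
```

Applied to the real object, the same pattern would presumably read do.call(rbind, t1[!sapply(t1, is.null)]), though that has not been run against the data described here.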

On the other hand, I admit that I may be going about this wrongly from the start; perhaps there's a better approach?
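As one possible alternative, the per-instance rate could be written into a new column of the original dataframe and then summed across instances with aggregate(), so that no list of dataframes is ever produced. This is only a sketch under assumptions: the toy data frame net, the column rate, and the instance labels are made up for illustration; rows are assumed to be in time order within each group; and all instances of a server are assumed to share the same timestamps.

```r
# Toy data: one server, two instances, three samples each (all values made up).
net <- data.frame(
  server       = rep("AB93-99", 6),
  instance     = rep(c("nic1", "nic2"), each = 3),
  ts           = rep(as.POSIXct("2006-06-30 12:31:44", tz = "UTC") + c(0, 60, 120), 2),
  perftime     = rep(c(0, 10, 20), 2),
  perffreq     = rep(2, 6),
  countervalue = c(0, 5, 15, 0, 10, 30),
  stringsAsFactors = FALSE
)

# Row indices for each (server, instance) group; drop = TRUE skips empty combinations.
groups <- split(seq_len(nrow(net)), list(net$server, net$instance), drop = TRUE)

# Same rate formula as in the snippet above, written back in place per group.
net$rate <- NA_real_
for (idx in groups) {
  net$rate[idx] <- c(NA, mean(net$perffreq[idx]) *
                       diff(net$countervalue[idx]) / diff(net$perftime[idx]))
}

# Sum the rates across instances for each server and timestamp,
# keeping the leading NA rows instead of dropping them.
totals <- aggregate(rate ~ server + ts, data = net, FUN = sum, na.action = na.pass)
print(totals)
```

The loop here runs once per group rather than once per row, and the result is already a single dataframe.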

Any pointers would be most gratefully received.

Many thanks!

-- 
Regards,

Mike Nielsen

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sun Jul 09 08:43:40 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
