[R] data summarization etc...

From: sj <ssj1364_at_gmail.com>
Date: Fri, 11 Jul 2008 15:44:08 -0700


Hello,

I am trying to do some fairly straightforward data summarization, i.e., the kind you would do with a pivot table in Excel or with SQL queries. I have a moderately sized data set of ~70,000 records, and I am trying to compute group averages and sums of values within groups. The code example below shows how I am going about this:

pti <- rnorm(70000,10)
fid <- rnorm(70000,100)
finc <- rnorm(70000,1000)

### compute the sums of pti within fid groups
sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)

#### compute mean finc within fid groups
tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)

When I try to do it this way, I get an error message telling me that enough memory cannot be allocated (I am using R 2.7.1 on Windows XP with 2 GB of memory). I figure there must be a more efficient way to go about this. Please suggest one.
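One possible direction, offered only as a minimal sketch: tapply() from base R computes a per-group statistic on a plain vector. Note that fid drawn from rnorm() is continuous, so practically every record is its own group. The sketch below assumes fid is meant to be a discrete group id and draws it from 1..500 instead. That range, the seed, and the names sum_pti, mean_finc and summary_df are illustrative assumptions, not part of the original data.

```r
# Assumption: fid is a discrete group id (1..500 here), not a continuous value.
set.seed(1)
n    <- 70000
fid  <- sample(1:500, n, replace = TRUE)
pti  <- rnorm(n, 10)
finc <- rnorm(n, 1000)

# Sum of pti within each fid group (one named entry per group)
sum_pti <- tapply(pti, fid, sum)

# Mean of finc within each fid group (same group order as sum_pti)
mean_finc <- tapply(finc, fid, mean)

# Combine both summaries into one data frame, one row per group
summary_df <- data.frame(fid       = as.integer(names(sum_pti)),
                         sum_pti   = as.vector(sum_pti),
                         mean_finc = as.vector(mean_finc))
head(summary_df)
```

Because tapply() works on the vectors directly, no intermediate cbind() matrix is built, which may help with memory. Whether it resolves the allocation error on the real data depends on how many distinct groups there are.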

I would typically do this kind of task in a database and use SQL to push the data around. I know RODBC allows you to write SQL to query external DBs. Is there any mechanism that allows you to write SQL queries against datasets internal to R? E.g., in the case above I could do something like

set <- cbind(fid,pti,finc)

select fid, sum(pti)
from set
group by fid

that would be handy!
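For illustration, the query above could be approximated in two ways. This is a sketch under stated assumptions: fid is again taken to be a discrete id (1..500, an assumption), the data are held in a data frame named dat rather than a cbind() matrix called set (the rename avoids the SQL keyword SET), and the SQL variant relies on the add-on package sqldf from CRAN, which is only used if it happens to be installed.

```r
# Assumption: discrete group ids; dat, by_fid and by_fid_sql are illustrative names.
set.seed(1)
fid  <- sample(1:500, 70000, replace = TRUE)
pti  <- rnorm(70000, 10)
finc <- rnorm(70000, 1000)
dat  <- data.frame(fid, pti, finc)

# Base R counterpart of: select fid, sum(pti) from dat group by fid
by_fid <- aggregate(pti ~ fid, data = dat, FUN = sum)

# The same query written as SQL, run only when sqldf is available
if (requireNamespace("sqldf", quietly = TRUE)) {
  by_fid_sql <- sqldf::sqldf(
    "select fid, sum(pti) as pti from dat group by fid")
}
```

The formula form of aggregate() needs a reasonably recent version of R. On older versions, aggregate(dat$pti, list(fid = dat$fid), sum) expresses the same grouping.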

Thanks,

Spencer




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 11 Jul 2008 - 22:48:35 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 12 Jul 2008 - 12:32:09 GMT.
