From: Daniel Malter <daniel_at_umd.edu>

Date: Fri, 11 Jul 2008 19:53:04 -0400

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Sat 12 Jul 2008 - 00:06:27 GMT

The problem is that you do not really have categories. You draw 3 times
70000 random normal variables and then try to group one by the other.
Since none of the values will coincide exactly with any other, every value
becomes its own category, and your code would create something like
70000^3 categories. No wonder you are running out of memory. So what you
are doing is nonsensical unless you really have some groups/categories that
cluster your data and that each contain a substantial number of observations
(see the example below).
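
A quick way to see the point above, as a minimal sketch (the seed and the use of round() to form bins are illustrative assumptions, not part of the original exchange): continuous draws essentially never repeat, so grouping by them yields about one group per observation, while coarsening them first leaves only a handful of groups.

```r
# Illustration only: the seed is an arbitrary choice
set.seed(1)
fid <- rnorm(70000, 100)

# Continuous draws essentially never repeat, so grouping by fid
# produces (almost) one group per observation
n_groups_raw <- length(unique(fid))
n_groups_raw

# Rounding to whole numbers is one crude way to form real categories:
# only around ten distinct values remain, and the central ones
# hold thousands of observations each
fid_binned <- round(fid)
n_groups_binned <- length(unique(fid_binned))
table(fid_binned)
```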

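# two continuous variables with 30000 observations each
x1=rnorm(30000,0,1)
x2=rnorm(30000,10,5)
# two grouping variables with 3 levels each, i.e. 3 x 3 = 9 groups
group1=rep(c(1:3),each=10000)
group2=rep(c(1:3),10000)
# mean of x1 and x2 within each of the 9 group1/group2 combinations
aggregate(cbind(x1,x2),list(group1,group2),FUN=mean)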
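
To check that the grouping variables in an example like this define genuine, well-populated categories, one can count the observations per cell and cross-check aggregate() against tapply(). This sketch regenerates the example data with an arbitrary seed (an assumption; the original sets none):

```r
set.seed(42)  # arbitrary seed, added only for reproducibility
x1 <- rnorm(30000, 0, 1)
x2 <- rnorm(30000, 10, 5)
group1 <- rep(1:3, each = 10000)
group2 <- rep(1:3, 10000)

# Each of the 3 x 3 = 9 cells holds 3333 or 3334 observations
counts <- table(group1, group2)
counts

# One row per group1/group2 combination, with the means of x1 and x2
agg <- aggregate(cbind(x1, x2), list(group1, group2), FUN = mean)
agg

# The same means of x1 as a 3 x 3 matrix (rows: group1, columns: group2)
means_x1 <- tapply(x1, list(group1, group2), mean)
means_x1
```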

Best,

Daniel

cuncta stricte discussurus

-----Original Message-----

From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of sj

Sent: Friday, July 11, 2008 6:47 PM

To: r-help

Subject: [R] data summarization etc...

Hello,

I am trying to do some fairly straightforward data summarization, i.e., the kind you would do with a pivot table in Excel or with SQL queries. I have a moderately sized data set of ~70,000 records, and I am trying to compute group averages and sums within groups. The code example below shows how I am trying to go about this:

pti <- rnorm(70000,10)
fid <- rnorm(70000,100)
finc <- rnorm(70000,1000)

### compute the sums of pti within fid groups
sum_pinc <- aggregate(cbind(fid,pti),list(fid),FUN=sum)

#### compute mean finc within fid groups
tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)

When I try to do it this way, I get an error message telling me that not enough memory could be allocated (I am using R 2.7.1 on Windows XP with 2 GB of memory). I figure there must be a more efficient way to do this. Please suggest one.
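
If fid were a genuine group identifier (for example an integer ID; this is an assumption, since the post draws it with rnorm()), base R's rowsum() and tapply() give group sums and means directly. In this hypothetical sketch the data are simulated with an arbitrary seed and 500 made-up groups; passing only the column being summarized, rather than cbind(fid, pti), also avoids summing fid itself:

```r
set.seed(123)  # arbitrary seed
n <- 70000
# Hypothetical data: fid is assumed to be an integer group ID with 500 levels,
# not a continuous draw as in the original post
fid  <- sample(1:500, n, replace = TRUE)
pti  <- rnorm(n, 10)
finc <- rnorm(n, 1000)

# Sum of pti within fid groups (a one-column matrix, rows sorted by fid)
sum_pti <- rowsum(pti, fid)

# Mean of finc within fid groups (a vector indexed by sorted fid)
mean_finc <- tapply(finc, fid, mean)

# The same two summaries with aggregate(), returned as data frames
# with columns "fid" and "x"
sum_pinc <- aggregate(pti, list(fid = fid), FUN = sum)
tot_finc <- aggregate(finc, list(fid = fid), FUN = mean)
head(sum_pinc)
```

With a few hundred groups instead of ~70,000 singleton groups, each result has only one row per group, which is what the pivot-table style summary is after.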

I would typically do this kind of task in a database and use SQL to push the data around. I know RODBC allows you to write SQL to query external DBs. Is there any mechanism that allows you to write SQL queries against datasets internal to R?

In the case above, for example, I could do something like

set <- cbind(fid,pti,finc)

select fid, sum(pti)
from set
group by fid

That would be handy!
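
SQL as such is not part of base R (add-on packages such as sqldf run SQL queries against data frames), but the GROUP BY query above has a close base-R analogue in the formula interface of aggregate(). A minimal sketch, assuming a hypothetical data frame `dat` in place of the matrix `set`, with fid as a categorical ID:

```r
set.seed(7)  # arbitrary seed
# Hypothetical data frame standing in for `set`; fid is assumed to be a
# categorical group ID, so GROUP BY yields a manageable number of groups
dat <- data.frame(fid  = sample(1:100, 1000, replace = TRUE),
                  pti  = rnorm(1000, 10),
                  finc = rnorm(1000, 1000))

# SQL:  select fid, sum(pti) from set group by fid
by_fid <- aggregate(pti ~ fid, data = dat, FUN = sum)

# SQL:  select fid, avg(finc) from set group by fid
avg_finc <- aggregate(finc ~ fid, data = dat, FUN = mean)
head(by_fid)
```

A data frame is used here rather than cbind(), because cbind() returns a matrix, while aggregate()'s formula interface looks up the named columns of a data frame.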

Thanks,

Spencer


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Sat 12 Jul 2008 - 01:31:45 GMT.
