[R] Help

From: <bistanz_at_gmail.com>
Date: Tue, 26 Apr 2011 17:31:47 -0500


Hey Everyone!
I'm a fairly new R user. I've run into a problem that I'd like to share with you, in the hope that you can help me find a solution.
I have a large .txt file which I read in with the read.table command, and what I understood from the R manuals is that read.table gives me a kind of matrix (a data frame).
I've used order() to sort my data, and now my problem is this: I have a variable with many repeated values, and I would like to operate on the row indexes of these repeated values. For example, suppose I have:

 var1    var2    varN
 122     nnn1    1
 213     nnn2    2
 422     nnn4    2
 432             3
 441             4
 500             4
 550             4

So I want to obtain a new column in which the elements of var1 are summed over the rows where varN is repeated. For varN=2 the corresponding element of the new column will be 213+422, and for varN=4 it will be 441+500+550. Where a value of varN is not repeated there is obviously nothing to add, because that value of varN is unique.
I wrote a function to do this, but it is not very good (I have a database with around 1 million rows and 5 columns). It does work on data that is not so large:
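To make the target concrete, here is a minimal sketch of the desired column on the sample table above. It assumes base R's ave() is an acceptable way to express "sum var1 within each group of varN"; the names dat and suma are illustrative only, and var2 is left out because some of its entries are blank in the example.

```r
# Sample data from the table above (var2 omitted, some entries are blank)
dat <- data.frame(
  var1 = c(122, 213, 422, 432, 441, 500, 550),
  varN = c(1, 2, 2, 3, 4, 4, 4)
)

# ave() applies FUN within each group of varN and returns a vector
# aligned with the original rows, so it can be stored as a new column
dat$suma <- ave(dat$var1, dat$varN, FUN = sum)

print(dat)
```

Rows sharing varN=2 both receive 213+422, rows sharing varN=4 all receive 441+500+550, and rows with an unrepeated varN keep their own var1.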

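# Sum X within each group of equal values in Y (one result per unique Y)
suma.rep <- function(X, Y) {
  resp <- numeric(0)
  Z <- unique(Y)
  for (i in seq_along(Z)) {
    # append the sum of X over the rows where Y equals the i-th unique value
    resp <- c(resp, sum(X[which(Y == Z[i])]))
  }
  return(resp)
}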

When I run this function on my large data, R appears to keep calculating, and I think it would take very long (maybe 4 days) to build the new column I need.
Question 1: I "feel" that there may be a command that could do this "simple" operation more elegantly. I googled it but couldn't find one. Is there such a command?
Question 2: Is it a good idea to handle large database files with R, as in my example?
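As a possible direction for Question 1, the following is only a sketch, not a tested solution for the 1-million-row file. It assumes that base R's rowsum() and tapply() compute the same per-group sums as suma.rep; the names X, Y, sums_rowsum and sums_tapply are illustrative.

```r
# Same toy data as in the example table
X <- c(122, 213, 422, 432, 441, 500, 550)
Y <- c(1, 2, 2, 3, 4, 4, 4)

# rowsum() adds up X within each group of Y;
# reorder = FALSE keeps groups in order of first appearance, like unique(Y)
sums_rowsum <- rowsum(X, Y, reorder = FALSE)[, 1]

# tapply() gives the same sums, ordered by the sorted group values
sums_tapply <- tapply(X, Y, sum)

print(sums_rowsum)
print(sums_tapply)
```

Both calls return one sum per distinct value of Y, so neither grows a result vector inside a loop the way suma.rep does.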

Thank you so much for your help.
Christian Pal




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
Received on Wed 27 Apr 2011 - 01:18:32 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 27 Apr 2011 - 02:40:33 GMT.
