Re: [R] Help

From: jim holtman <jholtman_at_gmail.com>
Date: Tue, 26 Apr 2011 21:40:48 -0400

Is this what you were looking for as output? You did not show what the output should look like:

> x
  var1 var2 X. varN
1  122 nnn1  …    1
2  213 nnn2  …    2
3  422 nnn4  …    2
4  432    …  …    3
5  441    …  …    4
6  500    …  …    4
7  550    …  …    4
> str(x)
'data.frame': 7 obs. of 4 variables:
 $ var1: int  122 213 422 432 441 500 550
 $ var2: Factor w/ 4 levels "…","nnn1","nnn2",..: 2 3 4 1 1 1 1
 $ X.  : Factor w/ 1 level "…": 1 1 1 1 1 1 1
 $ varN: int  1 2 2 3 4 4 4

> x$newCol <- ave(x$var1, x$varN, FUN=sum)
> x
  var1 var2 X. varN newCol
1  122 nnn1  …    1    122
2  213 nnn2  …    2    635
3  422 nnn4  …    2    635
4  432    …  …    3    432
5  441    …  …    4   1491
6  500    …  …    4   1491
7  550    …  …    4   1491
>
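
For anyone who wants to reproduce the session above, here is a minimal self-contained sketch. The "…" columns from the original post are left out because their contents were never shown, so the data frame below holds only var1 and varN; that reduction is an assumption made for illustration.

```r
# Rebuild only the two columns that matter; the elided "…" columns are omitted.
x <- data.frame(
  var1 = c(122L, 213L, 422L, 432L, 441L, 500L, 550L),
  varN = c(1L, 2L, 2L, 3L, 4L, 4L, 4L)
)

# ave() applies FUN within each group of varN and returns a vector
# of the same length as var1, so it can be assigned as a new column.
x$newCol <- ave(x$var1, x$varN, FUN = sum)
print(x)
```

Because ave() returns one value per row rather than one per group, no merge or lookup step is needed afterwards.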

On Tue, Apr 26, 2011 at 6:31 PM, петрович <bistanz_at_gmail.com> wrote:
> Hey everyone!
> I'm a fairly new R user. I have run into a problem that I'd like to share with
> you, in the hope that you can help me find a solution.
> I have a large .txt file which I opened with the read.table command, and what I
> understood from many R manuals is that I now have a kind of matrix read with
> read.table.
> I've used order() to sort my data, and now my problem is this: I have a variable
> that has many repeated values, and I would like to operate on the row
> indexes of "these repeated values". For example, suppose I have:
>
>  var1    var2    …    varN
>  122     nnn1    …    1
>  213     nnn2    …    2
>  422     nnn4    …    2
>  432     …       …    3
>  441     …       …    4
>  500     …       …    4
>  550     …       …    4
>
> So I want to obtain a new column in which the elements of var1 are summed over
> the rows where varN is repeated. For varN=2 the corresponding entry of the new
> column will be 213+422, and for varN=4 it will be 441+500+550. Where there
> are no such repeated values there is obviously nothing to do, since varN takes
> a unique value there.
> I wrote a function to do this but it is not very good (I have a database with
> around 1 million rows and 5 columns). In fact, this function works only for
> data that are not so large:
>
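> # Sum X within each distinct value of Y (one result per distinct value).
> suma.rep <- function(X, Y) {
>   resp <- numeric(0)
>   Z <- unique(Y)
>   # Each iteration rescans all of Y and regrows resp, which is slow for large inputs.
>   for (i in seq_along(Z))
>     resp <- c(resp, sum(X[which(Y == Z[i])]))
>   return(resp)
> }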
>
> When I run this function with my large data, R appears to keep calculating, and
> I think it would take very long (maybe 4 days) to produce the new column I need.
> Question 1: I "feel" that there may be a command that could help me do
> this "simple" operation more elegantly. I googled it but couldn't find one. Is
> there such a command?
> Question 2: Is it a good idea to handle large database files with R, as in my
> example?
>
> Thank you so much for your help.
> Christian Paúl
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
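
A note on the quoted suma.rep(): it returns one sum per distinct value of Y, not one value per row, so its result still has to be expanded before it can become a column. As a rough sketch of a loop-free way to get both forms, tapply() can compute the per-group sums and a name lookup can spread them back over the rows. The vectors var1 and varN below are standalone copies of the example data, and group_sums and new_col are names made up for this illustration.

```r
var1 <- c(122, 213, 422, 432, 441, 500, 550)
varN <- c(1, 2, 2, 3, 4, 4, 4)

# One sum per distinct value of varN (the same quantity suma.rep() computes),
# obtained in a single call instead of a loop that rescans varN each time.
group_sums <- tapply(var1, varN, sum)

# Expand the per-group sums back to one value per row by looking up
# each row's group label in the names of group_sums.
new_col <- as.vector(group_sums[as.character(varN)])
print(new_col)
```

This should give the same per-row values as the ave() call above; ave() simply does the expansion step for you.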

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

Received on Wed 27 Apr 2011 - 01:44:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 27 Apr 2011 - 03:00:34 GMT.
