Re: [R] How to delete the replicate rows by summing up the numeric columns

From: Yi <liuyi.feier_at_gmail.com>
Date: Tue, 29 Jun 2010 18:33:02 -0700

Thank you very much for the responses. In the end I took David's approach. The others work well for this specific case, but I run into a problem when there is more than one column of character variables.

z=c('ab','ah','bc','ah','dv')
x=substr(z,start=1,stop=1)
y=substr(z,start=2,stop=2)

v1=5:9
v2=7:11
data=data.frame(x,y,z,v1,v2)

### I want to sum v1 and v2 by z only and delete the duplicate rows. I do not care about x and y here; I just want to keep them.

data$summed <- ave(data$v1, data$z, FUN=sum)
data[!duplicated(data$z),c('z','x','y','summed')]

### I tried to use 'melt' or 'cast' to solve the problem (as Nikhil and Dennis suggested). But since x and y are also character variables, it just does not work here.
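
Since x and y are fully determined by z in this example, no reshaping is needed to sum both numeric columns. A minimal base-R sketch follows, applying ave to each numeric column in turn; the names `dat`, `num_cols` and `res_ave` are illustrative and not from the thread.

```r
z <- c("ab", "ah", "bc", "ah", "dv")
dat <- data.frame(x = substr(z, 1, 1), y = substr(z, 2, 2), z = z,
                  v1 = 5:9, v2 = 7:11)

# Replace each numeric column by its within-z total
num_cols <- c("v1", "v2")
dat[num_cols] <- lapply(dat[num_cols], function(col) ave(col, dat$z, FUN = sum))

# Keep the first row for each z; x and y are carried along untouched
res_ave <- dat[!duplicated(dat$z), ]
res_ave
```

This keeps the rows in their original order, which the aggregate-based answers do not promise.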

Basically, my problem is perfectly answered. But if you want to comment on this example, please do. I would really appreciate it.

For instance, do you think cast or aggregate can be used to delete duplicate rows by summing the numeric columns when several of the columns are character variables? I do not see a way to deal with the two types at the same time.
Thank you.
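
On handling several character columns at once, one possible sketch is to list every character column as a grouping variable in aggregate and bind the numeric columns with cbind. This assumes x and y are constant within each value of z, as they are here; otherwise the extra keys would split the groups. The names `dat` and `res_agg` are illustrative.

```r
z <- c("ab", "ah", "bc", "ah", "dv")
dat <- data.frame(x = substr(z, 1, 1), y = substr(z, 2, 2), z = z,
                  v1 = 5:9, v2 = 7:11)

# All character columns go on the right-hand side as grouping keys;
# cbind() puts both numeric columns on the left-hand side to be summed
res_agg <- aggregate(cbind(v1, v2) ~ z + x + y, data = dat, FUN = sum)
res_agg
```

The result has one row per distinct (z, x, y) combination, though not necessarily in the original row order.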

On Tue, Jun 29, 2010 at 6:04 PM, Dennis Murphy <djmuser_at_gmail.com> wrote:

> Hi:
>
> If you can deal with alphabetic order, the following seems to work:
>
> v <- aggregate(third ~ first, data = data, FUN = sum)
> v$second <- levels(data$second)
> v[, c(1, 3, 2)]
>   first  second third
> 1     b  Brazil     2
> 2     c   China    15
> 3     e England    13
> 4     f  France     8
> 5     j   Japan     5
> 6     k   Korea     4
> 7     u     usa     8
>
> v$second works in this case because the levels are ordered and all are used
> when inserted in v. That's not a guarantee in more complicated problems and
> frankly, this one is a kludge.
>
> A plyr version would be
>
> v <- ddply(data, .(first), summarise, third = sum(third), second = second)
> v[!duplicated(v$first), c(1, 3, 2)]
>   first  second third
> 1     b  Brazil     2
> 2     c   China    15
> 4     e England    13
> 6     f  France     8
> 7     j   Japan     5
> 8     k   Korea     4
> 9     u     usa     8
>
> The advantage of ddply over aggregate in this case is that ddply allows one
> to insert second as an 'identity' of sorts; however, the result contains
> duplicate rows, so we need to remove them in the second statement.
>
> Using melt and cast from the reshape package,
> mm <- melt(data, id = c('first', 'second'))
> (ms <- cast(mm, first + second ~ . , sum))
>   first  second (all)
> 1     b  Brazil     2
> 2     c   China    15
> 3     e England    13
> 4     f  France     8
> 5     j   Japan     5
> 6     k   Korea     4
> 7     u     usa     8
>
> names(ms)[3] <- 'third'
>
> This seems to be the cleanest version of the three in terms of getting both
> ID variables into the final result.
>
> HTH,
> Dennis
>
> On Tue, Jun 29, 2010 at 12:05 PM, Yi <liuyi.feier_at_gmail.com> wrote:
>
>> Hi, folks,
>>
>> I am sorry that I did not state the problem correctly yesterday.
>>
>> Let me restate the problem with the following code:
>>
>> first=c('u','b','e','k','j','c','u','f','c','e')
>>
>> second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
>> third=1:10
>> data=data.frame(first,second,third)
>>
>> ## The values in the first column are unique codes for
>> ## those in the second column.
>> ## So 'u' is only for usa. Duplicated values occur in the same rows for the
>> ## first and second columns.
>> ## Now I want to delete the duplicate rows that have the same values in the
>> ## first (second) column
>> ## and sum up the values in the third column for those rows.
>>
>> mm=melt(data,id='first')
>> sum=cast(mm,first~variable,sum) ### This does not work.
>>
>> ###I tried another way to do this
>> mm= melt(data, id='first',measure='third')
>> sum=cast(mm,first~variable,sum)
>>
>> ## But then the problem is how to 'merge' the result with the second
>> ## column in the dataset.
>>
>>
>> The expected dataframe is like this:
>>
>> (I showed a wrong expected dataframe yesterday.)
>>
>>   first  second third
>> 1     u     usa     8
>> 2     b  Brazil     2
>> 3     e England    13
>> 4     k   Korea     4
>> 5     j   Japan     5
>> 6     c   China    15
>> 8     f  France     8
>>
>> Thanks in advance.
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
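
On the quoted question of how to 'merge' the sums back with the second column, a minimal sketch is to build a lookup table of the unique first/second pairs and join it to the per-code totals with merge. It assumes each code in first maps to exactly one value of second, as the original post states; `dat`, `sums`, `lookup` and `res_merge` are illustrative names.

```r
first  <- c("u", "b", "e", "k", "j", "c", "u", "f", "c", "e")
second <- c("usa", "Brazil", "England", "Korea", "Japan", "China",
            "usa", "France", "China", "England")
third  <- 1:10
dat <- data.frame(first, second, third)

# Total of third for each code in first
sums <- aggregate(third ~ first, data = dat, FUN = sum)

# One row per code, carrying its label from second
lookup <- unique(dat[c("first", "second")])

# Join the labels to the totals on the shared key
res_merge <- merge(lookup, sums, by = "first")
res_merge
```

By default merge sorts the result by the key, so the rows come back in alphabetical order of first and not in the order of the expected data frame.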

Received on Wed 30 Jun 2010 - 02:28:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Jun 2010 - 02:40:43 GMT.
