Re: [R] separation depending on equal contents in more than one field

From: jim holtman <jholtman_at_gmail.com>
Date: Mon 02 Oct 2006 - 17:13:42 GMT

One way is to 'split' the row indices to determine which rows to use. For example, from the data given, I got the following:

> split(seq_len(nrow(obs)), list(obs$a, obs$b), drop=TRUE)
$`1.1`
[1] 1

$`2.2`
[1] 2

$`2.3`
[1] 3

$`3.4`
[1] 4 5

$`3.5`
[1] 6

You can then take the resulting list, find all entries with more than one value, and use those row indices to do your calculations.
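
A minimal sketch of that step, assuming the `obs` data frame and the `layer` function from the original post below; the names `idx`, `dup`, and `merged` are only illustrative:

```r
# Example data and summary function taken from the original post
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# Row indices grouped by each (a, b) combination that actually occurs
idx <- split(seq_len(nrow(obs)), list(obs$a, obs$b), drop = TRUE)

# Keep only the groups with more than one row (the duplicated combinations)
dup <- idx[sapply(idx, length) > 1]

# Apply layer() to column c within each duplicated group
merged <- sapply(dup, function(i) layer(obs$c[i]))
```

For the example data, only the group `3.4` (rows 4 and 5) has more than one row, so `merged` holds a single value for it.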

On 10/2/06, Florian Jansen <jansen@uni-greifswald.de> wrote:
> Hi,
>
> I have a dataframe:
>
> (obs <- data.frame(a=c(1,2,2,3,3,3), b=c(1,2,3,4,4,5), c=1:2))
> attach(obs)
>
> In reality it's about 1 million rows.
>
> Some of the rows have the same contents in col a and b, like rows 4 and 5.
> I want to do some calculations on col c within the duplicated rows and
> merge them afterwards:
>
> layer <- function(x) round((1-prod(1-x/100))*100,0)
> (covnew <- aggregate(c, list(a=a, b=b), layer))
>
> This works fine, but not with 1 million rows because of memory
> limitations.
> So I thought to split the dataframe into the majority of unique rows on
> one hand and all duplicated rows on the other:
>
> With
> subset(obs, a %in% a[duplicated(a)])
> and its negation (!) respectively, this works fine for a single-column comparison.
> This must also be possible for a two-column comparison, but I can't get it to work.
>
> Thanks
> Florian
>
> --
> Dr. Florian Jansen
> Geobotany & Nature Conservation
> Institute for Botany and Landscape Ecology
> Ernst-Moritz-Arndt-University
> Grimmer Str. 88
> 17487 Greifswald - Germany
> +49 (0)3834 86 4147
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
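
For the two-column comparison asked about above, one possible sketch uses `duplicated()` on the two key columns, scanning from both ends so that every member of a duplicated pair is flagged. This is an alternative to the `split` approach, not part of the original reply. The names `key`, `is_dup`, `uniq_rows`, `dup_rows`, and `dup_agg` are hypothetical, and passing the unique rows through unchanged assumes `c` already holds whole numbers, since `layer()` would otherwise round them:

```r
# Example data and summary function taken from the original post
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# The two columns that define a duplicate
key <- obs[c("a", "b")]

# TRUE for every row whose (a, b) pair occurs more than once
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Split into rows with a unique (a, b) pair and rows with a repeated pair
uniq_rows <- obs[!is_dup, ]
dup_rows <- obs[is_dup, ]

# Aggregate only the duplicated part, then recombine with the unique rows
dup_agg <- aggregate(dup_rows["c"], dup_rows[c("a", "b")], layer)
covnew <- rbind(uniq_rows, dup_agg)
```

The idea is that `aggregate` then only has to work on the rows that share an (a, b) pair, which may be a small fraction of the full data.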

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Tue Oct 03 03:17:53 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 02 Oct 2006 - 17:30:07 GMT.
