[R] separation depending on equal contents in more than one field

From: Florian Jansen <jansen_at_uni-greifswald.de>
Date: Mon 02 Oct 2006 - 15:30:32 GMT


Hi,

I have a dataframe:

(obs <- data.frame(a=c(1,2,2,3,3,3), b=c(1,2,3,4,4,5), c=1:2))
attach(obs)

In reality it's about 1 million rows.

Some of the records have the same contents in both col a and col b, like rows 4 and 5. I want to do some calculations on col c within each group of duplicated rows and merge the results afterwards:

layer <- function(x) round((1-prod(1-x/100))*100,0)
(covnew <- aggregate(c, list(a=a, b=b), layer))
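As a quick check of what layer() computes, here is a minimal sketch applied by hand to the one duplicated group in the example. It treats the values in col c as percentages, which is an assumption read off the x/100 term. The names combined and single are only for illustration.

```r
# Same definition as above, repeated so the snippet runs on its own.
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# Rows 4 and 5 of obs share a = 3, b = 4 and carry c = 2 and c = 1.
# (1 - 0.98 * 0.99) * 100 = 2.98, which rounds to 3.
combined <- layer(c(2, 1))

# A group with a single whole-number value comes back unchanged.
single <- layer(2)

print(combined)
print(single)
```

So for groups of size one, layer() only rounds the value, and the real work happens in the groups that have more than one row.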

This works fine on the small example, but not with 1 million rows, because it runs out of memory.
So I thought I would split the dataframe into the majority of unique rows on the one hand and all duplicated rows on the other:

With
subset(obs, a %in% a[duplicated(a)])
and its negation, !(a %in% a[duplicated(a)]), respectively, this works fine for a single-column comparison. It must also be possible for a two-column comparison, but I can't get it to work.
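One possible way to do the two-column split is sketched below, under some assumptions. It relies on duplicated() accepting a data frame and on its fromLast argument, so that every member of a duplicated (a, b) pair is flagged, not only the second occurrence. The names key, is_dup, dup_rows, uniq_rows and agg are made up for this illustration. Whether this is enough to get around the memory limit on 1 million rows has not been tested here.

```r
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# Compare on both columns at once by passing a two-column data frame.
key <- obs[c("a", "b")]

# duplicated() alone marks only the later copies; combining it with
# fromLast = TRUE marks the first copy of each duplicated pair as well.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

dup_rows  <- obs[is_dup, ]   # rows whose (a, b) pair occurs more than once
uniq_rows <- obs[!is_dup, ]  # rows whose (a, b) pair occurs exactly once

# Aggregate only the (hopefully small) duplicated part.
agg <- aggregate(dup_rows["c"], by = dup_rows[c("a", "b")], FUN = layer)

# Put both parts back together and restore the ordering by a and b.
covnew <- rbind(uniq_rows, agg)
covnew <- covnew[order(covnew$a, covnew$b), ]
rownames(covnew) <- NULL
print(covnew)
```

Note that the unique rows skip layer() entirely in this sketch, so their c values are not rounded. That only matches the aggregate() result if col c already holds whole numbers.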

Thanks
Florian

-- 
Dr. Florian Jansen
Geobotany & Nature Conservation
Institute for Botany and Landscape Ecology
Ernst-Moritz-Arndt-University
Grimmer Str. 88
17487 Greifswald - Germany
+49 (0)3834 86 4147

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Oct 03 01:39:54 2006
