One way is to split() the row indices by columns a and b to see which rows share the same combination. For example, with the data given, I got the following:

> split(seq(nrow(obs)), list(obs$a, obs$b), drop=TRUE)
$`1.1`
[1] 1

$`2.2`
[1] 2

$`2.3`
[1] 3

$`3.4`
[1] 4 5

$`3.5`
[1] 6

You can then pick out the entries of this list that contain more than one index. Those are the duplicated (a, b) combinations, so you only need to do your calculations on them.
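A minimal sketch of that step, reusing obs and the layer() function from the quoted question; the names idx, dup.idx and dup.res are only illustrative:

```r
# Example data from the question (c = 1:2 is recycled over the 6 rows)
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)

# Combination function from the question
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# Row indices grouped by each (a, b) combination that actually occurs
idx <- split(seq_len(nrow(obs)), list(obs$a, obs$b), drop = TRUE)

# Keep only the groups with more than one row, i.e. the duplicated pairs
dup.idx <- idx[sapply(idx, length) > 1]

# Apply layer() to column c within each duplicated group;
# the result is a numeric vector named by the "a.b" group labels
dup.res <- sapply(dup.idx, function(i) layer(obs$c[i]))
dup.res
```

With the example data only group "3.4" (rows 4 and 5) is duplicated, so dup.res has a single element. Every other row is its own group and can be carried over as it is.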

On 10/2/06, Florian Jansen <jansen@uni-greifswald.de> wrote:
> I have a dataframe:
> (obs <- data.frame(a=c(1,2,2,3,3,3), b=c(1,2,3,4,4,5), c=1:2))
> attach(obs)
> In reality it's about 1 million rows.
> Some of the rows have the same contents in both col a and col b, like rows 4 and 5.
> I want to do some calculations on col c within the duplicated rows and
> merge them afterwards:
>
> layer <- function(x) round((1-prod(1-x/100))*100,0)
> (covnew <- aggregate(c, list(a=a, b=b), layer))
>
> This works fine, but not with 1 million rows because of memory
> limitations.
> So I thought of splitting the data frame into the majority of unique rows on
> one hand and all duplicated rows on the other:
>
> With
> subset(obs, a %in% a[duplicated(a)])
> and its negation respectively, this works fine for a single-column comparison.
> This must also be possible for a two-column comparison, but I can't figure it out.
>
> Thanks
> Florian
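For the two-column comparison asked about above, duplicated() also accepts a data frame and then compares whole rows, so applying it to obs[c("a", "b")] flags repeated (a, b) pairs. The sketch below is a suggestion, not something tested on 1 million rows: it splits obs into unique and duplicated parts, runs aggregate() only on the duplicated part, and stacks the results again. The variable names (key, is.dup, uniq, dups, dup.agg) are illustrative, and the fromLast argument assumes a reasonably recent R.

```r
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# duplicated() on a data frame compares whole rows; restricting it to a and b
# flags repeated pairs. OR-ing with fromLast = TRUE marks every member of a
# duplicated pair, not just the second and later occurrences.
key <- obs[c("a", "b")]
is.dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

uniq <- obs[!is.dup, ]   # rows whose (a, b) pair occurs only once
dups <- obs[is.dup, ]    # rows sharing their (a, b) pair with another row

# Run the memory-hungry aggregate() only on the (usually small) duplicated part
dup.agg <- aggregate(dups["c"], by = list(a = dups$a, b = dups$b), FUN = layer)

# layer() on a single value is just round((1 - (1 - x/100)) * 100, 0),
# which can be computed vectorised for all unique rows at once
uniq$c <- round((1 - (1 - uniq$c / 100)) * 100, 0)

# Stack both parts (rbind matches columns by name) and restore a sorted order
covnew <- rbind(uniq, dup.agg)
covnew <- covnew[order(covnew$a, covnew$b), ]
covnew
```

On the example data this gives the same a, b and c values as the original aggregate(c, list(a=a, b=b), layer) call, in the same order.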
