Re: [R] filter data set unique, duplicate..

From: Sundar Dorai-Raj <sundar.dorai-raj_at_pdf.com>
Date: Thu 04 Aug 2005 - 04:58:24 EST

Hi, Anders/Dimitris,

Dimitris Rizopoulos wrote:
> maybe you could consider something like this:
>
> dat <- data.frame(x = c(1, 2, 2, 3, 3, 4),
> y1 = c(1, 1, 2, 1, 7, 8),
> y2 = c(NA, NA, NA, 5, 5, 4),
> y3 = c(3, 11, NA, 16, 2, 1))
> #############
> out <- as.data.frame(lapply(dat[-1], function(y, x) tapply(y, x, max,
> na.rm = TRUE), x = dat["x"]))
> out[out == -Inf] <- NA
> out$x <- unique(dat["x"])

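Beware this line. tapply returns its groups in sorted order of "x", whereas unique returns values in order of first appearance. If "x" is not already sorted, as it happens to be in "dat", then your rows will be misaligned.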
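
To make the warning concrete, here is a minimal sketch using a made-up, unsorted data set ("dat2" is hypothetical and not from the original post). It shows the two orderings disagreeing, and one possible way to keep the labels aligned by taking them from the tapply result itself:

```r
# Hypothetical unsorted variant of "dat" (illustration only)
dat2 <- data.frame(x  = c(3, 1, 2, 2),
                   y1 = c(7, 1, 1, 2))

# tapply orders its result by the sorted group values: 1, 2, 3
grp.max <- tapply(dat2$y1, dat2$x, max)

# unique keeps the order of first appearance: 3, 1, 2
first.seen <- unique(dat2$x)

# Safer: recover the group labels from the names of the tapply result
x.safe <- as.numeric(names(grp.max))
```

Pairing "grp.max" with "first.seen" would attach the maximum for x = 1 to the label 3; pairing it with "x.safe" keeps each maximum next to its own group.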

Here's another solution using "by", though it's no more efficient than what Dimitris has given.

out <- by(dat[-1], dat[1], function(y) {
   max.na <- function(x)
     if(all(is.na(x))) NA else max(x, na.rm = TRUE)
   apply(y, 2, max.na)
})
out <- as.data.frame(do.call("rbind", out))
out <- cbind(x = as.numeric(row.names(out)), out)
out
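
For comparison, the same per-group maximum can probably also be written with "aggregate". This is only a sketch under the assumption that the data look like Dimitris's "dat"; the name "out2" is introduced here for illustration, and "max.na" is the same helper as above:

```r
# Example data as given by Dimitris
dat <- data.frame(x  = c(1, 2, 2, 3, 3, 4),
                  y1 = c(1, 1, 2, 1, 7, 8),
                  y2 = c(NA, NA, NA, 5, 5, 4),
                  y3 = c(3, 11, NA, 16, 2, 1))

# Maximum that returns NA (rather than -Inf) when every value is missing
max.na <- function(x)
  if (all(is.na(x))) NA else max(x, na.rm = TRUE)

# One row per distinct x; the grouping column comes back first
out2 <- aggregate(dat[-1], by = dat[1], FUN = max.na)
out2
```

Because the grouping column is returned alongside the maxima, there is no separate step in which the labels could get out of order.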

HTH,

--sundar

> out
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/16/336899
> Fax: +32/16/337015
> Web: http://www.med.kuleuven.be/biostat/
> http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
> ----- Original Message -----
> From: "Anders Bjørgesæter" <anders.bjorgesater@bio.uio.no>
> To: <r-help@stat.math.ethz.ch>
> Sent: Wednesday, August 03, 2005 10:40 AM
> Subject: [R] filter data set unique, duplicate..
>
>
>

>>Hello
>>
>>First, thanks for the help with an earlier question about error
>>handling!
>>
>>I have a problem filtering a dataset.
>>I'm trying to filter the data in the y columns based on the values
>>in the x column, e.g.:
>>
>>x      y1     y2     yn
>>1.0    1      NA     3
>>2.0    1      NA     11
>>2.0    2      NA     NA
>>3.0    1      5      16
>>3.0    7      5      2
>>4.0    8      4      1
>>
>>and want to keep the highest y if x is identical, like this:
>>
>>x      y1     y2     yn
>>1.0    1      NA     3
>>2.0    2      NA     11
>>3.0    7      5      16
>>4.0    8      4      1
>>
>>or just as good:
>>
>>x      y1     y2     yn
>>1.0    1      NA     3
>>2.0    NA*    NA     NA
>>2.0    2      NA     11
>>3.0    NA*    5      16
>>3.0    7      NA*    NA*
>>4.0    8      4      1
>>
>>If anyone has any suggestions or pointers on how to do this, I would really
>>appreciate it.
>>
>>/Anders
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! 
>>http://www.R-project.org/posting-guide.html
>>



Received on Thu Aug 04 05:02:51 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 15:02:56 EST