Re: [R] Advanced Filtering problem

From: hadley wickham <h.wickham_at_gmail.com>
Date: Thu, 19 Jun 2008 18:49:12 -0500

Hi Tyler,

> I've attached 100 rows of a data frame I am working with.
> I have one factor, id, with 27 levels. There are two columns of reference
> data, x and y (UTM coordinates), one column "date" in POSIXct format, and
> one column "diff" in times format (chron package).
>
> What I am trying to do is as follows:
> For each day of the year (date, irrespective of time), select that row for
> each id which contains the smallest "diff" value, resulting in an output
> containing in general one value per id per day.
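Grouping by "date, irrespective of time" means first reducing each POSIXct timestamp to its calendar day. Here is a minimal sketch on a made-up vector. It assumes the timestamps should be read in UTC; substitute the data's actual time zone, because a timestamp close to midnight can fall on different days in different zones.

```r
# Made-up timestamps; tz = "UTC" is an assumption, use the data's own zone
stamps <- as.POSIXct(c("2005-01-01 08:15", "2005-01-01 23:50",
                       "2005-01-02 00:05"), tz = "UTC")

# Drop the time of day and keep only the calendar date (class "Date")
days <- as.Date(stamps, tz = "UTC")
print(days)
```

The resulting Date vector can then be used as a grouping variable alongside id.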

There's a basic strategy that makes this type of problem much easier to solve. I call it split-apply-combine. The basic idea is that the problem would be pretty easy if you only had to deal with one small piece of the data, such as a single day:

df <- read.csv("http://www.nabble.com/file/p18018170/subdata.csv")

oneday <- subset(df, day == "01-01-05")
oneday[which.min(oneday$diff), ]

# Let's turn that into a function so it's easy to apply to every piece

mindiff <- function(df) df[which.min(df$diff), ]

# Now we split up the data frame so that we have one data frame for each
# combination of id and day (drop = TRUE skips combinations with no rows)

pieces <- split(df, list(df$id, df$day), drop = TRUE)

# And use lapply to apply that function to each piece:

results <- lapply(pieces, mindiff)

# Then finally join all the pieces back together

df_done <- do.call("rbind", results)

So we split the data frame into one piece per id and day, picked the row with the smallest diff in each piece, and then joined all the pieces back together. This isn't the most efficient solution, but I think it's easy to see how each part works and how you can apply it to new situations. If you aren't familiar with lapply or do.call, it's worth looking at the examples on their help pages (?lapply, ?do.call) to get a feel for how they work, although for this case you can of course just copy and paste the code without worrying about the details.
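The attached file may not always be available, so here is a self-contained sketch of the same split-apply-combine steps on a small made-up data frame. The values, the UTC time zone, and storing diff as a plain number (rather than a chron times object) are all assumptions for illustration. The day is derived from the POSIXct column, and the split is on id and day together, so the result has one row per id per day.

```r
# Hypothetical toy data standing in for the attached subdata.csv
df <- data.frame(
  id   = factor(c("a", "a", "b", "b", "a", "b")),
  date = as.POSIXct(c("2005-01-01 08:00", "2005-01-01 20:00",
                      "2005-01-01 09:00", "2005-01-01 10:00",
                      "2005-01-02 07:00", "2005-01-02 23:00"),
                    tz = "UTC"),
  diff = c(5, 2, 7, 3, 4, 1)
)

# Calendar day, ignoring the time of day (assumes UTC)
df$day <- as.Date(df$date, tz = "UTC")

# Apply step: the row with the smallest diff in one piece.
# which.min returns the first minimum, so ties keep the earliest row.
mindiff <- function(piece) piece[which.min(piece$diff), ]

# Split step: one piece per id/day combination, empty ones dropped
pieces <- split(df, list(df$id, df$day), drop = TRUE)

# Apply to every piece, then combine the one-row results
results <- lapply(pieces, mindiff)
df_done <- do.call(rbind, results)
print(df_done)
```

With this toy data there are four id/day combinations, so df_done has four rows; for id "a" on 2005-01-01 the kept row is the one with diff 2.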

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 20 Jun 2008 - 01:26:43 GMT