# Re: [R] Advanced Filtering problem

Date: Thu, 19 Jun 2008 18:49:12 -0500

Hi Tyler,

> I've attached 100 rows of a data frame I am working with.
> I have one factor, id, with 27 levels. There are two columns of reference
> data, x and y (UTM coordinates), one column "date" in POSIXct format, and
> one column "diff" in times format (chron package).
>
> What I am trying to do is as follows:
> For each day of the year (date, irrespective of time), select that row for
> each id which contains the smallest "diff" value, resulting in an output
> containing in general one value per id per day.

There's a basic strategy that makes this type of problem much easier to solve. I call it split-apply-combine. The idea is that if you had only a single day, the problem would be pretty easy:

oneday <- subset(df, day == "01-01-05")
oneday[which.min(oneday$diff), ]

# Let's make that into a function to make it easier to apply to all days

mindiff <- function(df) df[which.min(df$diff), ]

# Now we split up the data frame so that we have a data frame for each day

pieces <- split(df, df$day)

# And use lapply to apply that function to each piece:

results <- lapply(pieces, mindiff)

# Then finally join all the pieces back together

df_done <- do.call("rbind", results)
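
To try the steps above end to end, here is a minimal sketch on made-up data. Note that the question describes a POSIXct column called date, not day, so the sketch assumes day is derived from date with format(); the toy values, and the plain numbers standing in for the chron times in diff, are also assumptions for illustration only.

```r
# Toy data standing in for the attached data frame (all values are made up).
df <- data.frame(
  id   = factor(c("a", "a", "b", "b", "a")),
  date = as.POSIXct(c("2005-01-01 06:00:00", "2005-01-01 18:00:00",
                      "2005-01-01 07:30:00", "2005-01-02 09:00:00",
                      "2005-01-02 12:00:00"), tz = "UTC"),
  diff = c(0.30, 0.10, 0.20, 0.50, 0.40)  # plain numbers in place of chron times
)

# Assumption: day is the calendar day of date, formatted like "01-01-05".
df$day <- format(df$date, "%d-%m-%y")

# Pick the row with the smallest diff from one piece.
mindiff <- function(df) df[which.min(df$diff), ]

pieces  <- split(df, df$day)          # one data frame per day
results <- lapply(pieces, mindiff)    # smallest-diff row from each piece
df_done <- do.call("rbind", results)  # stack the rows back together
```

With this toy input, df_done holds one row per day, which is what the steps above produce when splitting on day alone.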

So we split the data frame into individual days, picked the correct row for each day, and then joined all the pieces back together. This isn't the most efficient solution, but I think it's easy to see how each part works and how you can apply it to new situations. If you aren't familiar with lapply or do.call, it's worth having a look at their examples to get a feel for how they work (although for this case you can of course just copy and paste them without caring how they work).
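
Since the question asks for one row per id per day, the same pattern can probably be extended by splitting on both columns at once. The sketch below is one way to do that, again on made-up data with an assumed day column; split() accepts a list of grouping variables, and drop = TRUE discards id/day combinations that never occur. The last two lines show a different technique (sort, then keep the first row of each group with duplicated()) that avoids building the list of pieces; it is offered as an alternative, not as part of the split-apply-combine recipe.

```r
# Hypothetical toy data; day is assumed to hold the calendar day as text.
df <- data.frame(
  id   = factor(c("a", "a", "b", "b", "a")),
  day  = c("01-01-05", "01-01-05", "01-01-05", "02-01-05", "02-01-05"),
  diff = c(0.30, 0.10, 0.20, 0.50, 0.40)
)

# Pick the row with the smallest diff from one piece.
mindiff <- function(df) df[which.min(df$diff), ]

# Split on every id/day combination that actually occurs in the data.
pieces    <- split(df, list(df$id, df$day), drop = TRUE)
by_id_day <- do.call("rbind", lapply(pieces, mindiff))

# Alternative without split(): sort by diff, keep the first row per id/day.
sorted    <- df[order(df$diff), ]
first_row <- sorted[!duplicated(sorted[c("id", "day")]), ]
```

Both by_id_day and first_row should contain the same rows here, though possibly in a different order and with different row names.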
