Re: [R] Subset by Factor by date

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Fri, 13 Jun 2008 23:24:22 -0500

on 06/13/2008 11:10 PM T.D.Rudolph wrote:
> I have a data frame, x, with over 60,000 rows that contains one factor, "id",
> with 27 levels.
> The data frame contains numerous continuous values (in column "diff") per
> day (column "day") for every level of id. I would like to select only one
> row per animal per day, i.e. the row containing the minimum value of "diff",
> across all rows of x. I can't yet do anything beyond the simplest
> functions, and I was hoping someone could suggest an effective way of
> producing this output.
>
> e.g. given this input:
>
> id day diff
> 1 01-01-09 0.5
> 1 01-01-09 0.7
> 2 01-01-09 0.2
> 2 01-01-09 0.4
> 1 01-02-09 0.1
> 1 01-02-09 0.3
> 2 01-02-09 0.3
> 2 01-02-09 0.4
>
> I would like to produce this output:
> id day diff
> 1 01-01-09 0.5
> 2 01-01-09 0.2
> 1 01-02-09 0.1
> 2 01-02-09 0.3
>
> It doesn't seem extremely difficult, but I'm sure there are easier ways than
> the one I am currently taking!

See ?aggregate
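For anyone wanting to rerun the transcript below, DF can be rebuilt from the example data in the question. This is a sketch: the column types (id as a factor, day left as character) are assumptions chosen to match the printed output, not the poster's actual 60,000-row data.

```r
# Example data from the question, typed in by hand (assumed column types:
# id is a factor, day is a plain character string)
DF <- data.frame(
  id   = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Minimum of 'diff' for every id/day combination; the extra argument
# na.rm = TRUE is passed on to min() through aggregate()'s '...'
res <- aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
print(res)
```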

 > DF
  id      day diff
1  1 01-01-09  0.5
2  1 01-01-09  0.7
3  2 01-01-09  0.2
4  2 01-01-09  0.4
5  1 01-02-09  0.1
6  1 01-02-09  0.3
7  2 01-02-09  0.3
8  2 01-02-09  0.4

 > aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
  id      day   x
1  1 01-01-09 0.5
2  2 01-01-09 0.2
3  1 01-02-09 0.1
4  2 01-02-09 0.3
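aggregate() returns only the grouping columns plus the summarised value. If x has other columns that should be kept for the selected row, one alternative (a different technique from the aggregate() call above) is to sort by id, day and diff and then keep the first row of each id/day group with duplicated(). A sketch, reusing the example data under the same assumed column types:

```r
DF <- data.frame(
  id   = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  day  = rep(c("01-01-09", "01-02-09"), each = 4),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Sort so that, within each id/day group, the smallest diff comes first
DFo <- DF[order(DF$id, DF$day, DF$diff), ]

# Keep the first row of each id/day combination, i.e. the whole row
# holding the minimum diff (all other columns come along with it)
best <- DFo[!duplicated(DFo[c("id", "day")]), ]
print(best)
```

Because duplicated() keeps only the first occurrence, this returns exactly one row per id/day even when two rows tie for the minimum.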

Note that I have not converted the 'day' column to the 'Date' class. You would need to do that to perform any other date-related operations (including chronological sorting) on that column. See ?as.Date for more information. For example, assuming the values are in month-day-year order:

   DF$day <- as.Date(DF$day, format = "%m-%d-%y")
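To illustrate the point about chronological sorting, here is a short sketch with hypothetical day strings that span a year boundary, again assuming month-day-year order as in the format string above:

```r
# Hypothetical day strings in the same mm-dd-yy layout as the example data
days <- c("01-02-09", "12-31-08", "01-01-09")

# Sorting the character strings compares them as text, so "12-31-08"
# (31 Dec 2008) ends up after the January 2009 dates
chr_sorted <- sort(days)

# After conversion to class "Date", sorting is chronological
d <- as.Date(days, format = "%m-%d-%y")
date_sorted <- sort(d)

print(chr_sorted)
print(date_sorted)
```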

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Sat 14 Jun 2008 - 04:28:02 GMT


