Re: [R] selecting maximum values

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Thu 05 May 2005 - 23:17:00 EST


On 04-May-05 Roger Bivand wrote:
> On Wed, 4 May 2005, Sean Davis wrote:
>
>> see ?aggregate.
>
> Or maybe tapply, or its close relative, by:
>
> > by(df, list(df$station, df$date), function(x)
> + x$row[which.max(x$chlorophyll)])
> : Ancona
> : 21/06/01
> [1] NA
> ------------------------------------------------------------
> : Castagneto
> : 21/06/01
> [1] 3
> ------------------------------------------------------------
> : Ancona
> : 23/06/01
> [1] 6
> ------------------------------------------------------------
> : Castagneto
> : 23/06/01
> [1] NA
>
> since happily a row ID column was included in the data frame. Note that
> which.max only reports the row of the first maximum if there are ties.
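
To illustrate the point about ties, here is a small sketch with made-up values (not taken from the chlorophyll data). The names x, first_max and all_max are invented for the example; which(x == max(x)) is one common way to recover every tied position, assuming exact equality is acceptable for the values in hand.

```r
# Made-up values with a tie for the maximum at positions 2 and 4
x <- c(2.0, 2.5, 2.2, 2.5)

first_max <- which.max(x)        # index of the first maximum only
all_max   <- which(x == max(x))  # indices of every tied maximum

print(first_max)
print(all_max)
```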

I've tried to work out a method which gives a cleaner result (for instance, the NAs are ugly and unnecessary).

I've called Alessandro's data (below) "chl" (for chlorophyll) and, using Roger's command above, assigned the result to "tmp":

  tmp <- by(chl, list(chl$station, chl$date),
            function(x) x$row[which.max(x$chlorophyll)])

Then, using either tmp[1:2,] or tmp[,1:2] we get

  tmp[,1:2]

  ##            21/06/01 23/06/01
  ## Ancona           NA        6
  ## Castagneto        3       NA

which is a better layout but still has the NAs.
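
For anyone wishing to reproduce the station-by-date table, here is a minimal sketch. It assumes the nine rows in Alessandro's example are the whole data set, and the column types are a guess. It uses tapply on the row indices (Roger mentioned tapply as a close relative of by) so that the result is a plain matrix; idx is a name made up for the example.

```r
# Alessandro's nine example rows, typed in by hand (column types assumed)
chl <- data.frame(
  row         = 1:9,
  station     = rep(c("Castagneto", "Ancona"), times = c(4, 5)),
  date        = rep(c("21/06/01", "23/06/01"), times = c(4, 5)),
  depth       = -c(0.5, 1.5, 2.5, 3.5, 0.5, 1.5, 2.5, 3.5, 4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9)
)

# Row ID of the (first) maximum chlorophyll in each station/date cell;
# cells with no observations come back as NA
idx <- tapply(seq_len(nrow(chl)), list(chl$station, chl$date),
              function(i) chl$row[i][which.max(chl$chlorophyll[i])])
print(idx)
```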

It would be better to be able to get something like

  ## Ancona     23/06/01        6
  ## Castagneto 21/06/01        3

but I don't see how to do it even for just these 2 stations.

Now, however, suppose we want not just the rows but the values as well. Try a modified function

  tmp <- by(chl, list(chl$station, chl$date),
            function(x) list(Row = x$row[which.max(x$chlorophyll)],
                             Val = max(x$chlorophyll)))

Now

  str(tmp)

  ## List of 4
  ##  $ : NULL
  ##  $ :List of 2
  ##   ..$ Row: int 3
  ##   ..$ Val: num 2.4
  ##  $ :List of 2
  ##   ..$ Row: int 6
  ##   ..$ Val: num 2.5
  ##  $ : NULL
  ##  - attr(*, "dim")= int [1:2] 2 2
  ##  - attr(*, "dimnames")=List of 2
  ##   ..$ : chr [1:2] "Ancona" "Castagneto"
  ##   ..$ : chr [1:2] "21/06/01" "23/06/01"
  ##  - attr(*, "call")= language by.data.frame(data = chl, INDICES =
  ##  list(chl$station, chl$date),      FUN = function(x) list(Row =
  ## x$row[which.max(x$chlorophyll)],  ...
  ##  - attr(*, "class")= chr "by"

I've not succeeded (though experience tells me that others could) in extracting from this something like the following:

  ##        Ancona Castagneto
  ## Row         6          3
  ## Val       2.5        2.4
  ## Date 23/06/01   21/06/01

Questions: (a) What's the trick? (b) How to generalise it?
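
A possible starting point for (a), offered only as a sketch: group the row indices by the station/date combinations that actually occur, keep the index of the maximum in each group, and use those indices to pull whole rows out of the data frame. The data are retyped so that the block runs on its own (column types assumed); key, pick and best are names made up for the example.

```r
# Alessandro's nine example rows, typed in by hand (column types assumed)
chl <- data.frame(
  row         = 1:9,
  station     = rep(c("Castagneto", "Ancona"), times = c(4, 5)),
  date        = rep(c("21/06/01", "23/06/01"), times = c(4, 5)),
  depth       = -c(0.5, 1.5, 2.5, 3.5, 0.5, 1.5, 2.5, 3.5, 4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9)
)

# One key per station/date combination that actually occurs in the data,
# so no empty groups (and hence no NA cells) are created
key <- paste(chl$station, chl$date)

# For each key, the row index of the (first) maximum chlorophyll value
pick <- sapply(split(seq_len(nrow(chl)), key),
               function(i) i[which.max(chl$chlorophyll[i])])

# Whole rows at those indices: station, date, row ID and value together
best <- chl[pick, c("station", "date", "row", "chlorophyll")]
rownames(best) <- NULL
print(best)
```

As for (b), nothing in the sketch depends on there being only two stations or two dates, though ties within a group still resolve to the first maximum.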

Ted.

>
>>
>> Sean
>> 
>> On May 4, 2005, at 11:43 AM, alessandro carletti wrote:
>> 
>> > Sorry for disturbing you with another newbie question!
>> > I have a data frame about coastal waters quality
>> > parameters: for some parameters (e.g. NH3) I have only
>> > 1 observation for each sampling station and each
>> > sampling date, while in other cases (chlorophyll) I
>> > have 1 obs for each meter-depth for each station and
>> > date. How can I select only the max chlorophyll value
>> > for each station/date?
>> >
>> > example
>> >
>> > row  station         date        depth     chlorophyll
>> > 1     Castagneto      21/06/01     -0.5         2.0
>> > 2     Castagneto      21/06/01     -1.5         2.2
>> > 3     Castagneto      21/06/01     -2.5         2.4
>> > 4     Castagneto      21/06/01     -3.5         2.1
>> > 5     Ancona          23/06/01     -0.5         2.4
>> > 6     Ancona          23/06/01     -1.5         2.5
>> > 7     Ancona          23/06/01     -2.5         2.2
>> > 8     Ancona          23/06/01     -3.5         2.1
>> > 9     Ancona          23/06/01     -4.5         1.9
>> > ...
>> >
>> > I'd like to select only row 3 and 6, the ones with max
>> > chlorophyll values, or have the mean for the rows 1:4
>> > and 5:9
>> >
>> > Thanks


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-May-05                                       Time: 14:13:13
------------------------------ XFMail ------------------------------

Received on Thu May 05 23:38:02 2005