From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>

Date: Thu 05 May 2005 - 23:17:00 EST

On 04-May-05 Roger Bivand wrote:

> On Wed, 4 May 2005, Sean Davis wrote:

>> see ?aggregate.

> Or maybe tapply, or its close relative, by:

> > by(df, list(df$station, df$date), function(x)
> + x$row[which.max(x$chlorophyll)])
> : Ancona
> : 21/06/01
> [1] NA
> ------------------------------------------------------------
> : Castagneto
> : 21/06/01
> [1] 3
> ------------------------------------------------------------
> : Ancona
> : 23/06/01
> [1] 6
> ------------------------------------------------------------
> : Castagneto
> : 23/06/01
> [1] NA
>
> since happily a row ID column was included in the data frame. Note that
> which.max only reports the row of the first maximum if there are ties.

I've tried to work out a method which gives a cleaner result (for instance, the NAs are ugly and unnecessary).

I've called Alessandro's data (below) "chl" (for chlorophyll) and, using Roger's command above, assigned the result to "tmp":

tmp<-by(chl, list(chl$station, chl$date),

function(x) x$row[which.max(x$chlorophyll)] )

Then, using either tmp[1:2,] or tmp[,1:2] we get

tmp[,1:2]

##            21/06/01 23/06/01
## Ancona           NA        6
## Castagneto        3       NA

which is a better layout but still has the NAs.
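For anyone wanting to reproduce this, here is a minimal sketch that rebuilds "chl" from the nine example rows quoted at the end of this message. The data.frame() call is my assumption about how the data were entered (dates kept as plain strings); only the by() command is Roger's.

```r
# Rebuild the nine example rows quoted at the end of this message.
# The data.frame() call is an assumption about how the data were entered.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9)
)

# Roger's command: row ID of the first maximum chlorophyll value
# within each station/date cell.
tmp <- by(chl, list(chl$station, chl$date),
          function(x) x$row[which.max(x$chlorophyll)])

# Station/date combinations with no observations show up as NA.
tmp[, 1:2]
```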

It would be better to be able to get something like

## Ancona      23/06/01  6
## Castagneto  21/06/01  3

but I don't see how to do it even for just these 2 stations.

Now, however, suppose we want not just the rows but the values as well. Try a modified function

tmp<-by(chl, list(chl$station, chl$date),

function(x) list(Row=x$row[which.max(x$chlorophyll)], Val=max(x$chlorophyll)) )

Now

str(tmp)

## List of 4
##  $ : NULL
##  $ :List of 2
##   ..$ Row: int 3
##   ..$ Val: num 2.4
##  $ :List of 2
##   ..$ Row: int 6
##   ..$ Val: num 2.5
##  $ : NULL
##  - attr(*, "dim")= int [1:2] 2 2
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2] "Ancona" "Castagneto"
##   ..$ : chr [1:2] "21/06/01" "23/06/01"
##  - attr(*, "call")= language by.data.frame(data = chl, INDICES =
##      list(chl$station, chl$date), FUN = function(x) list(Row =
##      x$row[which.max(x$chlorophyll)], ...
##  - attr(*, "class")= chr "by"

I've not succeeded (though experience tells me that others could) in extracting from this something like the following:

##       Ancona    Castagneto
## Row   6         3
## Val   2.5       2.4
## Date  23/06/01  21/06/01

Questions: (a) What's the trick? (b) How to generalise it?
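One possible direction, offered as a sketch and checked only against the nine example rows: rather than unpicking the by() result, split() the data frame with drop = TRUE so that empty station/date combinations never appear, then keep the row holding the maximum in each piece. This swaps by() for split() plus lapply(); the names "groups", "best" and "wide", and the data.frame() call rebuilding "chl", are mine and not part of the thread.

```r
# Assumption: chl holds the example rows quoted below, rebuilt here by hand.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9)
)

# drop = TRUE discards station/date combinations that have no rows,
# so no NA or NULL cells are created in the first place.
groups <- split(chl, list(chl$station, chl$date), drop = TRUE)

# From each group keep the whole row of the first maximum
# (which.max ignores later ties, as Roger noted).
best <- do.call(rbind,
                lapply(groups, function(g) g[which.max(g$chlorophyll), ]))
best <- best[order(best$station), ]
rownames(best) <- NULL

# One line per station/date: station, date, row ID and value.
best[, c("station", "date", "row", "chlorophyll")]

# Transposed layout, one column per station. A matrix holds a single
# type, so the numbers are converted to character strings here.
wide <- t(best[, c("row", "chlorophyll", "date")])
colnames(wide) <- as.character(best$station)
wide
```

Since nothing in this depends on there being two stations or two dates, it should carry over to any number of station/date combinations, though I have not tried it on the full data.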

Ted.

>>
>> Sean
>>
>> On May 4, 2005, at 11:43 AM, alessandro carletti wrote:
>>
>> > Sorry for disturbing you with another newbie question!
>> > I have a data frame about coastal waters quality
>> > parameters: for some parameters (e.g. NH3) I have only
>> > 1 observation for each sampling station and each
>> > sampling date, while in other cases (chlorophyll) I
>> > have 1 obs for each meter-depth for each station and
>> > date. How can I select only the max chlorophyll value
>> > for each station/date?
>> >
>> > example
>> >
>> > row station    date     depth chlorophyll
>> > 1   Castagneto 21/06/01 -0.5  2.0
>> > 2   Castagneto 21/06/01 -1.5  2.2
>> > 3   Castagneto 21/06/01 -2.5  2.4
>> > 4   Castagneto 21/06/01 -3.5  2.1
>> > 5   Ancona     23/06/01 -0.5  2.4
>> > 6   Ancona     23/06/01 -1.5  2.5
>> > 7   Ancona     23/06/01 -2.5  2.2
>> > 8   Ancona     23/06/01 -3.5  2.1
>> > 9   Ancona     23/06/01 -4.5  1.9
>> > ...
>> >
>> > I'd like to select only row 3 and 6, the ones with max
>> > chlorophyll values, or have the mean for the rows 1:4
>> > and 5:9
>> >
>> > Thanks

