Re: [R] xyplot: subscripts, groups and subset

From: Deepayan Sarkar <deepayan.sarkar_at_gmail.com>
Date: Fri, 16 May 2008 17:03:26 -0700

On 5/16/08, Jim Price <price_ja_at_hotmail.com> wrote:
>
> I have stumbled across something in the Lattice package that is vexing me.
> Consider the code below:
> __________________________________________________________
>
> library(lattice)
>
>
> myData <- expand.grid(sub = factor(1:16), time = 1:10)
>
> myData$observed <- rnorm(nrow(myData))
> myData$fitted <- with(myData, ave(observed, sub, FUN = mean))
> myData$event.time <- with(myData, ave(observed, sub, FUN = function(.x) 10 *
> runif(1)))
>
> myData <- myData[order(myData$sub, myData$time),]
>
>
>
> # This version works...
> xyplot(
> fitted + observed ~ time | sub,
> data = myData,
> subscripts = TRUE,
> panel = function(..., groups = groups, subscripts = subscripts)
> {
> panel.xyplot(..., groups = groups, subscripts = subscripts)
>
> event.time <- unique(myData$event.time[subscripts])
> panel.abline(v = event.time, lty = 2, col = 'green')
> },
> type = c('l','p'),
> distribute.type = TRUE,
> as.table = TRUE
> )
>
>
>
> # ...but when you add the subset parameter it produces multiple index lines
> per subject
> xyplot(
> fitted + observed ~ time | sub,
> data = myData,
> subset = sub %in% sample(unique(sub), 9),
> subscripts = TRUE,
> panel = function(..., groups = groups, subscripts = subscripts)
> {
> panel.xyplot(..., groups = groups, subscripts = subscripts)
>
> event.time <- unique(myData$event.time[subscripts])
> # print(event.time)
> panel.abline(v = event.time, lty = 2, col = 'green')
> },
> type = c('l','p'),
> distribute.type = TRUE,
> as.table = TRUE
> )
>
> ___________________________________________________________________
>
>
> The (commented out) print statement I think indicates that there is a data
> reordering going on for the second example, that is causing the multiple
> index lines issue. Is there a neat solution to get the correct index lines
> per subject, or do I need a workaround? Or am I missing something
> fundamental in the code above that is causing issues?

The short answer is that the value of 'subscripts' in the panel function has an unusual interpretation when you use the extended formula notation (as in 'fitted + observed ~ time')

You would get better insight if you also printed out the 'subscripts' variable in each panel function. Basically, when you use the extended notation, lattice creates an artificial 'groups' variable that is longer than the original data frame, and consequently, the 'subscripts' variable must also have indices that exceed the number of rows of the original data frame. This is not a problem in your first call, because event.time[subscripts] returns NA for out-of-bound indices.

When using 'subset', things get complicated further because this
artificial 'groups' variable is computed after the subsetting, and
'subscripts' no longer even partially refer to the original data
frame. There is no good solution to this; it is the price you pay for the usefulness of the extended formula notation.

Your reasonable options are

(1) to subset your data beforehand; e.g.,

myDataSub <- subset(myData, sub %in% sample(unique(sub), 9))

and then use myDataSub in the call.

(2) not use the extended formula (this is conceptually cleaner, I think); e.g.,

xyplot(observed ~ time | sub, data = myData,

       subset = sub %in% sample(unique(sub), 9),
       fitted = myData$fitted,
       event.time = myData$event.time,
       panel = function(x, y, fitted, event.time, subscripts, ...)
       {
           panel.lines(x, fitted[subscripts], col = "black")
           panel.xyplot(x, y, ...)
           event.time <- unique(event.time[subscripts])
           panel.abline(v = event.time, lty = 2, col = 'green')
       },
       as.table = TRUE)

-Deepayan



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 17 May 2008 - 00:06:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 17 May 2008 - 01:30:38 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive