From: Deepayan Sarkar <deepayan.sarkar_at_gmail.com>

Date: Sat, 14 Jul 2007 18:32:25 -0700

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 15 Jul 2007 - 01:42:02 GMT

On 7/14/07, Stephen Tucker <brown_emu_at_yahoo.com> wrote:

> I wonder what kind of objects? Are there large advantages for allowing
> lattice functions to operate on objects other than data frames? I
> couldn't find any screenshots of flowViz, but I imagine those objects
> would probably be lists of arrays and such? I tend to think mapply()
> [and more recently melt()], etc. could always be applied beforehand,
> but I suppose that would undermine the case for having generic
> functions to support the rich collection of object classes in R...

There's a copy of a presentation at

http://www.ficcs.org/meetings/ficcs3/presentations/DeepayanSarkar-flowviz.pdf

and a largish (37 MB) vignette linked from

http://bioconductor.org/packages/2.1/bioc/html/flowViz.html

Neither of these really talks about the challenge posed by the size of the data. The data structure, as with most microarray-type experiments, is like a data frame, except that the response for every experimental unit is itself a large matrix. If we represented the GvHD data set (the one used in the examples) as a "long format" data frame that lattice would understand, it would have 585644 rows and 12 columns (8 measurements that are different for each row, and 4 phenotypic variables that are the same for all rows coming from a single sample). And this is for a smallish subset of the actual experiment.
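
As a toy illustration (the sample names, channel names and phenotype columns below are made up; this is not flowViz code and not the GvHD data), here is roughly what flattening per-sample matrices into lattice's long format involves. Every event row carries its own copy of the sample's phenotypic variables, so the frame grows with the total number of events rather than with the number of samples.

```r
## Hypothetical toy data: two samples with 5 and 3 events, each event
## measured on 8 channels, plus one row of 4 phenotypic variables per sample.
set.seed(42)
channels <- paste0("ch", 1:8)
samples <- list(
    s1 = matrix(rnorm(5 * 8), ncol = 8, dimnames = list(NULL, channels)),
    s2 = matrix(rnorm(3 * 8), ncol = 8, dimnames = list(NULL, channels)))
pheno <- data.frame(name    = c("s1", "s2"),
                    patient = c("p1", "p2"),
                    visit   = c(1, 2),
                    group   = c("A", "B"),
                    stringsAsFactors = FALSE)

## Flatten to long format: repeat sample i's phenotype row once per event
## and bind it next to that sample's event matrix.
long <- do.call(rbind, lapply(seq_len(nrow(pheno)), function(i) {
    m <- samples[[pheno$name[i]]]
    data.frame(m, pheno[rep(i, nrow(m)), ], row.names = NULL)
}))

dim(long)   # 8 rows (5 + 3 events), 12 columns (8 channels + 4 phenotypes)
```

Scaled up to hundreds of thousands of events, this per-row repetition is how the long-format frame reaches the size quoted above.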

In practice, the data are stored in an environment to prevent unnecessary copying, and panel functions only access one data matrix at a time.
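
A minimal sketch of that storage idea, under stated assumptions: this is not the flowViz implementation, and the names `frames`, `pheno` and the channel labels `FSC`/`SSC` are hypothetical. Only a one-row-per-sample table goes through the lattice formula; the panel function uses its `subscripts` argument to find which sample it is drawing and fetches just that matrix from the environment by name.

```r
library(lattice)

## Hypothetical storage: one event matrix per sample, kept in an
## environment so it is looked up by name rather than copied into a frame.
frames <- new.env()
set.seed(1)
for (nm in c("s1", "s2", "s3"))
    assign(nm, matrix(rnorm(400), ncol = 2,
                      dimnames = list(NULL, c("FSC", "SSC"))),
           envir = frames)

## One row per sample: this is all the formula interface ever sees.
pheno <- data.frame(name  = c("s1", "s2", "s3"),
                    group = c("A", "A", "B"),
                    stringsAsFactors = FALSE)
pheno$id <- seq_len(nrow(pheno))   # dummy x/y so each panel gets one row

p <- xyplot(id ~ id | name, data = pheno,
            ## limits set by hand; otherwise lattice would use the dummy 'id'
            xlim = c(-4, 4), ylim = c(-4, 4),
            xlab = "FSC", ylab = "SSC",
            panel = function(x, y, subscripts, ...) {
                ## 'subscripts' is the single pheno row for this panel
                m <- get(pheno$name[subscripts], envir = frames)
                panel.xyplot(m[, "FSC"], m[, "SSC"], ...)
            })
print(p)
```

Each panel only touches one sample's matrix at a time, and nothing in the environment is duplicated to satisfy the formula interface.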

-Deepayan

> --- Deepayan Sarkar <deepayan.sarkar@gmail.com> wrote:

> > On 7/11/07, hadley wickham <h.wickham_at_gmail.com> wrote:
> > > > A question/comment: I have usually found that the subscripts argument is
> > > > what I need when passing *external* information into the panel function, for
> > > > example, when I wish to add results from a fit done external to the trellis
> > > > call. Fits[subscripts] gives me the fits (or whatever) I want to plot for
> > > > each panel. It is not clear to me how the panel layout information from
> > > > panel.number(), etc. would be helpful here instead. Am I correct? Or is
> > > > there a smarter way to do this that I've missed?
> > >
> > > This is one of the things that I think ggplot does better: it's much
> > > easier to plot multiple data sources. I don't have many examples of
> > > this yet, but the final example on
> > > http://had.co.nz/ggplot2/geom_abline.html illustrates the basic idea.
> >
> > That's probably true. The Trellis approach is to define a plot by
> > "data source" + "type of plot", whereas the ggplot approach (if I
> > understand correctly) is to create a specification for the display
> > (incrementally?) and then render it. Since the specification can be
> > very general, the approach is very flexible. The downside is that you
> > need to learn the language.
> >
> > On a philosophical note, I think the apparent limitations of Trellis
> > in some (not all) cases are just due to the artificial importance given
> > to data frames as the one true container for data. Now that we have
> > proper multiple dispatch in S4, we can write methods that behave like
> > traditional Trellis calls but work with more complex data structures.
> > We have tried this in one Bioconductor package (flowViz) with
> > encouraging results.
> >
> > -Deepayan

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 17 Jul 2007 - 09:33:12 GMT.
