Re: [R] Help with R

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Fri 06 May 2005 - 00:36:27 EST

On 5/5/05, Ted Harding <Ted.Harding@nessie.mcc.ac.uk> wrote:
> On 05-May-05 Peter Dalgaard wrote:
> > [...]
> > Both systems are victims of the curse of the rectangular data set to
> > some extent. Prototypically, you record the sex of a rat along with
> > every single measurement on it, as if the rat could change sex at
> > millisecond resolution. This probably applies to all current
> > statistical systems, but there is some hope that R's more flexible
> > data structures can be leveraged to better handle multilevel data.
> > (Cue Probabilistic Relational Models a.m. Getoor et al., which Peter
> > Green brought up at the recent gR meeting.)
>
> I would agree with this hope. Indeed I was reminded of the issue
> by Alessandro Carletti's recent query about extracting features
> from the data at different marine sampling stations.
>
> My involvement goes back to the days (around 1980) when, with
> Jan Boëtius, I was examining Johannes Schmidt's data on eel larvae
> obtained during his Atlantic cruises to investigate the "spawning
> question" of the European eel (funded by the Carlsberg Foundation,
> Peter!).
>
> Each Cruise consisted of a series of Stations by a given Ship
> at different Geographic positions, at each of which a number of Hauls
> would be made in different Years and different Months on different
> Days at different Times of day, using different Equipments and at
> different Depths or ranges of Depth, and of different Durations,
> and at different Speeds, resulting in capture of none or several
> specimens each of which would be examined for length, numbers of
> myomeres (muscle segments), and other features, along with hydrographic
> measurements.
>
> This could have been embodied in a huge "rectangular table" with of
> course much repetition of all the information that remains constant
> for each specimen in a haul. The specimen-specific data consisted of
> only 2-4 items, while the "constant" data consisted of 12-15
> items. There were nearly 20,000 larvae, so the "rectangular table"
> could have occupied well over a Megabyte.
>
> The alternative is a "list" representation, like:
>
> Investigation = list(Cruises)
> Cruise = list(Ship,list(Stations))
> Station = list(Position,list(Hauls))
> Haul = list(Year,Month,Day,Time,Duration,(Equipment data),(Depths),
> Speed,list(Specimens))
> Specimen=list(Length,Myomeres,...)
>
> In the end, the "list-like" view was the one adopted (I was limited
> to CP/M BASIC in some 48K of free RAM, with 256KB floppies, in those
> days), though not fully formally programmed (some of the "list
> parsing" was done by hand, i.e. replacing one floppy with another),
> though the BASIC program did retain the previously read data
> for a given Station when reading in new Haul data, and the Haul
> data when reading in Specimen data.
>
> Later, when I began to study C, I realised that the language
> was well adapted to implementing such structures in a program,
> though by then following this up would have been motivated by
> curiosity rather than needing to get the job done (it already
> was done).
>
> Now, in R, I see that in principle such data representations
> are well integrated into the language, and I've been yet again
> tempted to look at the question!
>
> However, while representing the raw data in such a form is
> well supported by R, it seems to me that extracting data
> in a way adapted to different analyses requires users to
> create their own methods, using the list-access primitives.
>
> For example, to study the changes in the distribution of
> lengths of specimens in relation to Position and Date
> (which was one of the important issues in that investigation),
> I don't think there are any "list processing" functions
> available in R which, given the list-based structure described
> above, would allow a simple query of the form
>
> means( Length , ~ Position:Date , data=Cruise )
>
> It's quite feasible to write one's own; but I think Peter's
> hope (expressed in excerpt above) looks like a first call
> for thinking about general methods for this sort of thing.
>
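
As a rough illustration of the "write one's own" route, here is a minimal sketch in R. Everything in it is an assumption made for illustration: the toy values, the field names (Stations, Hauls, Specimens, Position, Date, Length), the collapsing of Year/Month/Day into a single Date string, and the helper names flatten_cruise and list_means. It walks the nested list, carries the Station and Haul information down to each Specimen, and then uses the ordinary data-frame tools for the grouped means.

```r
# Toy data following the Cruise > Station > Haul > Specimen nesting.
# All names and values are invented for illustration only.
cruise <- list(
  Ship = "ShipA",
  Stations = list(
    list(Position = "P1",
         Hauls = list(
           list(Date = "1920-06",
                Specimens = list(list(Length = 10), list(Length = 12))),
           list(Date = "1920-07",
                Specimens = list(list(Length = 20))))),
    list(Position = "P2",
         Hauls = list(
           list(Date = "1920-06",
                Specimens = list(list(Length = 30), list(Length = 34)))))
  )
)

# Hypothetical helper: one row per specimen, with the Station- and
# Haul-level values repeated only at the moment they are needed.
flatten_cruise <- function(cruise) {
  rows <- list()
  for (st in cruise$Stations) {
    for (h in st$Hauls) {
      for (sp in h$Specimens) {
        rows[[length(rows) + 1]] <- data.frame(
          Position = st$Position,
          Date     = h$Date,
          Length   = sp$Length,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, rows)
}

# Hypothetical stand-in for the wished-for
#   means( Length , ~ Position:Date , data=Cruise )
list_means <- function(cruise) {
  flat <- flatten_cruise(cruise)
  aggregate(Length ~ Position + Date, data = flat, FUN = mean)
}

res <- list_means(cruise)
print(res)
```

The rectangular table still appears, but only transiently and only with the columns a given analysis needs; the stored data keep their list form.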

The Green Book defines a recursive apply function, rapply, that provides a general means of traversing that sort of structure.
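
A small sketch of what such a traversal looks like, assuming the rapply() found in current base R (its interface may differ in detail from the Green Book definition) and a made-up toy structure with invented field names:

```r
# Toy nested structure; names and values are invented for illustration.
cruise <- list(
  Ship = "ShipA",
  Stations = list(
    list(Position = "P1",
         Hauls = list(
           list(Date = "1920-06",
                Specimens = list(list(Length = 10), list(Length = 12))),
           list(Date = "1920-07",
                Specimens = list(list(Length = 20))))),
    list(Position = "P2",
         Hauls = list(
           list(Date = "1920-06",
                Specimens = list(list(Length = 30), list(Length = 34)))))
  )
)

# Visit every leaf; keep only the numeric ones (here, the Lengths).
# Character leaves (Ship, Position, Date) fall through to the default
# of NULL and are dropped by how = "unlist".
lens <- rapply(cruise, function(x) x, classes = "numeric", how = "unlist")
overall_mean <- mean(lens)

# Transform numeric leaves in place while keeping the nesting intact,
# e.g. rescaling every Length by a factor of 10.
scaled <- rapply(cruise, function(x) x * 10, classes = "numeric",
                 how = "replace")
```

Note that the function passed to rapply() sees only the leaf itself, not the Station or Haul it sits under, so a grouped query such as means by Position and Date would presumably still need an explicit walk that carries the parent values along.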



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri May 06 00:59:24 2005
