Re: [Rd] speeding up perception

From: ivo welch <ivo.welch_at_gmail.com>
Date: Sun, 03 Jul 2011 22:19:53 -0700

thank you, simon.  this was very interesting indeed. I also now understand how far out of my depth I am here.

fortunately, as an end user, obviously, *I* now know how to avoid the problem. I particularly like the as.list() transformation and back to as.data.frame() to speed things up without loss of (much) functionality.
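for anyone reading along later, here is a minimal sketch of the round trip as I understand it. the data and the column names x and y are made up for illustration, and the loop body is only a stand-in for whatever per-element work is really needed.

```r
# Hypothetical example data; the column names x and y are illustrative only.
n <- 1000
d <- data.frame(x = numeric(n), y = seq_len(n))

# Drop the data frame class: as.list() gives a plain list of column vectors.
l <- as.list(d)

# Element-by-element access now works on ordinary vectors in a list.
for (i in seq_len(n)) {
  l$x[i] <- l$y[i] * 2
}

# Convert back to a data frame once the loop is finished.
d2 <- as.data.frame(l)
```

the default row names are regenerated by as.data.frame(), but any custom row names on the original data frame would have to be saved and restored by hand, which is presumably the "(much)" in the loss of functionality.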

more broadly, I view the avoidance of individual element access through the use of apply and vector operations as a mixed "IQ test" and "knowledge test" (which I often fail). However, even for the most clever, there are also situations where the KISS programming principle makes explicit loops still preferable. Personally, I would have preferred it if R had, in its standard "statistical data set" data structure, forgone the row names feature in exchange for retaining fast direct access. R could have reserved its current implementation "with row names but slow access" for a less common (possibly pseudo-inheriting) data structure.

If end users commonly iterate over a data frame, which I would guess to be the case, then the impression that R makes on (novice) end users could be greatly improved if the extreme penalties could be eliminated or at least flagged. For example, I wonder if modest special internal code could store data frames internally and transparently as lists of vectors until a row name is assigned. Easier and uglier, a simple but specific warning message with a suggestion could be issued when there is an individual read/write into a data frame ("Warning: data frames are much slower than lists of vectors for individual element access").
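a rough way to see the penalty on one's own machine is sketched below. the size n and the single column x are arbitrary choices of mine, and the timings will vary by machine and R version, so no numbers are claimed here.

```r
# Hypothetical setup: one numeric column, written one element at a time.
n <- 5000
d <- data.frame(x = numeric(n))
l <- as.list(d)

# Individual writes into the data frame go through `[<-.data.frame`.
t_df <- system.time(for (i in seq_len(n)) d[i, 1] <- i)["elapsed"]

# The same writes into a plain list of vectors.
t_list <- system.time(for (i in seq_len(n)) l$x[i] <- i)["elapsed"]

# Both loops produce the same column; only the elapsed time differs.
print(c(data.frame = unname(t_df), list = unname(t_list)))
```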

I would also suggest changing the "Introduction to R" 6.3 from

  "A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions."

to

  "A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions. However, data frames can be much slower than matrices or even lists of vectors (which, like data frames, can contain different types of columns) when individual elements need to be accessed."

Reading about it immediately upon introduction could flag the problem in a more visible manner.

regards,

/iaw



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 04 Jul 2011 - 05:22:54 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 05 Jul 2011 - 00:30:06 GMT.
