Re: [Rd] speeding up perception

From: Timothée Carayol <timothee.carayol_at_gmail.com>
Date: Mon, 04 Jul 2011 07:47:59 +0100

Hi --

It's my first post on this list; as a relatively new user with little knowledge of R internals, I am a bit intimidated by the depth of some of the discussions here, so please spare me if I say something incredibly silly.

I feel that someone at this point should mention Matthew Dowle's excellent data.table package (http://cran.r-project.org/web/packages/data.table/index.html), which seems to me to address many of the inefficiencies of data.frame. A data.table has no row names, and operations that only need data from one or two columns are (I believe) just as quick whether the total number of columns is 5 or 1000. This results in very quick operations (and, often, elegant code as well).
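For what it's worth, here is a minimal base R sketch of the data.frame overhead discussed in this thread, together with the as.list() / as.data.frame() round trip mentioned in the quoted message below. The names n, df, l and df2 are made up for illustration, and I am not claiming any particular timings: the size of the gap depends on the R version and the machine. It does not use data.table at all; it only shows the kind of element-wise access that is costly with a plain data.frame.

```r
# Illustrative sketch only: element-wise writes into a data.frame
# versus the same writes into a plain list of vectors.
n  <- 1000
df <- data.frame(x = numeric(n), y = numeric(n))

# Each assignment dispatches to `[<-.data.frame`, which does a fair
# amount of bookkeeping on every call.
t_df <- system.time(for (i in seq_len(n)) df[i, "x"] <- i)

# Workaround: drop to a list of vectors, loop, then convert back.
l <- as.list(df)
t_list <- system.time(for (i in seq_len(n)) l$x[i] <- i)
df2 <- as.data.frame(l)

# Both routes should produce the same column contents.
stopifnot(identical(df$x, df2$x))

# Compare elapsed times on your own machine.
print(c(data.frame = t_df[["elapsed"]], list = t_list[["elapsed"]]))
```

On most setups I would expect the list loop to be noticeably faster, but please treat that as something to measure rather than as a given.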

Regards
Timothee

On Mon, Jul 4, 2011 at 6:19 AM, ivo welch <ivo.welch_at_gmail.com> wrote:
> thank you, simon.  this was very interesting indeed.  I also now
> understand how far out of my depth I am here.
>
> fortunately, as an end user, obviously, *I* now know how to avoid the
> problem.  I particularly like the as.list() transformation and back to
> as.data.frame() to speed things up without loss of (much)
> functionality.
>
>
> more broadly, I view the avoidance of individual access through the
> use of apply and vector operations as a mixed "IQ test" and "knowledge
> test" (which I often fail).  However, even for the most clever, there
> are also situations where the KISS programming principle makes
> explicit loops still preferable.  Personally, I would have preferred
> it if R had, in its standard "statistical data set" data structure,
> foregone the row names feature in exchange for retaining fast direct
> access.  R could have reserved its current implementation "with row
> names but slow access" for a less common (possibly pseudo-inheriting)
> data structure.
>
>
> If end users commonly do iterations over a data frame, which I would
> guess to be the case, then the impression of R by (novice) end users
> could be greatly enhanced if the extreme penalties could be eliminated
> or at least flagged.  For example, I wonder if modest special internal
> code could store data frames internally and transparently as lists of
> vectors UNTIL a row name is assigned to.  Easier and uglier, a simple
> but specific warning message could be issued with a suggestion if
> there is an individual read/write into a data frame ("Warning: data
> frames are much slower than lists of vectors for individual element
> access").
>
>
> I would also suggest changing the "Introduction to R" 6.3  from "A
> data frame may for many purposes be regarded as a matrix with columns
> possibly of differing modes and attributes. It may be displayed in
> matrix form, and its rows and columns extracted using matrix indexing
> conventions." to "A data frame may for many purposes be regarded as a
> matrix with columns possibly of differing modes and attributes. It may
> be displayed in matrix form, and its rows and columns extracted using
> matrix indexing conventions.  However, data frames can be much slower
> than matrices or even lists of vectors (which, like data frames, can
> contain different types of columns) when individual elements need to
> be accessed."  Reading about it immediately upon introduction could
> flag the problem in a more visible manner.
>
>
> regards,
>
> /iaw
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Received on Mon 04 Jul 2011 - 14:44:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 04 Jul 2011 - 17:50:05 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
