Re: [R] efficiently replacing values in a matrix

From: Joerg van den Hoff <j.van_den_hoff_at_fzd.de>
Date: Thu, 17 Apr 2008 13:41:38 +0200

On Wed, Apr 16, 2008 at 03:56:26PM -0600, Matthew Keller wrote:
> Yes Chuck, you're right.
>

just a comment:

> Thanks for the help. It was a data.frame not a matrix (I had called
> as.matrix() in my script much earlier but that line of code didn't run
> because I misnamed the object!). My bad. Thanks for the help. And I'm
> VERY relieved R isn't that inefficient...

well, it _is_, at least when using data frames. and while it is obvious that operations on lists (data frames are lists in disguise, actually, right?) are slower than on arrays/matrices, I'm not happy with a performance drop by a factor of seemingly more than 1500 (30 sec vs. > 13 h) -- and I have seen similar things even with rather small data sets, where the difference between using a data frame and a matrix might mean, e.g., overall run times of 10 sec vs. 0.1 sec.
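for anyone who wants to measure this on their own machine, here is a minimal sketch. the sizes, the value being replaced, and the helper name replace_by_column are all made up for illustration; the thread does not show Matthew's real data or code. it runs the same column-wise replacement on a matrix and on the equivalent data frame and prints both timings:

```r
# Hypothetical sizes; the thread does not give the real dimensions.
n_row <- 2000
n_col <- 50

set.seed(1)
m  <- matrix(sample(0:3, n_row * n_col, replace = TRUE), nrow = n_row)
df <- as.data.frame(m)   # same values, stored as a list of columns

# Illustrative helper (not from the thread): replace every 3 by NA,
# one column at a time, using the same indexing code for both classes.
replace_by_column <- function(x) {
  for (j in seq_len(ncol(x))) {
    x[x[, j] == 3, j] <- NA
  }
  x
}

t_matrix <- system.time(m2  <- replace_by_column(m))
t_frame  <- system.time(df2 <- replace_by_column(df))

# Timings depend on machine and R version, so none are quoted here.
print(rbind(matrix = t_matrix, data.frame = t_frame))
```

the two results hold the same values, so any difference in the timings comes from the data structure and not from the work being done.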

where is all this time burned? there _are_ functional languages which operate efficiently on lists.

I think this extreme performance drop when using an apparently innocent data structure is really bad. and it's bad that it's not repeatedly stated in BIG LETTERS in the manuals: use matrices, at least for big arrays, wherever possible. this message is not conveyed at all by the "description" in the data.frame manpage, e.g.:

"This function creates data frames, tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R's modeling software."...

probably 90% (+ x) of all R users are simply that: users and not experts. when I started using R I exclusively used data frames instead of matrices for purely numerical data, simply because I could get column n with x[n] instead of x[,n] and mean(x) worked columnwise (whereas apply(x, 2, 'mean') is tiresome), thus saving some typing. in retrospect this is no strong reason, but it is probably quite common. and many will then stick with data.frames and endure long runtimes for no good reason at all.
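to make the indexing difference concrete, a small sketch with made-up data (the object names are only for illustration). it also uses colMeans(), which accepts both a numeric matrix and an all-numeric data frame, so the typing argument is weaker than it looks. as far as I know, mean() on a data frame no longer returns column means in recent R versions, so colMeans() is the safer habit either way:

```r
# Made-up example data: the same numbers as a matrix and as a data frame.
m  <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df <- as.data.frame(m)

# Column 2 of a matrix needs the comma; m[2] would be the 2nd element.
col_from_matrix <- m[, 2]

# On a data frame, df[2] is a one-column data frame and df[[2]] the vector.
col_as_frame  <- df[2]
col_as_vector <- df[[2]]

# Column means without apply(); works for both classes.
means_matrix <- colMeans(m)
means_frame  <- colMeans(df)
means_apply  <- apply(m, 2, mean)   # the "tiresome" spelling
```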

another question would be whether homogeneous data frames could not internally be handled as matrices...
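until something like that exists, the conversion can be done by hand. the sketch below is only a user-level workaround, not a description of how R handles data frames internally, and the helper name to_matrix_if_numeric is hypothetical. it converts a data frame to a matrix only when every column is numeric and leaves anything else untouched:

```r
# Hypothetical helper (not part of R): convert only homogeneous,
# all-numeric data frames; return every other input unchanged.
to_matrix_if_numeric <- function(x) {
  if (is.data.frame(x) && all(vapply(x, is.numeric, logical(1)))) {
    as.matrix(x)
  } else {
    x
  }
}

# Made-up inputs for illustration.
df_num   <- data.frame(a = c(1, 2), b = c(3, 4))
df_mixed <- data.frame(a = c(1, 2), b = c("u", "v"))

x_num   <- to_matrix_if_numeric(df_num)    # becomes a numeric matrix
x_mixed <- to_matrix_if_numeric(df_mixed)  # stays a data frame
```

the heavy element-wise work can then run on the matrix, with as.data.frame() at the end if a data frame is needed again.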

joerg

>
> Matt
>
>
> On Wed, Apr 16, 2008 at 3:39 PM, Rolf Turner <r.turner@auckland.ac.nz> wrote:
> >
> > On 17/04/2008, at 9:33 AM, Charles C. Berry wrote:
> >
> > <snip>
> >
> >
> >
> > > > I'll lay odds that Matthew's 'matrix' is actually a data.frame, and I'll
> > > > not be surprised if the columns are factors.
> > >
> >
> > <snip>
> >
> > I suspect that you're right.
> >
> > ***Why*** can't people distinguish between data frames and matrices?
> > If they were the same <expletive deleted> thing, there wouldn't be two
> > different terms for them, would there?
> >
> > cheers,
> >
> > Rolf Turner
> >
>
>
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Received on Thu 17 Apr 2008 - 11:44:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Apr 2008 - 12:30:30 GMT.
