[Rd] speeding up perception

From: ivo welch <ivo.welch_at_gmail.com>
Date: Sat, 02 Jul 2011 11:23:01 -0700


Dear R developers: R is supposed to be slow for iterative calculations. Actually, it isn't. Matrix operations are fast. It is data frame operations that are slow.

R <- 1000
C <- 1000

example <- function(m) {
  cat("rows: ")
  cat(system.time(for (r in 1:R) m[r,20] <- sqrt(abs(m[r,20])) + rnorm(1)), "\n")
  cat("columns: ")
  cat(system.time(for (c in 1:C) m[20,c] <- sqrt(abs(m[20,c])) + rnorm(1)), "\n")

  if (is.data.frame(m)) {
    cat("df: columns as names: ")
    cat(system.time(for (c in 1:C) m[[c]][20] <- sqrt(abs(m[[c]][20])) + rnorm(1)), "\n")
  }
}

cat("\n**** Now as matrix\n")
example( matrix( rnorm(C*R), nrow=R ) )

cat("\n**** Now as data frame\n")
example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) )

When m is a data frame, the operation is about 300 times slower than when m is a matrix. Each loop is basically accessing 1000 numbers. When m is a data frame, the speed of R is about 20 accesses per second on a Mac Pro. This is pretty pathetic.
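
For anyone who wants to reproduce the gap on a smaller scale, here is a minimal sketch. The helper update_rows() and the sizes are made up for illustration and are not part of the test above; the random term is replaced by a constant so that the results can be compared. It also times one possible workaround: convert the data frame to a matrix once, loop on the matrix, and convert back. Timings will vary by machine and R version.

```r
# Hypothetical helper (not from the test above): apply the same
# element-wise update to column j of m, one row at a time.
update_rows <- function(m, j = 20) {
  for (r in seq_len(nrow(m))) {
    m[r, j] <- sqrt(abs(m[r, j])) + 1
  }
  m
}

set.seed(1)
n   <- 200
mat <- matrix(rnorm(n * 50), nrow = n)
df  <- as.data.frame(mat)

# Same loop, once on the matrix and once on the data frame.
t_mat <- system.time(res_mat <- update_rows(mat))[["elapsed"]]
t_df  <- system.time(res_df  <- update_rows(df))[["elapsed"]]

# Possible workaround: convert once, loop on the matrix, convert back.
t_via <- system.time({
  res_via <- as.data.frame(update_rows(as.matrix(df)))
})[["elapsed"]]

cat("matrix:", t_mat, " data frame:", t_df, " via matrix:", t_via, "\n")
```

The conversion only makes sense when all columns have the same type, as they do here; as.matrix() on a mixed-type data frame would coerce everything to a common type.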

I do not know the R internals, so the following is pure speculation. I understand that an index calculation is faster than a vector lookup for arbitrary size objects, but it seems as if R relies on search to find its element. Maybe there isn't even a basic vector lookup table. A vector lookup table should be possible at least along the dimension of consecutive storage. Another possible improvement would be an operation that attaches an attribute to the data frame containing a full index table for quick lookup. (If the index table is there, it could be used. Otherwise, R could simply use the existing internal mechanism.)
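
One thing that can be checked from user code, without knowing the internals: a data frame is a list of equal-length column vectors, and element assignment on it goes through the S3 method `[<-.data.frame`, which is itself written in R. The sketch below (sizes and variable names are made up for illustration) compares a per-element update through the data frame interface with extracting the column once, updating it as a plain vector, and assigning it back once. Whether the method dispatch is the whole story behind the slowdown is an assumption on my part.

```r
set.seed(42)
n     <- 100
df    <- as.data.frame(matrix(rnorm(n * 5), nrow = n))
noise <- rnorm(n)   # drawn up front so both versions use the same numbers

# A data frame is a list with one vector per column.
stopifnot(is.list(df), length(df) == 5)

# Element-by-element update through the data frame interface.
slow <- df
for (r in seq_len(n)) {
  slow[r, 3] <- sqrt(abs(slow[r, 3])) + noise[r]
}

# Same update: extract the column once, work on the plain vector,
# and assign it back once.
fast <- df
col  <- fast[[3]]
fast[[3]] <- sqrt(abs(col)) + noise

# Both routes should give the same data frame.
print(all.equal(slow, fast))
```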

I think faster data frame access would significantly improve the impression that R makes on novices. Just my 5 cents.

/iaw



Ivo Welch (ivo.welch_at_gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Professor of Finance
Anderson School at UCLA, C519

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 02 Jul 2011 - 18:25:22 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 04 Jul 2011 - 12:20:07 GMT.
