From: Robert Stojnic <rainmansr_at_gmail.com>

Date: Sun, 03 Jul 2011 13:13:03 +0100

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun 03 Jul 2011 - 13:31:43 GMT

On 03/07/11 05:30, Simon Urbanek wrote:

> This is just a quick, incomplete response, but the main misconception is really the use of data.frames. If you don't use the elaborate mechanics of data frames that involve the management of row names, then they are definitely the wrong tool to use, because most of the overhead is exactly to manage the row names and you pay a substantial penalty for that. Just drop that one feature and you get timings similar to a matrix:

I tried to find some documentation on why extra row-name handling is needed when one is just assigning values into a column of a data frame, but couldn't find any. For a while I stared at the code of `[<-.data.frame` but couldn't figure it out myself. Can you please summarise what exactly is going on when one does m[1, 1] <- 1 where m is a data frame?

I found that the performance differs significantly with the number of columns. For instance:

# reassign first column to 1
example <- function(m){
    for(i in 1:1000) m[i,1] <- 1
}

m <- as.data.frame(matrix(0, ncol=2, nrow=1000))
system.time( example(m) )
   user  system elapsed
  0.164   0.000   0.163

m <- as.data.frame(matrix(0, ncol=1000, nrow=1000))
system.time( example(m) )
   user  system elapsed
 34.634   0.004  34.765

When m is a matrix, both run well under 0.1s.
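For comparison, the matrix case can be reproduced with a minimal sketch along these lines. The helper name example_matrix is illustrative and not from the original post, and it returns m only so that the result can be checked; timings will vary by machine and R version.

```r
# Hypothetical helper: the same loop as example(), applied to a plain matrix.
example_matrix <- function(m) {
  for (i in 1:1000) m[i, 1] <- 1   # element-wise assignment into column 1
  m                                # return the modified matrix for checking
}

m <- matrix(0, ncol = 1000, nrow = 1000)
timing <- system.time(res <- example_matrix(m))
print(timing)
```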

Increasing the number of rows (but not the number of iterations) leads to some increase in time, but not as drastic as when increasing the number of columns. Using m[[y]][x] in this case doesn't help either:

example2 <- function(m){
    for(i in 1:1000) m[[1]][i] <- 1
}

m <- as.data.frame(matrix(0, ncol=1000, nrow=1000))
system.time( example2(m) )
   user  system elapsed
 36.007   0.148  36.233
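One possible way to avoid repeated data-frame assignment, not proposed in this message and shown only as a sketch, is to pull the column out as an ordinary vector, update it in the loop, and write it back once. The helper name example_vec is hypothetical; whether this removes the slowdown on a given R version is an assumption that would need to be timed.

```r
# Hypothetical helper: update the column as a plain vector, then assign it back once.
example_vec <- function(m) {
  col <- m[[1]]                    # extract the first column as a vector
  for (i in 1:1000) col[i] <- 1    # plain vector assignment, no data-frame method
  m[[1]] <- col                    # a single data-frame assignment instead of 1000
  m
}

m <- as.data.frame(matrix(0, ncol = 1000, nrow = 1000))
timing <- system.time(res <- example_vec(m))
print(timing)
```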

r.

