[Rd] S3/S4 classes performance comparison

From: Eric Lecoutre <lecoutre_at_stat.ucl.ac.be>
Date: Sat 15 Jan 2005 - 00:40:02 EST

Hi R-devel,

If you read my survey on R-help about reporting, you may have seen that I am implementing a way to handle output for R (main target output formats: xHTML and TeX).
In fact, I do have something that works for basic objects, done entirely with S4 classes. The results are visible at:
http://www.stat.ucl.ac.be/ROMA/sample.htm
http://www.stat.ucl.ac.be/ROMA/sample.pdf

To achieve this goal, I use intermediary objects that represent the structure of the output. Thus I defined classes for Vectors, Tables, Rows, Cells, Sections, and so on. Most of those structures are recursive. As a first attempt, a matrix is represented as a Table containing Rows containing Cells containing Vectors, which in the end is easy to export and makes customisation easy (for example, if you need to insert a footnote within a cell).
I know that this intermediary layout would be far easier to handle at the C level, but I don't have the C skills for that...
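
As an illustration only, a nested layout of this kind could be sketched as follows. The class names (CellSketch, RowSketch, TableSketch) and the helper asTableSketch are hypothetical and are not the ones used in the actual code; for brevity, each cell holds its value directly instead of wrapping it in a Vector object.

```r
library(methods)

# Hypothetical classes, for illustration only
setClass("CellSketch",  representation(content = "ANY"))
setClass("RowSketch",   representation(cells = "list"))
setClass("TableSketch", representation(rows = "list"))

# Hypothetical helper: turn a matrix into a Table of Rows of Cells
asTableSketch <- function(m) {
  rows <- lapply(seq_len(nrow(m)), function(i) {
    cells <- lapply(seq_len(ncol(m)), function(j) {
      new("CellSketch", content = m[i, j])
    })
    new("RowSketch", cells = cells)
  })
  new("TableSketch", rows = rows)
}

tab <- asTableSketch(matrix(1:6, nrow = 2))
length(tab@rows)                  # number of rows in the table
length(tab@rows[[1]]@cells)       # number of cells in the first row
tab@rows[[2]]@cells[[3]]@content  # value stored in row 2, column 3
```

Note that a layout like this creates one object per cell plus one per row, so the number of calls to new() grows with the size of the matrix.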

One of my problems is that this consumes a lot of memory and computation time. Far too much, in fact...
It takes 20 sec. to export data(iris) on my P4 3.2 GHz with 1 GB RAM, which is not acceptable.

I intended to start properly, since I was writing new code from scratch, so I wrote everything using S4 classes.
A simple test reveals crucial efficiency differences between S3 and S4 classes.

Here is the test:

---

### S3 CLASSES

S3content <- function(obj=NULL,add1=NULL,add2=NULL,type="",...){
         out <- list(content=obj,add1=add1,add2=add2,type=type)
         class(out) <- "S3Content"
         return(out)
}

S3vector <- function(vec,...){
   out <- S3content(obj=vec,type="Vector",...)
   class(out) <- "S3Vector"
   return(out)
}


### S4 classes

setClass("S4content",representation(content="ANY",add1="ANY",add2="ANY",type="character"))

S4content <- function(obj=NULL,add1=NULL,add2=NULL,type="",...){
   new("S4content",content=obj,add1=add1,add2=add2,type=type)
}

S4vector <- function(vec,...){
   new("S4content",type="vector",content=vec,...)
}

### Now the test

> test <- rnorm(10000)
> gc()
         used (Mb) gc trigger (Mb)
Ncells 169135  4.6     531268 14.2
Vcells  75260  0.6     786432  6.0
> (system.time(lapply(test,S3vector)))
[1] 0.17 0.00 0.19 NA NA
> gc()
         used (Mb) gc trigger (Mb)
Ncells 169136  4.6     531268 14.2
Vcells  75266  0.6     786432  6.0
> (system.time(lapply(test,S4vector)))
[1] 15.08 0.00 15.13 NA NA

-----

There is a factor of more than 80 here!
Is there something trivial that I overlooked? Is this factor of 80 normal?
Is it still advisable to use S4 classes, given that?

Eric

Eric Lecoutre
UCL / Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre@stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward Tufte

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Jan 14 23:55:02 2005
