Re: [Rd] [R] data.frame() size

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Mon 12 Dec 2005 - 13:11:00 GMT

Hin-Tak Leung <hin-tak.leung@cimr.cam.ac.uk> writes:

> Prof Brian Ripley wrote:
> > Data frames have unique row names *by definition* (White Book p.57).
>
> Yes - I happened to have the White Book on my desk (not mine...)
> - indeed, the first sentence on page 57 is (quoted verbatim; the
> "never" is in italics in the book, which I have marked with "*"
> before and after):
>
> If all else fails, the row names are just the row numbers. They
> are *never* null and must be unique.
>
> So patching data.frame.R is quite wrong. However, the rowname/colname
> overhead is definitely an issue for processing of large data sets,
> both for speed and amount of memory consumed. So it is probably best
> to extend the data.frame class and call it something else instead,
> for those who need to go that route.

Exactly. I recall hearing from the Insightful people at the DSC in Seattle that something is going to happen with the rownames in S-PLUS, or has already happened in the latest release, but I don't remember exactly how they did it, or whether and how it relates to their "big dataframe" code. We might want R to follow suit in this respect.

Other options might include doing something about the string storage of rownames, which is quite wasteful in R (every string is an R object; a string vector is really a list of CHARSXP objects). One could either improve the internal storage format, or allow rownames to be integers with "virtual string" semantics, so that x["123",] still works.
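
To make the "virtual strings" idea concrete, here is a rough sketch in plain R. It is only an illustration under stated assumptions: lookup_rows() is a hypothetical helper, not an existing R function, and the row ids are kept in a separate integer vector rather than in the data frame's own row names. It also ignores edge cases such as keys like "1e2" or "123.7", which as.integer() would accept.

```r
# Hypothetical helper (not part of R): resolve row keys against integer ids.
# A character key is converted to an integer on lookup, so the ids never
# need to be stored as strings.
lookup_rows <- function(ids, keys) {
  if (is.character(keys)) {
    # "123" becomes 123L; non-numeric keys become NA (warning suppressed)
    keys <- suppressWarnings(as.integer(keys))
  }
  match(keys, ids)  # position of each key in ids, NA if absent
}

ids <- c(101L, 123L, 150L)               # row ids held as integers
x   <- data.frame(value = c(10, 20, 30))

# Both the string key and the integer key select the same row.
by_string  <- x[lookup_rows(ids, "123"), , drop = FALSE]
by_integer <- x[lookup_rows(ids, 123L), , drop = FALSE]
identical(by_string, by_integer)
```

The point of the sketch is only that a string key can be honoured without materialising one CHARSXP per row; how that would be wired into "[.data.frame" is left open.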

> (What I am doing is already called a different name so it isn't
> affected by this argument).
>
> Hin-Tak

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel