From: jim holtman <jholtman_at_gmail.com>

Date: Sat 14 Oct 2006 - 02:14:27 GMT

Take a look with 'dput' and you will see the difference:

> row.names(x) <- 1:n
> dput(x)

structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(NA,
10), class = "data.frame")

> row.names(x) <- 2:(n+1)
> dput(x)

structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(2,
3, 4, 5, 6, 7, 8, 9, 10, 11), class = "data.frame")


'row.names' is different: 1:n is stored compactly as c(NA, 10), while 2:(n+1) is stored as the full integer vector.
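To look at the stored attribute directly rather than through 'dput', more recent versions of R provide the internal helper '.row_names_info()'. The following is only a minimal sketch under that assumption (it is not something from the 2.4.0 session above), and the names 'compact' and 'full' are chosen here purely for illustration:

```r
# Sketch: inspect how row names are stored internally.
# Assumes a recent R where base::.row_names_info() is available.
n <- 10
x <- as.data.frame(matrix(seq(len = 2 * n), nrow = n))

row.names(x) <- 1:n
compact <- .row_names_info(x, 0L)   # type 0L: the raw internal attribute
print(compact)                      # two elements, the first one NA

row.names(x) <- 2:(n + 1)
full <- .row_names_info(x, 0L)      # 2:(n+1) is not compacted
print(full)                         # the full integer vector

# The public accessor always expands to a full-length vector
print(attr(x, "row.names"))
```

With type 0L the helper returns the attribute as stored, so the compact form shows up as a two-element vector instead of being expanded the way 'row.names' and 'attr' expand it.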

On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang@hp.com> wrote:

> Hi Gabor,

>
> * On Fri 07:59PM, 13 Oct 2006, Gabor Grothendieck (ggrothendieck@gmail.com) wrote:
> > Try this:
> >
> > >class(attributes(x)$row.names)
> > [1] "integer"
> > >rownames(x) <- as.character(rownames(x))
> > >class(attributes(x)$row.names)
> > [1] "character"
>
> Yes, but this doesn't show that row.names was stored as a _single_
> integer (3) instead of a vector of integers (1:3).
>
> Reading the changes again:
>
>     The internal storage of row.names = 1:n just records 'n', for
>     efficiency with very long vectors.
>
>     The "row.names" attribute must be a character or integer
>     vector, and this is now enforced by the C code.
>
> I think row.names is always _printed_ as a vector. I had misinterpreted the
> help(row.names) paragraph in my original posting to mean that the internal
> storage can be revealed by attributes(x)$row.names. That paragraph implies
> that attributes(x)$row.names and attr(x, "row.names") can have different
> classes, but I can't create such an example.
>
> I did this experiment:
>
> > n <- 10000
> > x <- as.data.frame(matrix(seq(len=2*n), nrow=n))
> > head(x)
>   V1    V2
> 1  1 10001
> 2  2 10002
> 3  3 10003
> 4  4 10004
> 5  5 10005
> 6  6 10006
> > class(attributes(x)$row.names)
> [1] "integer"
> > save(x, file="x1", compress=FALSE)
> > row.names(x) <- 2:(n+1)
> > class(attributes(x)$row.names)
> [1] "integer"
> > save(x, file="x2", compress=FALSE)
> > subset(file.info(c("x1", "x2")), select=size)
>      size
> x1  80205
> x2 120197
>
> The difference in size is about nrow(x) * 4 bytes. I think this shows that 1:n
> was stored compactly as a single integer but 2:(n+1) was not.
>
> > On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang@hp.com> wrote:
> > >Reading the list of changes for R version 2.4.0, I was happy to see that
> > >the row names of dataframes can be stored compactly (as the integer n
> > >when row.names(df) is 1:n).
> > >
> > >help(row.names) contains this paragraph:
> > >
> > >    Row names of the form '1:n' for 'n > 2' are stored internally in a
> > >    compact form, which might be seen by calling 'attributes' but never
> > >    via 'row.names' or 'attr(x, "row.names")'.
> > >
> > >I am unable to get attributes(x)$row.names to return just nrow(x). Am I
> > >misreading the documentation? Does "might be seen" mean "possibly in some
> > >future version of R" in this case?
> > >
> > >> (x <- as.data.frame(matrix(1:9, nrow=3)))
> > >  V1 V2 V3
> > >1  1  4  7
> > >2  2  5  8
> > >3  3  6  9
> > >> attributes(x)$row.names
> > >[1] 1 2 3
> > >> row.names(x) <- seq(len=nrow(x))
> > >> attributes(x)$row.names
> > >[1] 1 2 3
> > >
> > >Best,
> > >Hsiu-Khuern.
> > >
>
> Best,
> Hsiu-Khuern.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

