Re: [R] Why only a "" string for heading for row.names with write.csv with a matrix?

From: Tony Plate <tplate_at_acm.org>
Date: Thu 11 Aug 2005 - 03:42:08 EST

Here's a relatively easy way to get what I think you want. Note that converting x to a data frame before cbind'ing allows the type of the elements of x to be preserved:

 > x <- matrix(1:6, 2,3)
 > rownames(x) <- c("ID1", "ID2")
 > colnames(x) <- c("Attr1", "Attr2", "Attr3")
 > x
    Attr1 Attr2 Attr3
ID1     1     3     5
ID2     2     4     6

 > write.table(cbind(id=row.names(x), as.data.frame(x)), row.names=FALSE, sep=",")
"id","Attr1","Attr2","Attr3"
"ID1",1,3,5
"ID2",2,4,6

As to why you can't get this via an argument to write.table (or write.csv), I suspect that part of the answer is a wish to avoid "creeping featuritis". Transferring data between programs is notoriously infuriating. There are more data formats than there are programs, but few programs use the same format as their default & preferred format. So to accommodate everyone's preferred format would require an extremely large number of features in the data import/export functions. Maintaining software that contains a large number of features is difficult -- it's easy for errors to creep in because there are so many combinations of how different features can be used on different functions.

The alternative to having lots of features on each function is to have a relatively small set of powerful functions that can be used to construct the behavior you want. This type of software is thought by many to be easier to maintain and extend. I think this is pretty much the preferred approach in R. The above one-liner for writing the data in the form you want is really not much more complex than using an additional argument to write.table(). (And if you need to do this kind of thing frequently, then it's easy in R to create your own wrapper function for 'write.table'.)
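
As a rough sketch of such a wrapper (write_csv_with_id and its id argument are names made up for illustration, not anything that exists in R itself):

```r
# Hypothetical helper: write a matrix or data frame as CSV, putting the
# row names in a named first column instead of under an empty "" header.
write_csv_with_id <- function(x, file = "", id = "id", ...) {
  rn <- row.names(x)
  if (is.null(rn)) rn <- seq_len(nrow(x))   # matrix without row names
  ids <- data.frame(rn, stringsAsFactors = FALSE)
  names(ids) <- id
  # as.data.frame() keeps the original column types (see above)
  out <- cbind(ids, as.data.frame(x))
  write.table(out, file = file, row.names = FALSE, sep = ",",
              qmethod = "double", ...)
}

x <- matrix(1:6, 2, 3,
            dimnames = list(c("ID1", "ID2"), c("Attr1", "Attr2", "Attr3")))
write_csv_with_id(x, id = "ID")   # file = "" prints to the console
```

Since sep, row.names and qmethod are fixed inside the wrapper, passing them again through ... will fail with a duplicate-argument error; other write.table arguments such as na or quote can still be passed through.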

One might object to this line of explanation by noting that many functions already have many arguments and lots of features. I think the situation is that the original author of any particular function gets to decide what features the function will have, and after that there is considerable reluctance (justifiably) to add new features, especially in cases where the desired functionality can be easily achieved in other ways with existing functions.

Earl F. Glynn wrote:
> Consider:
>
> > x <- matrix(1:6, 2,3)
> > rownames(x) <- c("ID1", "ID2")
> > colnames(x) <- c("Attr1", "Attr2", "Attr3")
> > x
>     Attr1 Attr2 Attr3
> ID1     1     3     5
> ID2     2     4     6
>
> > write.csv(x,file="x.csv")
>
> "","Attr1","Attr2","Attr3"
> "ID1",1,3,5
> "ID2",2,4,6
>
> Have I missed an easy way to get the "" string to be something meaningful?
>
> There is no information in the "" string. This column heading for the row
> names often could be used as a database key, but the "" entry would need to be
> manually edited first. Why not provide a way to specify the string instead
> of putting "" as the heading for the rownames?
>
> From http://finzi.psych.upenn.edu/R/doc/manual/R-data.html:
>
> Header line
> R prefers the header line to have no entry for the row names,
> . . .
> Some other systems require a (possibly empty) entry for the row names,
> which is what write.table will provide if argument col.names = NA is
> specified. Excel is one such system.
>
> Why is an "empty" entry the only option here?
>
> A quick solution that comes to mind seems a bit kludgy:
>
> > y <- cbind(rownames(x), x)
> > colnames(y)[1] <- "ID"
> > y
>     ID    Attr1 Attr2 Attr3
> ID1 "ID1" "1"   "3"   "5"
> ID2 "ID2" "2"   "4"   "6"
>
> > write.table(y, row.names=F, col.names=T, sep=",", file="y.csv")
>
> "ID","Attr1","Attr2","Attr3"
> "ID1","1","3","5"
> "ID2","2","4","6"
>
> Now the rownames have an "ID" header, which could be used as a key in a
> database if desired without editing (but all the "numbers" are now
> character strings, too).
>
> It's also not clear why I had to use write.table above, instead of
> write.csv:
>
> > write.csv(y, row.names=F, col.names=T, file="y.csv")
>
> Error in write.table(..., col.names = NA, sep = ",", qmethod = "double") :
>   col.names = NA makes no sense when row.names = FALSE
>
> Thanks for any insight about this.
>
> efg
> --
> Earl F. Glynn
> Bioinformatics
> Stowers Institute
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

Received on Thu Aug 11 03:47:58 2005
