RE: BUG in format()? (PR#383)


Subject: RE: BUG in format()? (PR#383)
Date: Wed 22 Dec 1999 - 23:32:54 EST

Message-Id: <>

Dear Prof. Brian Ripley,

Thank you for your quick reply. It might be that we have two problems here.

(1) BUG in
I must admit I was not aware of the not-converting-to-factor
inconsistency here
(more on this at the end of this mail).

(2) BUG in
My problem was that REALLY CHANGES THE DATA which
It is totally legal to have a character column in a dataframe

> x <-'"', 3)))

as expected it is
> unclass(x)
[1] "\"" "\"" "\""
[1] "AsIs"

[1] "1" "2" "3"

here a first problem, well this is only printing
> x
1 \\\"
2 \\\"
3 \\\"

but now look at this
> x[1,1]
[1] "\""

> as.matrix(x)[1,1]
[1] "\\\""

this is not only a printing problem
> cat(x[1,1], "\n")

> cat(as.matrix(x)[1,1], "\n")

but DEFINITELY WRONG as can be seen in

> as.matrix(x)[1,1] == x[1,1]

It is caused, because makes use of format, and format
behaves as it does.
If, "that is what R-like languages do", then

(either) this convention about what format() does is not sub-optimal but
(or) format MUST NEVER be used within R routines EXCEPT FOR PRINTING,
         i.e. formatting something and storing it or returning it from a
         function is dangerous.
         Even the use for cat() may be dangerous, as cat() as a side effect
         may store data, as in write() or write.table().
         This deserves a BIG warning in the documentation of format.
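
A minimal sketch of the hazard argued above, using numeric data rather than the quote character from the report. The variable names and values are hypothetical and only illustrate the point that format() output is a display string, so storing it and reading it back need not return the original data.

```r
# Illustration only: the values and names here are hypothetical.
x <- c(1/3, 10)

# format() returns character strings meant for display
# (by default 7 significant digits, padded to a common width).
shown <- format(x)

# Reading the formatted strings back does not recover the original numbers,
# because 1/3 was cut to 7 significant digits.
back <- as.numeric(shown)
roundtrip_ok <- identical(back, x)

# Keeping the data untouched and formatting only at print time avoids the loss.
stored <- x
print(format(stored))
```

Here roundtrip_ok is FALSE while stored is still identical to x, which is the difference between formatting for printing and formatting for storage.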

Here is a list of functions of package:base making use of format

> collect <- character()
> for (i in ls("package:base")){
+ if ( any(grep("format[(]", deparse(get(i, pos="package:base"))))
+ || any(grep("format.default[(]", deparse(get(i, pos="package:base"))))
+ )collect <- c(collect, i)
+ }
> collect
 [1] "add1.default" "add1.glm" "add1.lm"
 [5] "anovalist.lm" "" "drop1.default"
 [9] "drop1.lm" "format.char" "format.default"
[13] "" "hist.default" "legend"
[17] "print.aov" "print.aovlist" "print.coefmat"
[21] "print.glm" "print.glm.null" "print.htest"
[25] "print.mtable" "print.summary.glm" "print.summary.lm"
[29] "print.tables.aov" "print.ts" "quantile.default"
[33] "str.default" "summary.aov" ""
[37] "summary.infl" "symnum"
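
The same kind of scan can be sketched as a small function. This is only an illustration: calls_format and the toy environment are hypothetical names, and the toy environment stands in for package:base so that the result is predictable.

```r
# Hypothetical helper (not part of base R): does the deparsed source of a
# function contain a call to format( or format.default( ?
calls_format <- function(name, env) {
  obj <- get(name, envir = env)
  if (!is.function(obj)) return(FALSE)   # skip data objects
  src <- deparse(obj)
  any(grepl("format(\\.default)?[(]", src))
}

# Toy environment standing in for package:base.
toy <- new.env()
assign("show_value", function(x) cat(format(x), "\n"), envir = toy)
assign("add_one", function(x) x + 1, envir = toy)

# Names of the functions in 'toy' whose source mentions format(
collect <- Filter(function(nm) calls_format(nm, toy), ls(toy))
print(collect)
```

Like the grep() loop above, this is a plain text match, so it also catches names that merely end in "format(". Passing baseenv() instead of toy should scan base itself, with a result that depends on the R version.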

To my understanding, at least needs a fix.

Back to automatic conversion of characters to factors.

After fixing, the following comparisons will be TRUE
or FALSE, depending on whether mat is a numeric matrix or a character matrix:

  all( unclass( == unclass(mat) )
  all( mat == sapply(, FUN=function(x)x) )

Obviously automatic conversion to factors is a design decision made long ago,
but I am not convinced of it yet.
The need to maintain attribute "AsIs" just to guarantee that a basic data type
(character) remains unchanged appears to be somewhat dangerous. So both
character data and factors need maintaining in EACH FUNCTION that might
work on dataframes. Uff! It is easy to predict that errors will happen:

Some systematic testing ...

> char <- letters[1:2]
> fac <- factor(char)
> dd <- data.frame(char=I(char), fac=fac)

reveals that

> dd[,"char"] <- char
> dd[,"char"]
[1] a b
Levels: a b

> dd$char <- char
> dd$char
[1] "a" "b"

So .Primitive("$<-") is inconsistent with automatically converting chars to
factors, as it allows inserting a pure character column into a dataframe,
one which has neither attribute "AsIs" nor class "factor".
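
One way to audit this is to record the class of every column before and after an assignment. The sketch below is an illustration under assumptions: col_class is a hypothetical helper, and it only exercises the dd$char <- char path shown above.

```r
# Hypothetical helper: first class of each column of a data frame.
col_class <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

char <- letters[1:2]
fac  <- factor(char)
dd   <- data.frame(char = I(char), fac = fac)

before <- col_class(dd)   # the "char" column is protected by I()

dd$char <- char           # assignment through $<-
after <- col_class(dd)    # the "char" column is now a plain character vector

print(rbind(before, after))
```

A check of this kind makes it visible when a column silently changes from "AsIs" to "character" (or to "factor") as a side effect of an assignment.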

Handling I() is risky as well:

> mat <- matrix(letters, 2, 2)
> dimnames(mat) <- list(c(1:2), c("x","y"))
> mat
  x y
1 "a" "c"
2 "b" "d"

> dd <- data.frame(I(mat))
> ddd
  I.mat..x I.mat..y
1 a a
2 b b
3 c c

doesn't look too bad,
> dimnames(dd)
[1] "1" "2"

[1] "I.mat."

> str(ddd)
`data.frame': 3 obs. of 1 variable:
 $ I.mat.: chr [1:3, 1:2] "a" "b" "c" "a" "b" "c"
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "x" "y"
  ..- attr(*, "class")= chr "AsIs"

So this dataframe is no longer a simple list with each element representing
one column, and thus

> sapply(ddd, FUN=function(x)x)
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "a"
[5,] "b"
[6,] "c"

is no longer a matrix.
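
A short sketch of how to detect such a dataframe, i.e. one where a single list element holds a whole matrix. The 3 x 2 matrix below is hypothetical, chosen to match the str() output above, and has_matrix_col is an illustrative name.

```r
# Hypothetical 3 x 2 character matrix with column names x and y.
mat <- matrix(c("a", "b", "c", "a", "b", "c"), nrow = 3, ncol = 2,
              dimnames = list(NULL, c("x", "y")))

# I() keeps the matrix together as ONE column of the data frame.
ddd <- data.frame(I(mat))

n_elements <- length(ddd)      # number of list elements, not matrix columns
inner_dim  <- dim(ddd[[1]])    # the single element still has 3 rows, 2 columns

# Flag every column that is itself a matrix.
has_matrix_col <- vapply(ddd, is.matrix, logical(1))
print(has_matrix_col)
```

Code that assumes one list element per column can test any(has_matrix_col) first and treat such dataframes separately.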

Back to "AsIs":

> ddd[,1]
     x y
[1,] "a" "a"
[2,] "b" "b"
[3,] "c" "c"
[1] "AsIs"

So here the whole matrix is "AsIs", and since matrix subscripting probably
doesn't maintain "AsIs"
> ddd[[1]][, 1]
[1] "a" "b" "c"

"AsIs" is gone.


Dr. Jens Oehlschlägel-Akiyoshi
Bayerstrasse 21

80335 München

Tel.: 089 545 28-27 Fax.: 089 545 28-10

Standard Disclaimers: Opinions expressed here are personal and are not otherwise represented.



This archive was generated by hypermail 2b25 : Tue 04 Jan 2000 - 14:16:12 EST