[Rd] A couple of issues with colClasses/setAs

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Wed 08 Sep 2004 - 08:34:23 EST

Consider this:

$ cat test.dat
1 a
2 b

Now, we want to read the 2nd column as a factor and ignore the first (since it's just a sequential ID). We can't just put "factor" among the colClasses (would have been nice), so let's try this instead:

> setAs("character","factor",as.factor)
Arguments in definition changed from (x) to (from)
> read.table("test.dat",colClasses=c("numeric","factor"))
Error in inherits(x, "factor") : Object "x" not found

which is a bit peculiar: why does setAs rename the argument from x to from when the body of as.factor still refers to x, so that the resulting function cannot work? You do need to spell it out:

> setAs("character","factor",function(from)as.factor(from))

And now we get somewhere

> read.table("test.dat",colClasses=c("numeric","factor"))
  V1 V2
1 1 a
2 2 b
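
The explicit-wrapper form can also be checked in isolation, without going through read.table. This is only a sketch: it assumes the methods package is available and that as() dispatches to the coercion method registered by setAs; the object name f is purely illustrative.

```r
library(methods)

# Register the coercion with a wrapper whose formal argument is already
# named `from`, so setAs() has nothing to rename.
setAs("character", "factor", function(from) as.factor(from))

# Coerce a small character vector through the registered method.
f <- as(c("a", "b"), "factor")
print(f)
```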

but suppose we want to get rid of column 1:

> read.table("test.dat",colClasses=c("NULL","factor"))
Error in data[[i]] : subscript out of bounds

which looks like a pretty clear bug. In contrast, this works fine:

> read.table("test.dat",colClasses=c("NULL","character"))
1 a
2 b

so the issue only arises when you have nontrivial coercions.

Presumably, the issue is that the code applying colClasses in those cases miscalculates the column indices by forgetting the columns that were skipped.
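
Until that is fixed, one possible workaround is to combine the case reported to work above ("NULL" plus "character") with a conversion after reading. The following is a minimal sketch, not a fix for the underlying bug; the temporary file path and the name d are illustrative assumptions.

```r
# Recreate the two-line sample file in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c("1 a", "2 b"), path)

# Skip column 1 with "NULL" and read column 2 as plain character,
# which avoids the setAs()-based coercion path entirely.
d <- read.table(path, colClasses = c("NULL", "character"))

# Convert the single remaining column to a factor after the fact.
d[[1]] <- factor(d[[1]])
str(d)
```

The remaining column is addressed by position rather than by name, so the sketch does not depend on how skipped columns affect the default V1, V2, ... naming.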

   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

R-devel@stat.math.ethz.ch mailing list
Received on Wed Sep 08 08:38:04 2004
