Re: [Rd] read.table() and NULL for colClasses

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu 29 Jul 2004 - 06:12:56 EST


NULL is not a valid value for colClasses and I don't see why you thought it was. colClasses has to be character according to the documentation, so "NULL" is allowed but not NULL.

Your diff appears to be backwards for a patch. A patch against the current R-devel sources is what is needed, including some regression tests.

On Wed, 28 Jul 2004, Henrik Bengtsson wrote:

> Hi,
>
> is there a reason for not supporting NULL or "NULL" values for argument
> colClasses in read.table(), much like you can use NULL values for argument
> 'what' in scan()? This would help quite a bit when reading large data files
> where only a few columns are of interest.

Is that a common enough case to make this worth the code complication, given that scan() (or better, a DBMS) can be used? The usual reason is that R is maintained by a small and overworked team, and it is adding complications that needs justification, not leaving them out.
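
For reference, a minimal sketch of the scan() route mentioned above: a NULL entry in the 'what' list makes scan() skip that field. The three-column input and the names 'id' and 'tag' are hypothetical, chosen only for illustration.

```r
# Hypothetical whitespace-separated input with three columns;
# only the first and third are of interest.
tc <- textConnection("a 1 x\nb 2 y\nc 3 z")

# A NULL entry in 'what' tells scan() to skip that field.
cols <- scan(tc, what = list("", NULL, ""), quiet = TRUE)
close(tc)

# Assemble a data frame from the fields that were kept
# (the names 'id' and 'tag' are illustrative only).
df <- data.frame(id = cols[[1]], tag = cols[[3]])
```

The skipped field stays in the returned list as a NULL component, so the positions of the remaining fields are unchanged.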

> I've modified read.table() so it calls scan(what=...) also with NULLs for
> the fields to be skipped. Here's the diff of readtable.R (from the
> R-1.9.1.tgz; 9,591,217 bytes):
>
> diff readtable.new.R readtable.R
> 117,123d116
> < # Skip NULL columns in scan()
> < void <- sapply(colClasses, FUN=identical, "NULL") |
> < sapply(colClasses, FUN=is.null)
> < # If all (data) columns are NULL, return empty data frame.
> < if (sum(!void) <= 1*rlabp)
> < return(data.frame())
> < what[void] <- list(NULL)
> 131c124
> < nlines <- length(data[[which(!void)[1]]])
> ---
> > nlines <- length(data[[1]])
> 161c154
> < for (i in (1:cols)[!known & !void]) {
> ---
> > for (i in 1:cols) {
> 171,178d163
> < # Skipped row names equals row.names=NULL.
> < if (rlabp) {
> < if (void[1]) {
> < row.names <- NULL
> < data <- data[-1]
> < }
> < void <- void[-1]
> < }
> 201,202d185
> < # Remove NULL columns
> < data[void] <- NULL
>
> and a diff for read.table.Rd:
>
> diff read.table.new.Rd read.table.Rd
> 102,104c102
> < \code{NA} when \code{\link{type.convert}} is used. Columns for
> < which the value is \code{"NULL"} (or \code{NULL} in a list) are
> < skipped. NB: \code{as} is
> ---
> > \code{NA} when \code{\link{type.convert}} is used. NB: \code{as} is
> 181,183c179
> < the five atomic vector classes. Skipping columns with \code{"NULL"}
> < (or \code{NULL}) will also require less memory.
> <
> ---
> > the five atomic vector classes.
>
> Note that setting a colClasses entry to "NULL" already has an effect, which
> I assume is unintentional. During the data conversion, which happens *after*
> scan() has read the data anyway, "NULL" will NULL a column via as(x,
> "NULL"), but unfortunately the wrong column. If not the above modifications,
> maybe a warning for the latter?

That's not usage as documented so the effect is definitely unintentional. We can't catch all misuses!

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu Jul 29 06:17:35 2004
