[Rd] write.dcf/read.dcf cycle converts missing entry to "NA" (PR#9796)

From: <bill_at_insightful.com>
Date: Tue, 17 Jul 2007 18:58:10 +0200 (CEST)


Full_Name: Bill Dunlap
Version: 2.5.0
OS: Red Hat Enterprise Linux WS release 3 (Taroon Update 6)
Submission from: (NULL) (24.17.60.30)

If you read a dcf file with read.dcf(file, fields=c("Field",...)) and the file does not contain the requested field "Field", read.dcf puts a character NA for that entry in its output matrix. If you then pass the output of read.dcf() to write.dcf(), it writes the entry as "Field: NA". A subsequent read.dcf() on write.dcf's output file then returns the string "NA", not a character NA, in the entry for "Field". I think that write.dcf() should not write a line to the output file for any entry where the input matrix contains a character NA.
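The distinction between a character NA and the string "NA" can be seen in isolation. The snippet below is only an illustrative sketch, not taken from the write.dcf source: it assumes the value reaches the file through something like paste(), and the names `field`, `pasted` and `reread` are made up for the example.

```r
# A missing field as read.dcf() reports it: a character NA.
field <- NA_character_

# Pasting a character NA into a line turns it into the literal text "NA".
pasted <- paste("Field:", field)

# Reading the value back from that line yields a string, not a missing value.
reread <- sub("^Field: ", "", pasted)

print(is.na(field))    # TRUE
print(pasted)          # "Field: NA"
print(is.na(reread))   # FALSE
```

Once the NA has been written out as text, nothing in the file distinguishes it from a field whose value really is "NA".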

Here is a test function to demonstrate the problem. It returns TRUE when a write.dcf/read.dcf cycle does not change the data.

  test.write.dcf <- function () {

     origFile <- tempfile()
     copyFile <- tempfile()
     on.exit(unlink(c(origFile, copyFile)))
     writeLines(c("Package: testA", "Version: 0.1-1", "Depends:", "",
                  "Package: testB", "Version: 2.1"  , "Suggests: testA", "",
                  "Package: testC", "Version: 1.3.1", ""),
                origFile)
     orig <- read.dcf(origFile,
                      fields=c("Package","Version","Depends","Suggests"))
     write.dcf(orig, copyFile, width = 72)
     copy <- read.dcf(copyFile,
                      fields=c("Package","Version","Depends","Suggests"))
     value <- all.equal(orig, copy)
     if (!identical(value, TRUE)) {
        attr(value, "orig") <- orig
        attr(value, "copy") <- copy
     }
     value

  }

Currently we get

  > test.write.dcf()
  [1] "'is.NA' value mismatch: 0 in current 4 in target"
  attr(,"orig")
       Package Version Depends Suggests
  [1,] "testA" "0.1-1" ""      NA
  [2,] "testB" "2.1"   NA      "testA"
  [3,] "testC" "1.3.1" NA      NA
  attr(,"copy")
       Package Version Depends Suggests
  [1,] "testA" "0.1-1" ""      "NA"
  [2,] "testB" "2.1"   "NA"    "testA"
  [3,] "testC" "1.3.1" "NA"    "NA"

With the attached write.dcf() it returns TRUE.

The diff would be
19,22c19,24
<     eor <- character(nr * nc)
<     eor[seq.int(1, nr - 1) * nc] <- "\n"
<     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(t(x)),
<         style = "list", width = width, indent = indent), eor,
---
>     tx <- t(x)
>     not.na <- c(!is.na(tx))
>     eor <- character(sum(not.na))
>     eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
>     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
>         style = "list", width = width, indent = indent)[not.na], eor,

and the entire function would be

`write.dcf` <-
function (x, file = "", append = FALSE, indent = 0.1 * getOption("width"),
    width = 0.9 * getOption("width"))
{
    if (!is.data.frame(x))
        x <- data.frame(x)
    x <- as.matrix(x)
    mode(x) <- "character"
    if (file == "")
        file <- stdout()
    else if (is.character(file)) {
        file <- file(file, ifelse(append, "a", "w"))
        on.exit(close(file))
    }
    if (!inherits(file, "connection"))
        stop("'file' must be a character string or connection")
    nr <- nrow(x)
    nc <- ncol(x)
    tx <- t(x)
    not.na <- c(!is.na(tx))
    eor <- character(sum(not.na))
    eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
    writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
        style = "list", width = width, indent = indent)[not.na], eor,
        sep = ""), file)
}
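
To make the end-of-record logic in the function above easier to follow, here is a small worked sketch of just the index computation. The toy matrix and the helper name `rec` are assumptions introduced for illustration; only `tx`, `not.na` and `eor` come from the proposed code.

```r
# Toy matrix shaped like read.dcf() output: 3 records, 2 fields,
# with character NAs where a field was absent from the file.
x <- matrix(c("testA", "testB", "testC",
              "",      NA,      NA),
            nrow = 3,
            dimnames = list(NULL, c("Package", "Depends")))

tx <- t(x)                  # one column per record
not.na <- c(!is.na(tx))     # flattened field by field, record by record
rec <- c(col(tx))[not.na]   # record number of each field that survives

# A surviving field closes its record when the next surviving field
# belongs to the following record, i.e. the record number rises by 1.
eor <- character(sum(not.na))
eor[c(diff(rec), 0) == 1] <- "\n"

print(rec)   # 1 1 2 3
print(eor)   # "", "\n", "\n", ""
```

The separator is attached to the last surviving field of records 1 and 2, and the final record gets none. Note that under this rule a record whose fields are all NA contributes no surviving field, so the record number jumps by 2 across it and no separator is emitted at that point.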

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 17 Jul 2007 - 18:04:19 GMT
