[Rd] Problems with checking documentation vs data, and a proposal

From: Ross Boylan <ross_at_biostat.ucsf.edu>
Date: Tue 16 Jan 2007 - 22:03:00 GMT


I have a single data file inputs.RData that contains 3 objects. I generated an Rd page for each object using prompt(). When I run R CMD check I get
* checking for code/documentation mismatches ... WARNING
Warning in utils::data(list = al, envir = data_env) :
         data set 'gold' not found
(gold is one of the objects).

This appears to be coming from the codocData function defined in src/library/tools/R/QC.R (this is in the Debianised source 2.4.1, so the path might be a little different).

According to the help on this function, it will only attempt a match when there is a single alias in the documentation file, although I'm not sure that's what the code does (it seems to check only if there is more than one format section). At any rate, the central logic appears to gather up names of data objects and then to load them with

            ## Try loading the data set into data_env.
            utils::data(list = al, envir = data_env)
            if(exists(al, envir = data_env, mode = "list",
                      inherits = FALSE)) {
                al <- get(al, envir = data_env, mode = "list")
            }

Since there is no gold.RData, this is failing.
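
The failure can be reproduced outside R CMD check. The following is only a sketch under some assumptions: it uses a scratch directory rather than a real package, and the object names silver and bronze are made up (only gold comes from the report above). It relies on the documented behaviour that data() with package = character(0) looks only in the data directory under the current working directory.

```r
# Hypothetical layout: <scratch dir>/data/inputs.RData holding three objects.
# 'gold' is from the report; 'silver' and 'bronze' are invented for illustration.
proj <- tempfile("pkgdemo")
dir.create(file.path(proj, "data"), recursive = TRUE)
gold   <- data.frame(x = 1:3)
silver <- data.frame(y = 4:6)
bronze <- data.frame(z = 7:9)
save(gold, silver, bronze, file = file.path(proj, "data", "inputs.RData"))

old_wd <- setwd(proj)
data_env <- new.env()

# Ask for the object name: there is no file named gold.*, so data() warns.
msg <- tryCatch({
    utils::data(list = "gold", package = character(0), envir = data_env)
    NA_character_
}, warning = function(w) conditionMessage(w))

# Ask for the file's base name: every object stored in the file is loaded.
utils::data(list = "inputs", package = character(0), envir = data_env)
loaded <- sort(ls(data_env))

setwd(old_wd)
print(msg)
print(loaded)
```

So, when looking in a plain data directory, data() matches its argument against file base names, not against the names of the objects stored inside the files.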

This leads to 2 issues: what should I do now, and how might this work better in the future.

Taking the future first: how about having the code load all the data files it finds, somewhere near the beginning of the function? If it did so, the code

        ## Try finding the variable or data set given by the alias.
        al <- aliases[i]
        if(exists(al, envir = code_env, mode = "list",
                  inherits = FALSE)) {

which precedes the earlier snippet, would find the symbol was defined and be happy. I suppose the data could be loaded into code_env, although using it seems to risk deciding that a data symbol is defined when the symbol refers to a code object.

I'm not sure whether the code should still try to load data objects individually under this scenario when the symbol is not already present.
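
To make the proposal concrete, here is a rough sketch of the idea. It is not the codocData code: preload_all_data is a hypothetical helper, it handles only .RData/.rda files, and it loads into a separate environment rather than code_env to avoid the confusion with code objects mentioned above. The object names other than gold are invented.

```r
# Throwaway data directory; names other than 'gold' are hypothetical.
data_dir <- file.path(tempfile("pkgdemo"), "data")
dir.create(data_dir, recursive = TRUE)
gold   <- data.frame(x = 1:3)
silver <- data.frame(y = 4:6)
bronze <- data.frame(z = 7:9)
save(gold, silver, bronze, file = file.path(data_dir, "inputs.RData"))

# Hypothetical helper (not part of the tools package): load every
# .RData/.rda file in data_dir into one fresh environment.
preload_all_data <- function(data_dir) {
    data_env <- new.env()
    files <- list.files(data_dir, pattern = "[.](rda|RData)$",
                        full.names = TRUE, ignore.case = TRUE)
    for (f in files) load(f, envir = data_env)
    data_env
}

data_env <- preload_all_data(data_dir)

# The same kind of test the existing code applies to each alias,
# but against the preloaded environment.
aliases <- c("gold", "silver", "bronze", "missing_one")
found <- vapply(aliases, function(al)
    exists(al, envir = data_env, mode = "list", inherits = FALSE),
    logical(1))
print(found)
```

With the data preloaded, each documented alias is found by name regardless of which file it lives in, and an alias with no matching object is still reported as absent.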

What can I do in the short run, particularly since I would like the package to pass R CMD check with versions of R that don't include this possible enhancement? I see several options, none of them beautiful:
1) Delete inputs.RData and create 3 separate data objects. However, I have code that relies on inputs being present, and the 3 data items go together naturally.
2) Make a single document describing inputs.RData. First problem: the page would be awkward, combining all 3 things. Second, it looks as if codocData might still try loading the individual data objects, since it tries to pull data names out of the documentation, even out of individual items inside \describe.
3) Attempt to disable the checks by adding multiple aliases or something else to be revealed by closer inspection of the code. This is a hack that bypasses the checking altogether (unless it turns out I still get a complaint about missing documentation).
4) Create gold.RData and others as symlinks to inputs.RData. Fragile across operating systems, version control systems, and versions of tar. Might get errors about multiple data definitions.
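
For options 1 and 4, a copy-based variant avoids symlinks: write each object in the combined file to its own file named after the object. This is only a sketch. split_rdata is a hypothetical helper, the names silver and bronze are invented, and whether keeping inputs.RData next to the per-object files triggers complaints about multiple definitions is untested here.

```r
# Throwaway data directory; names other than 'gold' are hypothetical.
data_dir <- file.path(tempfile("pkgdemo"), "data")
dir.create(data_dir, recursive = TRUE)
gold   <- data.frame(x = 1:3)
silver <- data.frame(y = 4:6)
bronze <- data.frame(z = 7:9)
save(gold, silver, bronze, file = file.path(data_dir, "inputs.RData"))

# Hypothetical helper: save each object found in rdata_file to
# <out_dir>/<object name>.RData, leaving the combined file in place.
split_rdata <- function(rdata_file, out_dir = dirname(rdata_file)) {
    env <- new.env()
    objs <- load(rdata_file, envir = env)  # load() returns the object names
    for (nm in objs) {
        save(list = nm, envir = env,
             file = file.path(out_dir, paste0(nm, ".RData")))
    }
    invisible(objs)
}

split_rdata(file.path(data_dir, "inputs.RData"))
print(sort(list.files(data_dir)))

# Each new file contains exactly one object, named like the file.
check_env <- new.env()
only_gold <- load(file.path(data_dir, "gold.RData"), envir = check_env)
```

The per-object files would need to be regenerated whenever inputs.RData changes, so this trades the fragility of symlinks for a duplication problem.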

Usual caveats: this is all based on my imperfect understanding of the code.

So, any comments on the possible modification to codocData or the work-arounds?

-- 
Ross Boylan                                      wk:  (415) 514-8146
185 Berry St #5700                               ross@biostat.ucsf.edu
Dept of Epidemiology and Biostatistics           fax: (415) 514-8150
University of California, San Francisco
San Francisco, CA 94107-1739                     hm:  (415) 550-1062

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jan 17 09:05:31 2007
