Re: [Rd] R CMD build --resave-data

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Wed, 13 Apr 2011 14:45:37 +0200

>>>>> Hervé Pagès <hpages_at_fhcrc.org>
>>>>> on Tue, 12 Apr 2011 22:21:58 -0700 writes:

    > On 11-04-12 07:06 PM, Simon Urbanek wrote:
    >> 
    >> On Apr 12, 2011, at 8:53 PM, Hervé Pagès wrote:
    >> 
    >>> Hi Uwe,
    >>> 
    >>> On 11-04-11 08:13 AM, Uwe Ligges wrote:
    >>>> 
    >>>> 
    >>>> On 11.04.2011 02:47, Hervé Pagès wrote:

>>>>> Hi,
>>>>>
>>>>> More about the new --resave-data option
>>>>>
>>>>> As mentioned previously here
>>>>>
>>>>> https://stat.ethz.ch/pipermail/r-devel/2011-April/060511.html
>>>>>
>>>>> 'R CMD build' and 'R CMD INSTALL' handle this new option
>>>>> inconsistently. The former does --resave-data="gzip" by
>>>>> default. The latter doesn't seem to support the
>>>>> --resave-data= syntax: the --resave-data flag must either be
>>>>> present or not. And by default 'R CMD INSTALL' won't resave
>>>>> the data.
>>>>>
>>>>> Also, because now 'R CMD build' is resaving the data,
>>>>> shouldn't it reinstall the package in order to be able to do
>>>>> this correctly?
>>>>>
>>>>> Here is why. There is this new warning in 'R CMD check' that
>>>>> complains about files not of a type allowed in a 'data'
>>>>> directory:
>>>>>
>>>>>
>>>>> http://bioconductor.org/checkResults/2.8/bioc-LATEST/Icens/lamb1-checksrc.html
>>>>>
>>>>>
>>>>>
>>>>> The Icens package also has .R files under data/ with things
>>>>> like:
>>>>>
>>>>> bet<- matrix(scan("CMVdata", quiet=TRUE),nc=5,byr=TRUE)
>>>>>
>>>>> i.e. the R code needs to access some of the text files
>>>>> located in the data/ folder. So in order to get rid of this
>>>>> warning I tried to move those text files to inst/extdata/
>>>>> and I modified the code in the .R file so it does:
>>>>>
>>>>> CMVdata_filepath<- system.file("extdata", "CMVdata", package="Icens")
>>>>> bet<- matrix(scan(CMVdata_filepath, quiet=TRUE),nc=5,byr=TRUE)
>>>>>
>>>>> But now 'R CMD build' fails to resave the data because the
>>>>> package was not installed first and the CMVdata file could
>>>>> not be found.
>>>>>
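
[Illustration of the failure described above, not part of the original message.] system.file() returns an empty string when the package is not installed, so scan() is handed "" instead of a file path. The sketch below shows that behaviour and one possible workaround. The helper find_extdata() and its fallback_dir argument are hypothetical (they belong neither to R nor to Icens), and the default fallback assumes the data script is evaluated with the package's data/ directory as the working directory, which the thread does not confirm.

```r
# system.file() returns "" when the package (or the file) cannot be found.
missing_path <- system.file("extdata", "CMVdata",
                            package = "aPackageThatIsNotInstalled")
print(nzchar(missing_path))

# Hypothetical helper: prefer the installed copy, otherwise fall back to a
# directory in the source tree (ASSUMPTION: cwd is the package's data/ dir).
find_extdata <- function(name, pkg,
                         fallback_dir = file.path("..", "inst", "extdata")) {
    path <- system.file("extdata", name, package = pkg)
    if (nzchar(path)) return(path)
    candidate <- file.path(fallback_dir, name)
    if (file.exists(candidate)) return(candidate)
    stop("cannot locate '", name, "' for package '", pkg, "'")
}

# Demonstration: a temporary directory stands in for inst/extdata/.
demo_dir <- tempfile("extdata")
dir.create(demo_dir)
writeLines("1 2 3 4 5", file.path(demo_dir, "CMVdata"))

demo_path <- find_extdata("CMVdata", "aPackageThatIsNotInstalled",
                          fallback_dir = demo_dir)
bet <- matrix(scan(demo_path, quiet = TRUE), ncol = 5, byrow = TRUE)
print(dim(bet))
```

Whether such a fallback is appropriate depends on how 'R CMD build' evaluates the scripts under data/; it is only meant to make the dependency on an installed package explicit.
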
>>>>> Unfortunately, for a lot of people that means that the safe
>>>>> way to build a source tarball now is with
>>>>>
>>>>> R CMD build --keep-empty-dirs --no-resave-data
    >>>> 
    >>>> 
    >>>> Hervé,
    >>>> 
    >>>> actually it makes some sense to have these defaults from a
    >>>> CRAN maintainer's point of view:
    >>>> 
    >>>> --keep-empty-dirs: we found many packages containing empty
    >>>> dirs unnecessarily and the idea is to exclude them at the
    >>>> build stage rather than at the later installation stage. Note
    >>>> that the package maintainer is supposed to run build (and
    >>>> knows if the empty dirs are to be included, the user who runs
    >>>> INSTALL does not).
    >>>> 
    >>>> --no-resave-data: We found many packages with insufficiently
    >>>> compressed data. This should be fixed when building the
    >>>> package, not later when installing it, since the reduced size
    >>>> is useful in the source tarball already.
    >>>> 
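
[Illustration of the compression point above, not part of the original message.] The tools package provides checkRdaFiles() to report how a saved data file is compressed and resaveRdaFiles() to recompress it. The sketch below applies them to a throwaway file. The file name and the data frame are made up for the example, and the thread does not say whether 'R CMD build --resave-data' goes through exactly these functions.

```r
# Write a deliberately uncompressed .rda file to a temporary location.
rda <- file.path(tempdir(), "example.rda")
x <- data.frame(id = 1:10000, value = rep(c(0.5, 1.5), 5000))
save(x, file = rda, compress = FALSE)

# Report size and compression before resaving.
before <- tools::checkRdaFiles(rda)
print(before[, c("size", "compress")])

# Recompress the file in place with gzip, then report again.
tools::resaveRdaFiles(rda, compress = "gzip")
after <- tools::checkRdaFiles(rda)
print(after[, c("size", "compress")])
```

Running the same check over the files in a package's data/ directory before building is one way to see in advance whether resaving would change anything.
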
    >>>> So it does make some sense to have different defaults in
    >>>> build as opposed to INSTALL from my point of view (although I
    >>>> could live with different, though).
    >>> 
    >>> If you deliberately ignore the fact that 'R CMD INSTALL' is
    >>> also used by developers to install from the *package source
    >>> tree* (as opposed to end users who use it to install from a
    >>> *source tarball*,
    >> 
    >> .. for a good reason, IMHO no serious developer would do that
    >> for obvious reasons -

    > This sounds like saying that no serious developer working on a
    > big project involving a lot of files to compile should use
    > 'make'.  I mean, serious developers like you *always* do 'make
    > clean' before they do 'make' on the R tree when they need to
    > test a change, even a small one? And this only takes a "fraction
    > of second" for them? Hey, I'd love to be able to do that too!
    > ;-)

    > H.

    >> you'd be working on a dirty copy creating many unnecessary
    >> problems and polluting your sources. The first time you'll
    >> spend an hour chasing a non-existent problem due to stale
    >> binary objects in your tree you'll learn that lesson ;). The
    >> fraction of a second spent in R CMD build is well worth the
    >> hours saved. IMHO the only valid reason to run INSTALL on a
    >> (freshly unpacked tar ball) directory is to capture config.log.
    >> 
    >> Cheers, Simon
    >> 
    >> 
    >> 
    >>> even though they don't use it directly), then you have a
    >>> point. So maybe I should have been more explicit about the
    >>> problem that it can be for the *developer* to have 'R CMD
    >>> build' and 'R CMD INSTALL' behave differently by default.
    >>> 
    >>> Of course I'm not suggesting that 'R CMD INSTALL' should
    >>> behave differently (by default) depending on whether it's used
    >>> on a source tarball (mode 1) or a package source tree (mode
    >>> 2).
    >>> 
    >>> I'm suggesting that, by default, the 3 commands (R CMD build +
    >>> R CMD INSTALL in mode 1 and 2) behave consistently.
    >>> 
    >>> With the latest changes, and by default, 'R CMD INSTALL' is
    >>> still doing the right thing, but not 'R CMD build' anymore.
    >>> 
    >>> I perfectly understand the intention behind those new flags,
    >>> which is to try to "optimize" the resulting source tarball but
    >>> what would you think if 'gcc' had some optimization flags that
    >>> can generate broken executables (under some circumstances) and
    >>> if these flags were enabled by default?
    >>> 
    >>> Note that I would have no problem with 'R CMD build' trying to
    >>> resave the data by default if the current implementation of
    >>> that feature was working properly, but unfortunately it's
    >>> broken (see my previous email for the details).
    >>> 
    >>> Thanks, H.
    >>> 
    >>>> 
    >>>> If you need further arguments for the discussion: I also tend to use
    >>>> --no-vignettes nowadays if my code does not change considerably. ;-)
    >>>> 
    >>>> Best wishes,
    >>>> Uwe
    >>>> 
    >>>> 
    >>>> 

>>>>> I hope the list of options/flags that we need to use to "fix" 'R CMD
>>>>> build' (and make it consistent with R CMD INSTALL) is not going to
>>>>> grow too much ;-)
;-)

I'm with Hervé here.
I almost always use R CMD INSTALL on a directory rather than a tarball, though most of the time the directory is freshly untarred.
Other times, however, one of the reasons is exactly that I can keep things around (*.o, ...) that are rebuilt only very rarely.

Martin



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 13 Apr 2011 - 12:49:17 GMT
