Re: [Rd] R CMD check: non source files in src on (2.3.0 RC (2006-04-19 r37860))

From: Robert Gentleman <rgentlem_at_fhcrc.org>
Date: Fri 21 Apr 2006 - 14:44:16 GMT

Kurt Hornik wrote:

>>>>>>Simon Urbanek writes:

>
>
>>On Apr 20, 2006, at 1:23 PM, Henrik Bengtsson (max 7Mb) wrote:
>>
>>>Is it a general consensus on R-devel that *.tar.gz distributions  
>>>should only be treated as a distribution for *building* packages  
>>>and not for developing them?

>
>
> [Actually, distributing so that they can be installed and used.]
>
>
>>I don't know whether this is a general consensus, but it is definitely
>>an important distinction. Some authors put their own Makefiles in src
>>although they are not needed and are in fact harmful, preventing the
>>package from building on other systems - only because they are too lazy to
>>use the R build mechanism for development and don't make the above
>>distinction.

>
>
> Right :-)
>
> Henrik, as I think I mentioned the last time you asked about this: of
> course you can basically do everything you want. But it comes at a
> price. For external sources, you need to write a Makefile of your own,
> so as to make it clear that you provide a mechanism which is different
> from the standard one. And, as Simon said, the gain in flexibility
> comes at a price.
>
> Personally and as one of the CRAN maintainers, I'd be very unhappy if
> package maintainers would start flooding their source .tar.gz packages
> with full development environment material. (I am also rather unhappy
> about shipping large data sets which are only used for instructional
> purposes [rather than providing the data set "on its own"].) It is
> simply not true that bandwidth does not matter.

   I can see the problem with large packages, but the current system does nothing about that AFAIC. And as Simon indicated, his biggest problem is the one set of files that we are allowed - so the argument is that the current approach is neither necessary nor sufficient, and it imposes a structure on people that seems to be unnecessarily restrictive. I don't see how excluding README (or anything else that a package maintainer has put there) makes life better, but maybe I am missing something here. These are precisely the sorts of files that have helped me to figure out what was intended when something didn't work. So this approach is regressive, IMHO.

  If the size is not large, who cares what is in a package? Things related to source should be in src. I see that a similar approach is being taken with the R directory (and probably other directories). This is, in my opinion, unfortunate: imposing restrictions that don't solve the stated problem in some general way is not useful.

  For BioC, we manually check the size etc. and ask people to reduce and remove. You could easily do the same at CRAN (and even automate it). BioC packages can be enormous relative to those on CRAN and I don't think we have ever had a serious complaint about it. But then the data sets tend to be large, so maybe people are just more forgiving.
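
  A minimal sketch of what automating such a size check might look like, written in Python for illustration only. The thresholds and the function name size_report are hypothetical; nothing here reflects actual CRAN or BioC tooling or policy. It reads a source .tar.gz and reports the total uncompressed size plus any individual files above a per-file limit.

```python
import tarfile

# Hypothetical thresholds, chosen only for illustration.
MAX_TOTAL_BYTES = 5 * 1024 * 1024
MAX_FILE_BYTES = 1 * 1024 * 1024


def size_report(fileobj, max_total=MAX_TOTAL_BYTES, max_file=MAX_FILE_BYTES):
    """Summarise the uncompressed contents of a .tar.gz source package."""
    total = 0
    large = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            # Only regular files count towards the size.
            if not member.isfile():
                continue
            total += member.size
            if member.size > max_file:
                large.append((member.name, member.size))
    return {
        "total_bytes": total,
        "too_big": total > max_total,
        "large_files": large,
    }
```

  One would call it on an opened tarball, e.g. size_report(open("mypkg_1.0.tar.gz", "rb")), where the file name is a placeholder. A check like this flags size directly rather than restricting which kinds of files may appear in src.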

  As for the difference between source packages and built packages, yes, it would be nice at some time to enter into a discussion on that topic. There are lots of things that can be done at build time (that are not currently being done) that would speed up package installation etc. But they come at the price that Henrik has mentioned: the built package is no longer suitable for development. Hence we may usefully consider another format (something between source and binary, .Rgz?).

  best wishes
    Robert

>
> If there is need, we could start having developer-package repositories.
> However, I'd prefer a different approach. We're currently in the
> process of updating the CRAN server infrastructure, and should be able
> to start deploying an R-forge project hosting service "eventually"
> (hopefully, we can set things up during the summer). This should
> provide us with an ideal infrastructure for sharing developer resources,
> in particular as we could add QC testing et al to the standard community
> services.
>
> Best
> -k
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem@fhcrc.org

Received on Sat Apr 22 00:57:36 2006
