Re: [Rd] Sweave driver extension

From: Kevin R. Coombes <kevin.r.coombes_at_gmail.com>
Date: Mon, 30 Jan 2012 12:07:59 -0600

I prefer the code chunks myself.

Function calls have overhead. In a bioinformatics world with large datasets and an R default that uses call-by-value rather than call-by-reference, the function calls may have a _lot_ of overhead. Writing the functions to make sure they use call-by-reference for the large objects instead has a different kind of overhead in the stress it puts on the writers and maintainers of code.
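
To make the trade-off concrete, here is a minimal sketch in base R. The names scale_copy, scale_in_place, and store are invented for this illustration. R only copies an argument when the function modifies it, so the real cost depends on what the function does. An environment is one common way to get reference-like behaviour, at the price of code that is harder to reason about.

```r
# Ordinary R semantics: the function works on its own copy of 'x',
# so the caller must capture the return value to see the change.
scale_copy <- function(x, factor) {
  x[] <- x * factor
  x
}

# Reference-like semantics via an environment: the function rebinds
# 'x' inside 'store', and the caller sees the change without reassigning.
scale_in_place <- function(store, factor) {
  store$x <- store$x * factor
  invisible(NULL)
}

big <- as.numeric(1:10)
scaled <- scale_copy(big, 2)   # 'big' itself is unchanged

store <- new.env()
store$x <- as.numeric(1:10)
scale_in_place(store, 2)       # 'store$x' is now doubled
```

The second style avoids handing large objects back and forth, but every function that touches store now has side effects that the reader has to keep track of.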

But then, I'm old enough to have looked at some of Knuth's source code for TeX and read his book on Literate Programming, where the ideas of "weave" and "tangle" were created for exactly the kind of application that Terry asked about. Knuth's fundamental idea here is that the documentation (mainly the stuff processed through "weave") is created for humans, while the executable code (in Knuth's view, the stuff created by "tangle") is intended for computers. If you want people to understand the code, then you often want to use a top-down approach that outlines the structure -- code chunks with forward references work perfectly for this purpose.

One of the difficulties in mapping Knuth's idea over to R and Sweave is that the operations of weave and tangle have gotten, well, tangled. Sweave does not just prepare the documentation; it also executes the code in order to put the results of the computation into the documentation. In order to get the forward references to work with Sweave, you would have to make two passes through the file: one to find each named chunk and build a cross-reference table, and one to actually execute the code in the correct order. That would presumably also require a major rewrite of Sweave.
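
A rough sketch of the two-pass idea in plain R, working on a character vector of lines rather than a real .Rnw file. The functions collect_chunks and expand_chunk are invented for this illustration and are not part of Sweave; chunk options and unnamed chunks are ignored, and nothing here touches the driver interface.

```r
# Pass 1: scan the lines and build a table of named chunks.
collect_chunks <- function(lines) {
  chunks <- list()
  current <- NULL
  for (line in lines) {
    if (grepl("^<<.*>>=[[:space:]]*$", line)) {
      # Chunk header: keep the label, drop any options after a comma.
      current <- sub("^<<([^,>]*).*>>=[[:space:]]*$", "\\1", line)
      if (nzchar(current)) {
        chunks[[current]] <- character(0)
      } else {
        current <- NULL  # unnamed chunks are skipped in this sketch
      }
    } else if (grepl("^@[[:space:]]*$", line)) {
      current <- NULL    # end of chunk
    } else if (!is.null(current)) {
      chunks[[current]] <- c(chunks[[current]], line)
    }
  }
  chunks
}

# Pass 2: expand references recursively, in the order the code needs,
# stopping on undefined or circular references.
expand_chunk <- function(name, chunks, seen = character(0)) {
  if (name %in% seen) stop("circular chunk reference: ", name)
  if (is.null(chunks[[name]])) stop("undefined chunk: ", name)
  out <- character(0)
  for (line in chunks[[name]]) {
    if (grepl("^[[:space:]]*<<[^>]*>>[[:space:]]*$", line)) {
      ref <- sub("^[[:space:]]*<<([^>]*)>>[[:space:]]*$", "\\1", line)
      out <- c(out, expand_chunk(ref, chunks, c(seen, name)))
    } else {
      out <- c(out, line)
    }
  }
  out
}

# A toy document in which 'main' refers to chunks defined further down.
doc <- c(
  "<<main>>=",
  "f <- function(x) {",
  "  <<check>>",
  "  <<compute>>",
  "}",
  "@",
  "<<check>>=",
  "stopifnot(is.numeric(x))",
  "@",
  "<<compute>>=",
  "x^2",
  "@"
)
chunks <- collect_chunks(doc)
code <- expand_chunk("main", chunks)
eval(parse(text = code))  # defines f() from the expanded code
```

Because the table is complete before any expansion starts, it no longer matters whether a chunk is defined above or below the place where it is used.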

The solution I use is to cheat: hide the chunks initially and reveal them later to get the output that you want. This comes down to using eval, echo, keep.source, and expand in the right combinations. Something like:

%%%%%%%%
% set up a prologue that contains the code chunks. Do not evaluate or display them.
<<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
# do something sensible. If multiple steps, define them above here
# using the same idea.
@
% also define the other code chunks here

\section{Start the First Section}

The \texttt{coxme} function is defined as follows:
<<coxme,keep.source=TRUE,expand=FALSE>>=

coxme <- function(formula, data, subset, blah blah  ){

<<coxme-check-arguments>>
<<coxme-build>>
<<coxme-compute>>
<<coxme-finish>>
}
@

Argument checking is important:
<<name-does-not-matter-since-not-reused,eval=FALSE,expand=TRUE>>=
<<coxme-check-arguments>>

@
% Describe the other chunks here

%%%%%%%%

     Kevin

On 1/24/2012 10:24 PM, Yihui Xie wrote:

> Maybe this is just my personal taste: I do not like pseudo R code in the
> form <<coxme-build>> inside a chunk, and I'm curious about why you do
> not use real R functions to do the job.
>
> coxme <- function(formula, data, subset, blah blah  ){
>    coxme_check_arguments(...)
>    coxme_build(...)
>    coxme_compute(...)
>    coxme_finish(...)
> }
>
> You can define these coxme_xxx functions later in the parent
> environment. It is also easy for one function to call another, so the
> recursion is natural. Compared to text-processing tricks, I prefer
> well-defined functions.
>
> Your idea of using a named list to store R code is what I used in the
> knitr package (http://yihui.github.com/knitr/demo/reference/), e.g.
>
> % empty here
> <<chunk1, echo=TRUE>>=
> @
>
> % real code is defined here
> <<chunk1, echo=FALSE>>=
> rnorm(10)
> @
>
> The second chunk appears later, but when you weave the document, the
> code rnorm(10) will also go to the first chunk since the label
> 'chunk1' will index the code from the second chunk.
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui_at_gmail.com>
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
>
>
>
> On Tue, Jan 24, 2012 at 1:50 PM, Terry Therneau <therneau_at_mayo.edu> wrote:
>> Almost all of the coxme package and an increasing amount of the survival
>> package are now written in noweb, i.e., .Rnw files.  It would be nice to
>> process these using the Sweave function + a special driver, which I can
>> do using a modified version of Sweave.  The primary change is to allow
>> the following type of construction
>>
>> <<coxme>>=
>> coxme <- function(formula, data, subset, blah blah  ){
>>    <<coxme-check-arguments>>
>>    <<coxme-build>>
>>    <<coxme-compute>>
>>    <<coxme-finish>>
>> }
>> @
>>
>> where the parts referred to come later, and will themselves be made up
>> of other parts.  Since the point of this file is to document source
>> code, the order in which chunks are defined is driven by "create a
>> textbook" thoughts and won't match the final code order for R.
>> The standard noweb driver only allows one level of recursion, and no
>> references to things defined further down in the file.
>>
>>   The primary change to the function simply breaks the main loop into
>> two parts: first read through all the lines and create a list of
>> code chunks (some with names), then go through the list of chunks and
>> call driver routines.  There are a couple of other minor details, e.g. a
>> precheck for infinite recursions, but no change to what is passed to the
>> driver routines, nor to anything but the Sweave function itself.
>>
>> Primary question: who on the core team should I be holding this
>> conversation with?
>> Secondary: Testing level?  I have a few vignettes but not many.
>>     I'll need a "noweb" package anyway to contain the drivers -- should
>> we just duplicate the modified Sweave under another name?
>>     Call the package "noweb", "Rnoweb", ...?
>>
>> And before someone asks: Roxygen is a completely different animal and
>> doesn't address what I need.  I have latex equations just above the code
>> that implements them, an annotated graph of the call tree next to the
>> section parsing a formula, etc. This is stuff that doesn't fit in
>> comment lines. The text/code ratio is >1.  On the other hand I've
>> thought very little about integration of manual pages and description
>> files with the code, issues which Roxygen addresses.
>>
>> Terry Therneau
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Mon 30 Jan 2012 - 18:12:35 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 31 Jan 2012 - 12:30:12 GMT.