Re: [Rd] Sweave driver extension

From: Yihui Xie <xie_at_yihui.name>
Date: Mon, 30 Jan 2012 20:41:50 -0600

OK, I did not realize the overhead problem was so severe in your situation, so I re-implemented chunk references in the knitr package in another way. In Sweave we use

<<a>>=
# code in chunk a
@

<<b>>=
# use code in a
<<a>>
@

And in knitr, we can use real R code:

<<a>>=
# code in chunk a
@

<<b>>=
# use code in a
run_chunk('a')
@

This also allows arbitrary levels of recursion, e.g. I add another chunk called 'c':

<<c>>=
run_chunk('b')
@

Because b uses a, calling b from c will in turn run a as well.

The function run_chunk() does not add function-call overhead, because it simply extracts the code from the other chunk and evaluates it in place. It is not a function call in the usual sense: the referenced code runs as if it had been written in the current chunk. This feature is still in the development version (well, I did it this afternoon): https://github.com/yihui/knitr.
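To make the "extract and evaluate in place" idea concrete, here is a minimal sketch in base R. It is not knitr's actual implementation; the names chunks and run_chunk_sketch() are hypothetical and exist only for this illustration. It assumes chunk code is kept as text in a list indexed by label, and it evaluates that text in the environment of the caller, so nested references (c uses b, b uses a) work naturally.

```r
# Hypothetical store of chunk code, indexed by label (not knitr's internals).
chunks <- list(
  a = "x <- 1:10",
  b = "run_chunk_sketch('a'); total <- sum(x)",
  c = "run_chunk_sketch('b'); avg <- total / length(x)"
)

# Look up a chunk by label and evaluate its code in the given environment
# (by default the caller's), so assignments land where the chunk is used
# instead of in a fresh function scope.
run_chunk_sketch <- function(label, envir = parent.frame()) {
  force(envir)
  code <- chunks[[label]]
  if (is.null(code)) stop("unknown chunk label: ", label)
  invisible(eval(parse(text = code), envir = envir))
}

# Running 'c' pulls in 'b', which pulls in 'a'; all results end up in env.
env <- new.env()
run_chunk_sketch("c", envir = env)
env$avg  # 5.5
```

The large objects are never passed as arguments, so nothing is copied on the way in or out; the code just runs against whatever is already in the environment.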


Talking about Knuth's original idea, I do not know as much as you, but under knitr's design, you can arrange code freely, since the code is stored in a named list after the input document is parsed. You can define code before using it, or use it before defining it (later); it is indexed by the chunk label. Top-down or bottom-up, in whatever order you want. And you are right; it requires a major rewrite, and that is exactly what I tried to do. I appreciate your feedback because I know you have very rich experience in reproducible research.
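As a rough illustration of why the order of definition does not matter once the chunks are in a named list, here is a small two-pass sketch in base R. Again, this is only a toy under my own assumptions, not the knitr parser: parse_chunks() and expand_chunk() are hypothetical names, chunk headers are assumed to carry a label and no options, and a reference is assumed to sit on a line of its own.

```r
# A toy noweb-style document; 'main' refers to 'setup' before it is defined.
doc <- c(
  "Some prose.",
  "<<main>>=",
  "<<setup>>",
  "y <- x * 2",
  "@",
  "More prose.",
  "<<setup>>=",
  "x <- 21",
  "@"
)

# Pass 1: collect every chunk body into a named list keyed by label.
parse_chunks <- function(lines) {
  chunks <- list()
  label <- NULL
  for (ln in lines) {
    if (grepl("^<<.*>>=$", ln)) {
      label <- sub("^<<(.*)>>=$", "\\1", ln)  # header opens a chunk
      chunks[[label]] <- character(0)
    } else if (identical(ln, "@")) {
      label <- NULL                            # '@' closes the chunk
    } else if (!is.null(label)) {
      chunks[[label]] <- c(chunks[[label]], ln)
    }
  }
  chunks
}

# Pass 2: replace each <<label>> line by the (recursively expanded) chunk.
expand_chunk <- function(label, chunks) {
  body <- chunks[[label]]
  if (is.null(body)) stop("unknown chunk label: ", label)
  out <- character(0)
  for (ln in body) {
    if (grepl("^<<.*>>$", ln)) {
      out <- c(out, expand_chunk(sub("^<<(.*)>>$", "\\1", ln), chunks))
    } else {
      out <- c(out, ln)
    }
  }
  out
}

chunks <- parse_chunks(doc)
code <- expand_chunk("main", chunks)
env <- new.env()
eval(parse(text = code), envir = env)
env$y  # 42
```

Because the lookup happens only after the whole document has been read, a forward reference costs nothing extra; the sketch does not guard against a chunk that refers to itself.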

Regards,
Yihui

--
Yihui Xie <xieyihui_at_gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
<kevin.r.coombes_at_gmail.com> wrote:

> I prefer the code chunks myself.
>
> Function calls have overhead. In a bioinformatics world with large datasets
> and an R default that uses call-by-value rather than call-by-reference, the
> function calls may have a _lot_ of overhead.  Writing the functions to make
> sure they use call-by-reference for the large objects instead has a
> different kind of overhead in the stress it puts on the writers and
> maintainers of code.
>
> But then, I'm old enough to have looked at some of Knuth's source code for
> TeX and read his book on Literate Programming, where the ideas of "weave"
> and "tangle" were created for exactly the kind of application that Terry
> asked about.  Knuth's fundamental idea here is that the documentation
> (mainly the stuff processed through "weave") is created for humans, while
> the executable code (in Knuth's view, the stuff created by "tangle") is
> intended for computers.  If you want people to understand the code, then you
> often want to use a top-down approach that outlines the structure -- code
> chunks with forward references work perfectly for this purpose.
>
> One of the difficulties in mapping Knuth's idea over to R and Sweave is that
> the operations of weave and tangle have gotten, well, tangled.  Sweave does
> not just prepare the documentation; it also executes the code in order to
> put the results of the computation into the documentation.  In order to get
> the forward references to work with Sweave, you would have to make two
> passes through the file: one to make sure you know where each named chunk is
> and build a cross-reference table, and one to actually execute the code in
> the correct order.  That would presumably also require a major rewrite of
> Sweave.
>
> The solution I use is to cheat and hide the chunks initially and reveal them
> later to get the output that you want. This comes down to combining eval, echo,
> keep.source, and expand in the right combinations. Something like:
>
> %%%%%%%%
> % set up a prologue that contains the code chunks. Do not evaluate or
> % display them.
> <<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
> # do something sensible. If multiple steps, define them above here
> # using the same idea.
> @
> % also define the other code chunks here
>
> \section{Start the First Section}
>
> The \texttt{coxme} function is defined as follows:
> <<coxme,keep.source=TRUE,expand=FALSE>>=
>
> coxme <- function(formula, data, subset, blah blah  ){
> <<coxme-check-arguments>>
> <<coxme-build>>
> <<coxme-compute>>
> <<coxme-finish>>
> }
> @
>
> Argument checking is important:
> <<name-does-not-matter-since-not-reused,eval=FALSE,expand=TRUE>>=
> <<coxme-check-arguments>>
> @
> % Describe the other chunks here
>
> %%%%%%%%
>
>
>    Kevin
>
______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 31 Jan 2012 - 02:45:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.