Re: [Rd] Source references from the parser

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Sun 26 Nov 2006 - 05:26:48 GMT

On 11/25/2006 11:00 PM, Deepayan Sarkar wrote:

> On 11/25/06, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
>> On 11/25/2006 3:12 PM, Deepayan Sarkar wrote:
>>> On 11/25/06, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:
>>>> I have just committed some changes to R-devel (which will become R 2.5.0
>>>> next spring) to add source references to parsed R code. Here's a
>>>> description of the scheme:
>>>>
>>>> The design is done through 2 old-style classes.
>>>>
>>>> "srcfile" corresponds to a source file: it contains a filename, the
>>>> working directory in which that filename is to be interpreted, the last
>>>> modified timestamp of the file at the time the object is created, plus
>>>> some internal components. It is implemented as an environment so that
>>>> there can be multiple references to it.
>>>>
>>>> "srcref" is a reference to a particular range of characters (as the
>>>> parser sees them; I think that really means bytes, but I haven't tested
>>>> with MBCSs) in a source file. It is implemented as a vector of 4
>>>> integers (first line, first column, last line, last column), with the
>>>> srcfile as an attribute.
>>>>
>>>> The parser attaches a srcref attribute to each complete statement as it
>>>> gets parsed, if options("keep.source") is TRUE. (I've left the old source
>>>> attribute in place as well for functions; I think it won't be needed in
>>>> the long run, but it is needed now.)
>>>>
>>>> When printing an object with a srcref attribute, print.default tries to
>>>> read the srcfile to obtain the text. If it fails, it falls back to an
>>>> ugly display of the reference. Using a new argument useSource=FALSE in
>>>> printing will stop this attempt: when printing language, it will
>>>> deparse; when printing a srcref, it will print the ugly fallback.
>>>>
>>>> source(echo=TRUE) will echo all the lines of the file including comments
>>>> and formatting. demo() does the same, and I would guess Sweave will do
>>>> this too, but I haven't tested that yet. I think this will improve
>>>> Sweave output, but will need changes to the input file: people may have
>>>> comments there that they don't want shown. Some sort of
>>>> "useSource=FALSE" option will need to be added.
>>>>
>>>> The browser used with debug() etc. will display statements as they were
>>>> formatted in the original source. It will not display leading or
>>>> following comments, but will display embedded comments.
>>>>
>>>> Parsing errors display the name of the source file that was parsed, and
>>>> display verbose error messages describing what's wrong. This display
>>>> could still be improved, e.g. by displaying the whole source line with a
>>>> pointer to the error, instead of just the text up to the location of the
>>>> error.
>>>>
>>>> I plan to add some sort of equivalent of C "#line" directives, so that
>>>> preprocessed source files (e.g. the concatenated source that is
>>>> installed) can include references back to the original source files, for
>>>> syntax error reporting, and/or debugging. This will require
>>>> modification of the INSTALL process, but I haven't started on this yet.
>>>>
>>>> It would probably be a good idea to have some utility functions to play
>>>> with the srcref records for debugging and other purposes, but I haven't
>>>> written those yet. For example, the current source record on a function
>>>> could be replaced with a srcref, but only by expanding the srcref to
>>>> include some of the surrounding comments.
>>>>
>>>> Comments and problem reports are welcome.
>>> I haven't tested this, but the idea seems useful. Will this have any
>>> effect on code parsed using parse(text = "...")? Can it be extended to
>>> have some such effect? I ask because this is relevant in the context
>>> of Sweave, where I have always wanted the ability to retain the
>>> original formatting. I'm currently testing a patch that allows me to
>>> do this specifically for Sweave, but a more general solution is
>>> obviously preferable.
>> I've just added the capability to Sweave.  I haven't committed yet,
>> because I think it's important that authors can choose whether or not to
>> turn this on.  Could you let me know your typical workflow with Sweave,
>> whether you'd like this to default to on or off, and where you'd expect
>> to change the default?
> 
> I would like it as an option to the RweaveLatex driver (and perhaps
> others). In terms of changing the API, this is as simple as adding an
> argument to the 'RweaveLatexSetup' function.
> 
> In the case of my patch, the default is off, and is turned on by
> 
> <<...,src=TRUE>>
> ...
> @
> 
> To make this the global default, one can do
> 
> \SweaveOpts{src=TRUE}
> 
> etc. (the name 'src' is not necessarily the best, some variant of
> 'keep.source' might be more intuitive.)

This is now committed.

I used keep.source, exactly the same as the option() that controls this behaviour in other places.
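
For anyone who wants to see the effect of the option outside Sweave, here is a minimal sketch. It assumes a current R release, where parse() has a keep.source argument that defaults to getOption("keep.source") and where printing a function accepts useSource. The function f and the variable names are made up for illustration.

```r
# Turn the option on explicitly (non-interactive R defaults to FALSE).
old <- options(keep.source = TRUE)

# Parse and evaluate a function whose source contains a comment.
f <- eval(parse(text = "function(x) {\n  # add one\n  x + 1\n}"))

# Printing with the source reference shows the text as it was typed.
with_src <- capture.output(print(f))

# useSource = FALSE forces a deparse, so the comment is dropped.
no_src <- capture.output(print(f, useSource = FALSE))

# Restore the previous setting.
options(old)
```

With keep.source on, the first print should show the comment as written; with useSource=FALSE the function is deparsed and the comment is gone.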

I decided to set the default to TRUE. This means vignettes will all look different in R-devel. The simplest way to get the previous appearance is to put in

\SweaveOpts{keep.source=FALSE}

but in most cases I think people will want the new behaviour. It's only bad if the code was badly formatted or contained comments you don't want to show up in the final document. I looked through the grid package vignettes, and only saw about half a dozen places where I thought the formatting needed tweaking.
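
To look at the references themselves, something like the following rough sketch should work. It assumes a current R release, where parse(text=) accepts keep.source and where a srcref holds more than the 4 integers described above (the first four are still first line, first byte, last line, last byte). The sample code string is made up.

```r
# Two statements; the second spans two lines with its own indentation.
code <- "x <- 1  # set x\ny <- c(1,\n       2)"
exprs <- parse(text = code, keep.source = TRUE)

# The parser attaches one srcref per complete statement.
refs <- attr(exprs, "srcref")
first <- refs[[1]]
second <- refs[[2]]

# Position of the second statement: first line, first byte, last line, last byte.
pos <- as.integer(second)[1:4]

# The original text of each statement, formatting preserved.
first_text <- as.character(first)
second_text <- as.character(second)

# The srcfile travels with the srcref as an attribute; it is an environment.
sf <- attr(second, "srcfile")
```

Note that the trailing comment on the first line is not part of the first statement's srcref, which matches the browser behaviour described in the original announcement.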

Duncan Murdoch



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun Nov 26 16:30:05 2006
