[Rd] Multi-line string constants: proposed patch

From: Kevin Wright <kwright_at_eskimo.com>
Date: Sat 11 Sep 2004 - 06:30:21 EST

R 1.9.1 requires multi-line strings to contain a backslash at the end of each line (except the last line). As noted by Mark Bravington (http://tolstoy.newcastle.edu.au/R/help/02b/5199.html) this requirement appears to be undocumented.

In S-Plus 6.2, multi-line strings do not need a backslash for continuation.

I recently (http://tolstoy.newcastle.edu.au/R/devel/04b/0256.html) requested compatibility with S-Plus and was told that a contributed patch would be considered. Here is the proposed patch.

In the files src/main/gram.y and src/main/gram.c, strings are parsed by the StringValue function. Looking at the function, it is clear that a literal newline character (not the two-character escape sequence '\n') generates an error:

static int StringValue(int c)
{
    ...
	if (c == '\n') {
	    xxungetc(c);
	    return ERROR;
	}
    ...
}

I tracked this code down, and Mark Bravington confirmed (by building r-devel on Windows) that commenting out the four lines that start with if (c == '\n') allows R to handle multi-line strings either with or without backslashes for continuation. A 'diff' appears at the end of this message.

Note that if EOF is encountered while R thinks it's reading a string, it will silently add the string terminator rather than causing an error. I can't really see this as undesirable but I suppose we should mention it. (Currently the same thing happens if the last character of the last line is a backslash, so it's "consistent" anyway.)
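To make the two behaviours above concrete, here is a minimal stand-alone sketch of a string scanner. It is not the code from gram.c: the Input type and the names next_char, scan_string, scan_literal, SCAN_OK and SCAN_ERROR are hypothetical, and escape handling is simplified so that a backslash just copies the next character. The reject_newline flag stands in for the four lines the patch comments out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

#define SCAN_OK    0
#define SCAN_ERROR 1

/* Hypothetical input cursor; a stand-in for R's xxgetc()/xxungetc(). */
typedef struct {
    const char *src;
    size_t pos;
} Input;

int next_char(Input *in)
{
    unsigned char ch = (unsigned char) in->src[in->pos];
    if (ch == '\0')
        return EOF;          /* end of the text plays the role of EOF */
    in->pos++;
    return ch;
}

/*
 * Scan a string literal whose opening quote was already consumed.
 * reject_newline != 0 mimics the unpatched check; 0 mimics the patch.
 * Assumption: a backslash copies the following character verbatim,
 * which is a simplification and not R's real escape processing.
 */
int scan_string(Input *in, int quote, int reject_newline,
                char *out, size_t outsize)
{
    size_t n = 0;
    int c;

    assert(out != NULL && outsize > 0);

    while ((c = next_char(in)) != EOF && c != quote) {
        if (reject_newline && c == '\n') {
            in->pos--;       /* push the newline back, like xxungetc(c) */
            out[n] = '\0';
            return SCAN_ERROR;
        }
        if (c == '\\') {
            c = next_char(in);
            if (c == EOF)
                break;       /* trailing backslash at EOF: stop quietly */
        }
        if (n + 1 < outsize)
            out[n++] = (char) c;
    }
    out[n] = '\0';           /* reached on the closing quote and on EOF alike */
    return SCAN_OK;
}

/* Convenience wrapper: scan text that follows an opening double quote. */
int scan_literal(const char *text, int reject_newline,
                 char *out, size_t outsize)
{
    Input in = { text, 0 };
    return scan_string(&in, '"', reject_newline, out, outsize);
}
```

With reject_newline set, scan_literal("function\n2 text\"", 1, buf, sizeof buf) returns SCAN_ERROR; with it cleared, the same input returns SCAN_OK and buf keeps the embedded newline. An input with no closing quote also returns SCAN_OK in this sketch, because the loop simply stops at EOF, which is the silent termination mentioned above.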

I've searched through the S-Plus and R documentation. Here are a few relevant texts:

S-Plus 6.2 Programmer's Guide, Page 11
"character strings [are] enclosed by double quotes or apostrophes"

S-Plus 6.2 Programmer's Guide, Page 947 (abbreviated)
"Strings consist of zero or more characters typed between two apostrophes or double quotes. Table 23.2 lists some special characters for use in string literals. These special characters are for string control, obtaining characters that are not represented on the keyboard, or delimiting character strings."

  \t tab
  \\ backslash
  \n newline

R Language Definition
String constants are delimited by a pair of single (') or double (") quotes and can contain all other printable characters. Quotes and other special characters within strings are specified using escape sequences.

Here are some simple examples:

f1 <- function(){
  # This function generates a warning in S-Plus that "the initial
  # backslash is ignored", but then is read in as intended (two lines).
  # In R 1.9.0 this becomes one line: "function 1 text"
  l1 <- "function \
1 text"
}

f2 <- function(){
  # This fails in R 1.9.0. Works fine in S-Plus 6.2
  l2 <- "function
2 text"
}

f3 <- function(){
  # Identical in R and S-Plus
  l3 <- "function \n3 text"
}

Mark Bravington supplied the diff and writes:

I've now gotten R-devel to build, and your patch works fine. I just commented out the code rather than deleting it, though the R team might want to do that differently. I renamed 'gram.c' to 'ogram.c' and ran 'diff'-- here's the output (yes it is trivial):

C:\R\R-devel\src\main>diff ogram.c gram.c
3122,3125c3122,3125
<       if (c == '\n') {
<           xxungetc(c);
<           return ERROR;
<       }
---
> //    if (c == '\n') {
> //        xxungetc(c);
> //        return ERROR;
> //    }

Presumably the same change should be made in gram.y.

Thanks for considering this patch.

Kevin Wright ( & Mark Bravington )

R-devel@stat.math.ethz.ch mailing list
Received on Sat Sep 11 06:34:49 2004
