Re: [Rd] Argument recycling in substring()

From: William Dunlap <>
Date: Fri, 04 Jun 2010 08:56:58 -0700

> -----Original Message-----
> From:
> [] On Behalf Of Martin Maechler
> Sent: Friday, June 04, 2010 2:46 AM
> To: Hervé Pagès
> Cc:
> Subject: Re: [Rd] Argument recycling in substring()
> >>>>> "HP" == Hervé Pagès <>
> >>>>> on Thu, 03 Jun 2010 17:53:33 -0700 writes:
> HP> Hi,
> HP> According to its man page substring() "expands (its) arguments
> HP> cyclically to the length of the longest _provided_ none are of
> HP> zero length".
> HP> So, as expected, I get an error here:
> >> substring("abcd", first=2L, last=integer(0))
> HP> Error in substring("abcd", first = 2L, last = integer(0)) :
> HP> invalid substring argument(s)
> HP> But I don't get one here:
> >> substring(character(0), first=1:2, last=3L)
> HP> character(0)
> HP> which is unexpected, according to the documentation.
> My gut feeling would say that the documentation should be
> updated in this case, rather than the implementation.
> RFC! other opinions?

I think it would be nice if multiargument vectorized functions in core R used the rules that are used by the arithmetic functions (`+`, etc.):

  1. if any argument length is 0, then the output length is 0
  2. otherwise the output is the length of the longest input

The arithmetic functions also warn if the output length is not a multiple of some input length. (They actually warn 'longer ... length is not a multiple of shorter ...' and I'm extrapolating that to more than two arguments.) Most other multi-vectorized functions (e.g., log, pnorm) don't currently warn.
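
For reference, a minimal sketch of how `+` behaves under those rules. The variable names are only for illustration, and the tryCatch() wrapper is there to detect the warning rather than print it.

```r
# Rule 1: a zero-length operand gives a zero-length result.
zero_len <- integer(0) + 1:3
length(zero_len)   # 0

# Rule 2: otherwise the result is as long as the longest operand.
recycled <- 1:6 + 1:2
recycled           # 2 4 4 6 6 8

# 3 is not a multiple of 2, so `+` still recycles but also warns.
warned <- tryCatch({ 1:3 + 1:2; FALSE }, warning = function(w) TRUE)
warned             # TRUE
```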

If they all followed the same rules then it would be easier to write code involving unfamiliar functions. The rule could be stated in one help file, and the help file for a given function could say that arguments x, y, and z, but not a or b, are 'vectorized', with a link to the one help file describing vectorization. Even better, the C and C++ APIs could be expanded to do the standard multivectorization so that not every function would do it in its own way.
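
To make the idea of a single shared rule concrete, here is a rough R-level sketch. The helper name recycle_args() and its warning text are hypothetical; nothing like it is part of R, and the proposal above concerns the C and C++ APIs rather than an R function.

```r
# Hypothetical helper, not part of R: apply the arithmetic-style rule
# to any number of arguments and return them recycled to a common length.
recycle_args <- function(...) {
  args <- list(...)
  if (length(args) == 0L) return(args)
  lens <- lengths(args)
  # Zero rule first, otherwise the length of the longest argument.
  n <- if (any(lens == 0L)) 0L else max(lens)
  if (n > 0L && any(n %% lens != 0L)) {
    warning("longest argument length is not a multiple of a shorter one")
  }
  lapply(args, rep_len, length.out = n)
}

str(recycle_args(1:4, c("a", "b"), TRUE))
lengths(recycle_args(1:3, character(0)))   # 0 0
```

A vectorized function could then call such a helper once on its 'vectorized' arguments and loop over the results, instead of implementing its own recycling.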

Some functions cannot be changed to follow that rule because it would break too much code (e.g., paste() and cat()). However, why shouldn't substring return character(0) if any argument is 0 long?
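
As an illustration of what that would look like from the user's side, here is a sketch of a wrapper that applies the zero rule before delegating to substring(). The name substring0() is made up for this example; it is not a proposed patch to R.

```r
# Hypothetical wrapper: return character(0) when any argument has
# length 0, and otherwise behave exactly like substring().
substring0 <- function(text, first, last = 1000000L) {
  if (length(text) == 0L || length(first) == 0L || length(last) == 0L) {
    return(character(0))
  }
  substring(text, first, last)
}

substring0("abcd", first = 2L, last = integer(0))   # character(0), no error
substring0(character(0), first = 1:2, last = 3L)    # character(0)
substring0("abcd", first = 1:3, last = 4:3)         # "abcd" "bc" "cd"
```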

By the way, the 'zero rule' is there so we don't have to write so many if(length(x)>0) statements around things like

    which(x) + 1

    substring(x, 1, nchar(x)-1)
where the scalar 1 would otherwise cause NAs to arise.
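
A small sketch of that point with empty inputs. The rep_len() call only simulates what stretching the empty vector to the length of the scalar would produce; it is not what R does.

```r
x <- logical(0)

# With the zero rule, the empty input simply gives an empty result.
idx <- which(x) + 1
length(idx)                  # 0

# Simulated alternative: stretch the empty vector to length 1 first.
stretched <- rep_len(which(x), 1L) + 1
stretched                    # NA

# The substring() case with an empty character vector.
s <- character(0)
trimmed <- substring(s, 1, nchar(s) - 1)
length(trimmed)              # 0
```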

[Perhaps I should not state my opinion so forcibly, since, for legal reasons, I'm not in a position to change core R code.]

Bill Dunlap
Spotfire, TIBCO Software

> HP> Otherwise, yes substring() will recycle its arguments to the
> HP> length of the longest:
> >> substring("abcd", first=1:3, last=4:3)
> HP> [1] "abcd" "bc" "cd"
> HP> Cheers,
> HP> H.
> HP> --
> HP> Hervé Pagès
> HP> Program in Computational Biology
> HP> Division of Public Health Sciences
> HP> Fred Hutchinson Cancer Research Center
> HP> 1100 Fairview Ave. N, M2-B876
> HP> P.O. Box 19024
> HP> Seattle, WA 98109-1024
> HP> E-mail:
> HP> Phone: (206) 667-5791
> HP> Fax: (206) 667-1319
> HP> ______________________________________________
> HP> mailing list
> HP>
Received on Fri 04 Jun 2010 - 16:00:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 04 Jun 2010 - 21:11:02 GMT.
