Re: [R] strsplit, keeping delimiters

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Sat, 14 Jun 2008 13:06:14 -0400

On Sat, Jun 14, 2008 at 11:46 AM, hadley wickham <h.wickham_at_gmail.com> wrote:
> On Sat, Jun 14, 2008 at 10:20 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:

>> "hadley wickham" <h.wickham_at_gmail.com> writes:
>>
>>> On Sat, Jun 14, 2008 at 12:55 AM, Gabor Grothendieck
>>> <ggrothendieck_at_gmail.com> wrote:
>>>> Try this:
>>>>
>>>>> library(gsubfn)
>>>>> x <- "A: 123 B: 456 C: 678"
>>>>> strapply(x, "[^ :]+[ :]|[^ :]+$")
>>>> [[1]]
>>>> [1] "A:"   "123 " "B:"   "456 " "C:"   "678"
>>
>

> Either way is fine, since I'll be stripping off the spaces later anyway.
>

Note that if you intend to strip off the delimiters anyway but still want to examine them, you might want to make use of the other arguments of strapply too:

> x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"

> strapply(x, "([^ :]+)([ :]|$)", ~ c(...), b= -2)

[[1]]
 [1] "AC"          ":"           "123"         " "           "BDEF"
 [6] ":"           "456"         " "           "CADSDFSDFSF" ":"
[11] "6sdf"        ":"           "78"          ""

That returns the match followed by the delimiter as separate strings which can be reshaped into an n x 2 matrix.
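
For instance, the reshaping can be done with base R's matrix(). This is only a minimal sketch: the vector is copied by hand from the output above, and the names parts, m, fields and delims are illustrative, not part of gsubfn.

```r
# Flat result copied from the strapply() output above.
# `parts` is an illustrative name, not something gsubfn provides.
parts <- c("AC", ":", "123", " ", "BDEF", ":", "456", " ",
           "CADSDFSDFSF", ":", "6sdf", ":", "78", "")

# Fill row by row so that each row holds one (match, delimiter) pair.
m <- matrix(parts, ncol = 2, byrow = TRUE)

# Column 1 holds the fields, column 2 the delimiter that followed each one.
fields <- m[, 1]
delims <- m[, 2]
```

Once the delimiters have been examined, fields can be used on its own and delims discarded.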

Or, all in one strapply:

> strapply(x, "([^ :]+)([ :]|$)", FUN = ~ c(...), b= -2, simplify = ~ matrix(x, nc = 2, byrow = TRUE))

     [,1]          [,2]
[1,] "AC"          ":"
[2,] "123"         " "
[3,] "BDEF"        ":"
[4,] "456"         " "
[5,] "CADSDFSDFSF" ":"
[6,] "6sdf"        ":"
[7,] "78"          ""

Here b is short for backref, and b = -2 says to pass only the 2 back references to FUN (the minus sign means "only", i.e. the full match itself is not passed). strapply then applies the function whose body is given by the formula FUN and simplifies the result using the function whose body is given by the formula simplify. The free variables in the two formulae (... in the first case and x in the second case) are used to construct the formal arguments of these functions.
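
To make the formula shorthand concrete, the same call can be written with ordinary functions in place of the formulae. This is a sketch that assumes the gsubfn package is installed. The names pair and to_matrix are made up for illustration, and it should behave like the formula version above rather than being guaranteed identical across gsubfn versions.

```r
library(gsubfn)  # assumed to be installed

x <- "AC: 123 BDEF: 456 CADSDFSDFSF: 6sdf:78"

# Written out by hand, ~ c(...) corresponds to a function whose
# formal argument is the free variable `...`.
pair <- function(...) c(...)

# Likewise, ~ matrix(x, nc = 2, byrow = TRUE) corresponds to a
# function whose formal argument is the free variable `x`.
to_matrix <- function(x) matrix(x, ncol = 2, byrow = TRUE)

# backref = -2: pass only the two back references to FUN,
# not the full match.
res <- strapply(x, "([^ :]+)([ :]|$)", FUN = pair, backref = -2,
                simplify = to_matrix)
```

Spelling the functions out is mostly useful when the body grows beyond a one-liner; for short bodies the formula form is more compact.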



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 14 Jun 2008 - 17:08:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 14 Jun 2008 - 18:30:41 GMT.
