Re: [R] regex question

From: <markleeds_at_verizon.net>
Date: Mon, 03 Nov 2008 23:19:24 -0600 (CST)


Hi: Gabor's solution does do it in a single line; he just used paste to build the pattern. See below. John's is more or less a single line too, but it calls sub twice.
I doubt that it's possible to make it shorter than those solutions.

# Gabor's solution spelled out.

patReg1 <- "(^[ <*]+)"
patReg2 <- "([ > ]+$)"
temp <- paste(patReg1, patReg2, sep = "|")
print(temp)

gsub(temp, "", varReg)
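
For anyone who wants to run this as is, here is a minimal self-contained sketch of the same idea. It reuses varReg from Ferry's post; the names pat and trimmed are only for illustration, and the capture parentheses are dropped because nothing refers back to the groups.

```r
# Sample input from the original question.
varReg <- "* < <* this is my text > > "

# Alternation: leading run of spaces, "<" and "*", OR trailing run of
# spaces and ">".
pat <- "^[ <*]+|[ >]+$"

# gsub() replaces every match, so both ends are stripped in one call.
# sub() would stop after the first match and leave the trailing part.
trimmed <- gsub(pat, "", varReg)
print(trimmed)
# [1] "this is my text"
```

Note that it has to be gsub here, not sub: the pattern matches twice (once at each end), and sub only replaces the first match.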

On Tue, Nov 4, 2008 at 12:10 AM, Ferry wrote:

> Dear John, Gabor ...
>
> Thank you for your fast responses.
> In terms of efficiency, is my code efficient? I mean, I thought there
> was a way to combine both patterns into a single line.
>
> Also, I tried to substitute the pattern ([ <*]+) with ([[:punct:]]),
> as in R regex docs:
> patReg1 <- "(^[[:punct:]]+)"
>
> but it doesn't work.
>
> or possibly it is just my stupidity?
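
A likely reason the [[:punct:]] version appears not to work: that class does not include the space character, so the leading run stops at the first blank. The sketch below assumes the same varReg as in the original post and adds [:space:] to the bracket expression; the names punct_only, pat and trimmed are illustrative only.

```r
# Sample input from the original question.
varReg <- "* < <* this is my text > > "

# [[:punct:]] alone does not match spaces, so only the first "*" goes.
punct_only <- sub("^[[:punct:]]+", "", varReg)
print(punct_only)
# [1] " < <* this is my text > > "

# Putting [:space:] in the same bracket expression lets the run
# continue across the blanks between the punctuation characters.
pat <- "^[[:punct:][:space:]]+|[[:punct:][:space:]]+$"
trimmed <- gsub(pat, "", varReg)
print(trimmed)
# [1] "this is my text"
```

One caveat: this strips any punctuation at either end, so a sentence ending in a period would lose it. The explicit [ <*] and [ >] classes are safer when only those characters should be removed.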
>
> On Mon, Nov 3, 2008 at 5:59 PM, John Fox <jfox_at_mcmaster.ca> wrote:
>> Dear Ferry,
>>
>> You're almost all the way there. Just apply each substitution in
>> turn:
>>
>> varReg <- "* < <* this is my text > > "
>> left <- "(^[ <*]+)"
>> right <- "([ > ]+$)"
>> sub(right, "", sub(left, "", varReg))
>> [1] "this is my text"
>>
>> I hope this helps,
>> John
>>
>> ------------------------------
>> John Fox, Professor
>> Department of Sociology
>> McMaster University
>> Hamilton, Ontario, Canada
>> web: socserv.mcmaster.ca/jfox
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces_at_r-project.org
>>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Ferry
>>> Sent: November-03-08 8:38 PM
>>> To: r-help_at_r-project.org
>>> Subject: [R] regex question
>>>
>>> hello,
>>>
>>> i am trying to extract text using regex as follows:
>>>
>>> "* < <* this is my text > > "
>>>
>>> into:
>>>
>>> "this is my text"
>>>
>>> below is what I did:
>>>
>>> varReg <- "* < <* this is my text > > "
>>>
>>> ## either this pattern
>>> patReg <- "(^[ <*]+)"
>>> ## or the pattern below
>>> patReg <- "([ > ]+$)"
>>>
>>> sub(patReg, '', varReg)
>>>
>>> depending on which pattern I use, I can only remove the leading
>>> portion or the trailing portion of the unwanted characters. how do I
>>> remove both ends and keep my text "this is my text"?
>>>
>>> I have tried with gsub, as below:
>>> patReg <- "([ >* ]+)"
>>> gsub(patReg, '', varReg)
>>>
>>> but it returned "thisismytext"
>>>
>>> any idea is appreciated.
>>>
>>> thanks,
>>>
>>> ferry
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>



Received on Tue 04 Nov 2008 - 05:23:27 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 04 Nov 2008 - 09:30:20 GMT.
