Re: [R] Is there a better way to parse strings than this?

From: Whit Armstrong <armstrong.whit_at_gmail.com>
Date: Thu, 14 Apr 2011 22:07:40 -0400

not everything has to be done in R.

awk and sed are some of the best tools on a linux/unix box.

quick refs:
http://www.pement.org/awk/awk1line.txt
http://sed.sourceforge.net/sed1line.txt

-Whit

On Wed, Apr 13, 2011 at 12:07 AM, Chris Howden <chris_at_trickysolutions.com.au> wrote:
> Hi Everyone,
>
>
> I needed to parse some strings recently.
>
> The code I've wound up using seems rather clunky, and I was wondering if
> anyone had any suggestions on a better way?
>
> Basically I do the following:
>
> 1) Use substr() to do the parsing
> 2) Use regexpr() to find the location of the string I want to parse on, I
> then pass this onto substr()
> 3) Use nchar() as the stop input to substr() where necessary
>
>
>
> I've got a simple example of the parsing code I used below. It takes
> questionnaire variable names that includes the question and the brand it
> was answered for and then parses it so the variable name and the brand are
> in separate columns. I then use this to restructure the data from
> unstacked to stacked, but that's another story.
>
>> # this is the data set
>> test
> [1] "A5.Brands.bought...Dulux"
> [2] "A5.Brands.bought...Haymes"
> [3] "A5.Brands.bought...Solver"
> [4] "A5.Brands.bought...Taubmans.or.Bristol"
> [5] "A5.Brands.bought...Wattyl"
> [6] "A5.Brands.bought...Other"
>
>> # Where do I want to parse?
>> break1 <-  regexpr('...',test, fixed=TRUE)
>> break1
> [1] 17 17 17 17 17 17
> attr(,"match.length")
> [1] 3 3 3 3 3 3
>
>> # Put Variable name in a variable
>> str1 <- substr(test,1,break1-1)
>> str1
> [1] "A5.Brands.bought" "A5.Brands.bought" "A5.Brands.bought"
> "A5.Brands.bought"
> [5] "A5.Brands.bought" "A5.Brands.bought"
>
>> # Put Brand name in a variable
>> str2 <- substr(test,break1+3, nchar(test))
>> str2
> [1] "Dulux"               "Haymes"              "Solver"
> [4] "Taubmans.or.Bristol" "Wattyl"              "Other"
>
>
>
> Thanks for any and all suggestions
>
>
> Chris Howden
> Founding Partner
> Tricky Solutions
> Tricky Solutions 4 Tricky Problems
> Evidence Based Strategic Development, IP Commercialisation and Innovation,
> Data Analysis, Modelling and Training
> (mobile) 0410 689 945
> (fax / office) (+618) 8952 7878
> chris_at_trickysolutions.com.au
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 15 Apr 2011 - 02:10:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 15 Apr 2011 - 03:00:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive