[R] Is there a better way to parse strings than this?

From: Chris Howden <chris_at_trickysolutions.com.au>
Date: Wed, 13 Apr 2011 14:07:21 +1000

Hi Everyone,

I needed to parse some strings recently.

The code I've wound up using seems rather clunky, and I was wondering whether anyone could suggest a better way.

Basically I do the following:

  1. Use substr() to do the parsing
  2. Use regexpr() to find the location of the string I want to split on, then pass that position on to substr()
  3. Use nchar() as the stop argument to substr() where necessary
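
The three steps above can be sketched as a small helper. This is only an illustration: the function name split_at and the sample string are made up for this sketch, and it assumes the separator occurs in every input string (regexpr() returns -1 when there is no match, which this sketch does not handle).

```r
# Hypothetical helper illustrating the three steps; the name split_at is
# invented for this sketch and is not part of base R.
split_at <- function(x, sep) {
  # Step 2: position of the first occurrence of the separator
  # (fixed = TRUE treats sep as a literal string, not a regular expression)
  pos <- regexpr(sep, x, fixed = TRUE)
  # Step 1: text before the separator
  before <- substr(x, 1, pos - 1)
  # Steps 1 and 3: text after the separator, with nchar() as the stop
  after <- substr(x, pos + nchar(sep), nchar(x))
  list(before = before, after = after)
}

# Made-up input, used only to show the call
res <- split_at("Q1.Some.question...BrandX", "...")
res$before
res$after
```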

I've got a simple example of the parsing code I used below. It takes questionnaire variable names that include both the question and the brand it was answered for, and parses them so that the variable name and the brand end up in separate columns. I then use this to restructure the data from unstacked to stacked, but that's another story.

> # this is the data set
> test
[1] "A5.Brands.bought...Dulux"
[2] "A5.Brands.bought...Haymes"
[3] "A5.Brands.bought...Solver"
[4] "A5.Brands.bought...Taubmans.or.Bristol"
[5] "A5.Brands.bought...Wattyl"
[6] "A5.Brands.bought...Other"

> # Where do I want to parse?
> break1 <-  regexpr('...',test, fixed=TRUE)
> break1

[1] 17 17 17 17 17 17
attr(,"match.length")
[1] 3 3 3 3 3 3
> # Put Variable name in a variable
> str1 <- substr(test,1,break1-1)
> str1

[1] "A5.Brands.bought" "A5.Brands.bought" "A5.Brands.bought" "A5.Brands.bought"
[5] "A5.Brands.bought" "A5.Brands.bought"
> # Put Brand name in a variable
> str2 <- substr(test,break1+3, nchar(test))
> str2

[1] "Dulux" "Haymes" "Solver"
[4] "Taubmans.or.Bristol" "Wattyl" "Other"
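
To show the "separate columns" result described above, here is a minimal self-contained sketch that rebuilds the example vector and collects the two pieces in a data frame. The object name parsed and the column names variable and brand are chosen only for illustration; the hard-coded offset of 3 is the length of the '...' separator.

```r
# Rebuild the example data from the transcript
test <- c("A5.Brands.bought...Dulux",
          "A5.Brands.bought...Haymes",
          "A5.Brands.bought...Solver",
          "A5.Brands.bought...Taubmans.or.Bristol",
          "A5.Brands.bought...Wattyl",
          "A5.Brands.bought...Other")

# Position of the literal '...' separator in each string
break1 <- regexpr("...", test, fixed = TRUE)

# Illustrative names: parsed, variable, brand
parsed <- data.frame(
  variable = substr(test, 1, break1 - 1),
  brand    = substr(test, break1 + 3, nchar(test)),
  stringsAsFactors = FALSE
)
parsed
```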

Thanks for any and all suggestions

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 13 Apr 2011 - 04:10:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 15 Apr 2011 - 02:30:30 GMT.
