Re: [Rd] binary string conversion to a vector (PR#14120)

From: Franc Brglez <brglez_at_ncsu.edu>
Date: Thu, 10 Dec 2009 17:11:52 -0500


Hello!  

Please accept my sincere apologies for annoying the R development team with my post this week. If I were required to register as "a developer" before submission, this would not have happened. To rehabilitate myself, please find at the bottom of this mail two R-functions, 'string2vector' and 'vector2string', with "comments and tests". Both functions may go a long way towards assisting a number of R-users to make their R-programming more productive. I am a novice R-programmer: I started dabbling in R less than two months ago, heavily influenced by examples of code I see, including within the R.org documents (monkey does what monkey sees). Before posting the two functions, I would really appreciate constructive edits where they may be needed, as well as their posting by someone-in-the-know so that they will be conveniently accessible to R users.

I am very impressed with the potential of R and the community supporting it. I just wish I got to R sooner: I am looking to R to better support my work in "designed experiments to assess the statistically significant performance of combinatorial optimization algorithms on instance isomorphs of NP-hard problems" -- for better context of this mouthful, see the few postings under

   http://www.cbl.ncsu.edu:16080/xBed/publications/

I am working on a tutorial paper where I expect R to play a significant role in better explaining and illustrating, code-wise and graphically, the concepts discussed in the publications above. I would welcome a co-author with experience in R-programming as well as statistics and interests in the experimental methods addressed in these publications.

As I elaborate in notes that follow, I was looking at a variety of "R-documents" before my "bug" submission. I would appreciate very much if some of you could take the time to scan through these notes and respond briefly with useful pointers. Here are the headlines:

    (1) why I still think there may be a bug with 'noquote' vs 'as.integer'

    (2) search on "split string" and "join string"; the missing package "stringr"

    (3) a take on "Tcl" commands 'split', 'join', 'string', 'append', 'foreach'

    (4) a take on "R" functions 'string2vector' and 'vector2string'

    (5) code and comments for "R" functions 'string2vector' and 'vector2string'

(1) why I still think there may be a bug with 'noquote' vs 'as.integer'



> # MacOSX 10.6.2, R 2.9.1 GUI 1.28 Tiger build 32-bit (5444)
> qvector

[1] "0" "0" "0" "1" "1" "0" "1"
> qvector[1]

[1] "0"
> tmp = noquote(qvector[1])
> tmp
[1] 0
> tmp = as.integer(qvector[1])
> tmp

[1] 0
>

When embedded in the function as per my "bug" report, 'noquote' and 'as.integer' are no longer equivalent, whereas in the example above they appear to be equivalent!! I submitted the "function" with print/cat statements for the sake of illustration.
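A minimal sketch of one way to probe the difference outside any function, assuming a current R session. The variable names 'nq', 'asint', 'acc_nq' and 'acc_int' are illustrative only, and the accumulation with c() is assumed to mirror the 'xvector = c(xvector, tmp)' pattern used in 'string2vector':

```r
qvector <- c("0", "0", "0", "1", "1", "0", "1")

nq    <- noquote(qvector[1])     # still a character value; only printing changes
asint <- as.integer(qvector[1])  # a genuine integer value

print(class(nq))      # "noquote"
print(class(asint))   # "integer"

# Accumulate with c() starting from NULL, as a loop building a vector would.
acc_nq  <- c(NULL, nq)
acc_int <- c(NULL, asint)

print(is.character(acc_nq))   # TRUE: the accumulated value is a string
print(is.integer(acc_int))    # TRUE: the accumulated value is a number
```

If this holds in the reported setting as well, the two calls only look alike when a single value is printed at the prompt.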

(2) search on "split string" and "join string"; the missing package "stringr"



http://search.r-project.org/ reveals

   on the order of 850 messages for search on "split string"
   on the order of 160 messages for search on "join string"

http://finzi.psych.upenn.edu/search.html reveals

    for search on "split string"
        • Rhelp08:   [ split: 890 ] [ string: 1676 ] [ TOTAL: 77 ]
        • functions: [ split: 954 ] [ string: 6453 ] [ TOTAL: 204 ]

    for search on "join string"
        • Rhelp08:   [ join: 176 ] [ string: 1676 ] [ TOTAL: 8 ]
        • functions: [ join: 192 ] [ string: 6453 ] [ TOTAL: 36 ]

This site also provides a link to the package "stringr"

    http://finzi.psych.upenn.edu/R/library/stringr/html/00Index.html

However, the download does not deliver ...

> install.packages("stringr")

  ....
   package ‘stringr’ is not available

There are a lot of hard-to-understand and not-so-relevant code snippets in all these 1000's of postings. I would argue that had robust functions such as 'string2vector' and 'vector2string' been included in the R-package, many R-programmers could take longer vacations, spend their time more productively, and significantly reduce duplication of coding efforts on basically the same problems.

Since the vector is such an important "primitive" in R, I argue that functions such as 'string2vector' and 'vector2string' should be made to play a role similar to the commands 'split', 'join', 'string', and 'append' that support programmers in Tcl. See my take on Tcl in the section below.

(3) a take on "Tcl" commands 'split', 'join', 'string', 'append', 'foreach'



I have been using Tcl to "wrap" a number of combinatorial solvers and automate workflows that implement and execute a number of my experiments on instance isomorphs. I even used Tcl to prototype a few combinatorial optimization algorithms and write code for statistical analysis -- a task for which I now find R much better suited.

I intend to alert my Tcl colleagues in-the-know about the wonderful infrastructure provided in R when it comes to the R-shell (at least under MacOSX), the ability to name and initialize function variable defaults explicitly, and the ability to install new packages so transparently. Before coming across R, I already took the trouble to create Tcl wrapper programs with command lines that feature the same order-independent syntax as the syntax used in R. This being said, what I miss in R is a gathering of all commands on a single page such as

   http://www.tcl.tk/man/tcl8.5/TclCmd/contents.htm

Note that once you click on any of the commands, a number of classes that extend each command become visible, including the example section(s).

Here I illustrate my use of just five Tcl commands that subsequently guided my "design" of the functions 'string2vector' and 'vector2string' in "R"

# few "Tcl" examples before designing the function 'string2vector' in "R"
% set binS "10011"
% join [split $binS ""] ", "
1, 0, 0, 1, 1

%
% set strS "I \t am\tdone" 
% foreach item [split $strS "\t"] {append strSQ \"$item\",}
% set strSQ [string trimright $strSQ ,]

"I "," am","done"
#
# few "Tcl" examples before designing the function 'vector2string' in "R"
% set strV "1,0,0,1"
1,0,0,1
% split $strV ","
1 0 0 1
% join [split $strV ","] ":"
1:0:0:1
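
For comparison, the Tcl sessions above can be approximated in base R with 'strsplit' and 'paste'. This is only a sketch under the assumption of a current R session; the variable names ('chars', 'joined', 'items', 'quoted', 'parts', 'recoloned') are illustrative and not part of any package:

```r
# Counterpart of:  join [split $binS ""] ", "
binS   <- "10011"
chars  <- strsplit(binS, "")[[1]]          # split into single characters
joined <- paste(chars, collapse = ", ")
print(joined)                              # "1, 0, 0, 1, 1"

# Counterpart of the foreach/append/string trimright sequence:
# quote each tab-separated item and join the items with commas.
strS   <- "I \t am\tdone"
items  <- strsplit(strS, "\t")[[1]]
quoted <- paste(paste("\"", items, "\"", sep = ""), collapse = ",")
cat(quoted, "\n")                          # "I "," am","done"

# Counterpart of:  join [split $strV ","] ":"
strV      <- "1,0,0,1"
parts     <- strsplit(strV, ",")[[1]]
recoloned <- paste(parts, collapse = ":")
print(recoloned)                           # "1:0:0:1"
```

Here 'strsplit' returns a list with one element per input string, hence the '[[1]]', and 'paste' with 'collapse' plays the part of Tcl's 'join'.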

(4) a take on "R" functions 'string2vector' and 'vector2string'



> # few tests of the function 'string2vector' in "R"
> binS = "10011"
> binV = string2vector(binS, SS="", type="int")
> binV[2] ; binV[5]

[1] 0
[1] 1
> strS = "I am done"
> vecS = string2vector(strS, SS=" ", type="char")
> vecS[1] ; vecS[3]

[1] "I"
[1] "done"
>
> # few tests of the function 'vector2string' in "R"
> binV = c(1,0,0,1)
> vector2string(binV, type="int")

[1] "1001"
> vector2string(binV, SS=" ", type="char")
[1] "1 0 0 1"
> subsV = c("I", "am", "done")
> vector2string(subsV, SS=":", type="char")
[1] "I:am:done"
>

(5) code and comments for "R" functions 'string2vector' and 'vector2string'


string2vector = function(string="ch-2 \t sec-7\tex-5", SS="\t", type="char")

#
# This procedure splits a string and assigns substrings to an R-vector.
# The split is controlled by the string separator SS (default value:  SS="\t").
# Here we convert  a binary string into a binary vector:
#   let  binS = "10011"  
#   then binV = string2vector(binS, SS="", type="int")
# Here we convert a string into a vector of substrings:
#   let  strS = "I am done" 
#   then vecS = string2vector(strS, SS=" ", type="char")
#
# LIMITATION: The function interprets all substrings either as of type 
#             "int" or "char".  A function that interprets the type of each
#             substring dynamically may one day be written by an R-guru.
#              
# Franc Brglez, Wed Dec  9 14:19:16 EST 2009
{

    qlist = strsplit(string, SS) ; qvector = qlist[[1]]
    n = length(qvector) ; xvector = NULL
    # seq_len(n) is empty when n == 0, whereas 1:n would iterate over c(1, 0)
    for (i in seq_len(n)) {
        if (type == "int") {
            tmp = as.integer(qvector[i])
        } else {
            tmp = qvector[i]
        }
        xvector = c(xvector, tmp)
    }
    return(xvector)
} # string2vector

vector2string = function(vector=c("ch-2", "sec-7", "ex-5"), SS="_", type="char")

#
# This procedure converts values from a vector to a concatenation of substrings 
# separated by user-specified string separator SS (default value:  SS="_").
# Each substring represents a vector component value, either as a numerical 
# value or as an alphanumeric string. 
# Here we convert a binary vector to a binary string representing an integer:
#   let  binV = c(1,0,0,1)  
#   then strS = vector2string(binV, type="int")
# Here we convert a binary vector to string representing a binary sequence:
#   let  binV = c(1,0,0,1)  
#   then seqS = vector2string(binV, SS=" ", type="char")
# Here we convert a vector of substrings to colon-separated string:
#   let subsV = c("I", "am", "done")  
#   then strS = vector2string(subsV, SS=":", type="char")
#
# LIMITATION: The function interprets all substrings in the vector either as of 
#             type "int" or "char".  A function that interprets the type of each
#             substring dynamically may one day be written by an R-guru.
#
# Franc Brglez, Wed Dec  9 15:43:59 EST 2009
{

    if (type == "int") {
        string = paste(strsplit(paste(vector), " "), collapse="")
    } else {
        n = length(vector) ; nm1 = n-1 ; string = ""
        # seq_len(nm1) is empty when n == 1, whereas 1:nm1 would iterate over c(1, 0)
        for (i in seq_len(nm1)) {
            tmp    = noquote(vector[i])
            string = paste(string, tmp, SS, sep="")
        }
        tmp    = noquote(vector[n])
        string = paste(string, tmp, sep="")
    }
    return(string)
} # vector2string
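
A possible loop-free variant of the two functions, offered only as a sketch: the names 'string2vector_vec' and 'vector2string_vec' are hypothetical, and the behaviour is assumed to match the tests in section (4) rather than every corner case of the code above. It relies on 'as.integer' and 'paste' operating on whole vectors:

```r
# Hypothetical vectorized counterpart of 'string2vector'.
string2vector_vec <- function(string, SS = "\t", type = "char") {
    parts <- strsplit(string, SS)[[1]]      # all substrings at once
    if (type == "int") as.integer(parts) else parts
}

# Hypothetical vectorized counterpart of 'vector2string'.
# For type "int" the separator is ignored and the values are concatenated.
vector2string_vec <- function(vector, SS = "_", type = "char") {
    if (type == "int") SS <- ""
    paste(vector, collapse = SS)
}

print(string2vector_vec("10011", SS = "", type = "int"))
print(string2vector_vec("I am done", SS = " "))
print(vector2string_vec(c(1, 0, 0, 1), type = "int"))
print(vector2string_vec(c("I", "am", "done"), SS = ":"))

# Regarding the LIMITATION notes: utils::type.convert() guesses one type
# for a whole character vector, which may cover some of those cases.
guessed <- type.convert(c("1", "0", "0", "1", "1"), as.is = TRUE)
print(is.integer(guessed))
```

The 'type' argument is kept so that calls look the same as in the tests above; whether the guessing done by 'type.convert' is desirable depends on the data at hand.
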
Dr. Franc Brglez                                        email: brglez_at_ncsu.edu 
Department of Computer Science, Box 8206     http://sitta.csc.ncsu.edu/~brglez
North Carolina State University                            TEL: (919) 515-9675
Raleigh NC 27695-8206 USA

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Fri 11 Dec 2009 - 14:11:29 GMT
