[R] [Fwd: Re: regular expression]

From: Laurent Rhelp <laurentRhelp_at_free.fr>
Date: Sun 08 Apr 2007 - 20:44:03 GMT

attached mail follows:


Uwe Ligges wrote:

> I guess your problem has been solved by last night's discussion with
> Gabor G.?
>
> Uwe Ligges
>
>
> Laurent Rhelp wrote:
>
>> Uwe Ligges wrote:
>>
>>> Laurent Rhelp wrote:
>>>
>>>
>>>> Uwe Ligges wrote:
>>>>
>>>>
>>>>
>>>>> Laurent Rhelp wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Dear R-List,
>>>>>>
>>>>>> I have a great many files in a directory, and in every file I
>>>>>> would like to replace the character " by the character '. At the
>>>>>> same time, I have to change ' to '' (i.e. the character ' twice,
>>>>>> not the single character ") when the character ' is enclosed in
>>>>>> "....."
>>>>>> So, "....." becomes '.....' and ".....'......" becomes
>>>>>> '.....''......'
>>>>>> Regular expressions could certainly help me, but I am not able to
>>>>>> use them.
>>>>>>
>>>>>> How can I do that with R ?
>>>>>>
>>>>>
>>>>>
>>>>> In fact, you do not need to know anything about regular
>>>>> expressions in this case, since you are simply going to replace
>>>>> certain characters by others without any fuzzy restrictions:
>>>>>
>>>>> x <- "\".....'......\""
>>>>> cat(x, "\n")
>>>>> xn <- gsub('"', "'", gsub("'", "''", x))
>>>>> cat(xn, "\n")
>>>>>
>>>>>
>>>>> Uwe Ligges
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Thank you very much
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@stat.math.ethz.ch mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Yes, you are right. So I wrote the code below (which I find a
>>>> little awkward, but it works).
>>>>
>>>> ##-----
>>>>
>>>> dirdata <- getwd()
>>>> fichnames <- list.files(path=paste(dirdata,"\\initial\\",sep=""))
>>>>
>>>
>>>
>>> see ?file.path to improve the above.
>>>
>>>
>>>
>>>
>>>> for( i in 1:length(fichnames)){
>>>>
>>>
>>>
>>> see ?seq to improve the above: seq(along = fichnames)
>>> Or even better, just work on the names (see below).
>>>
>>>
>>>
>>>> filein <- paste(dirdata,"\\initial\\",fichnames[i],sep="")
>>>>
>>>
>>>
>>> again, file.path() is your friend
>>>
>>>
>>>
>>>> conin <- file(filein)
>>>> open(conin)
>>>
>>>> nbrows <- length( readLines(conin,n=-1) )
>>>
>>>
>>>> close(conin)
>>>>
>>>
>>>
>>> You can simply call readLines() with the filename; it opens the
>>> connection to the file itself. And I do not see why you want to read
>>> the file here. Since your code is becoming really complicated now,
>>> let me suggest the following procedure (untested!):
>>>
>>> dirdata <- getwd()
>>> fichnames <- list.files(file.path(dirdata, "initial"))
>>> for(i in fichnames){
>>> temp <- readLines(file.path(dirdata, "initial", i))
>>> temp <- gsub('"', "'", gsub("'", "''", temp))
>>> writeLines(temp, con = file.path(dirdata, "result", i))
>>> }
>>>
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> fileout <- paste(dirdata,"\\result\\",fichnames[i],sep="")
>>>> conout <- file(fileout,"w")
>>>>
>>>> conin <- file(filein)
>>>> open(conin)
>>>>
>>>>
>>>> for( l in 1:nbrows )
>>>> {
>>>> text <- gsub('"',"'",gsub("'","''",readLines(conin,n=1)))
>>>> writeLines(con=conout,text=text)
>>>> }
>>>>
>>>> close(conin)
>>>> close(conout)
>>>> }
>>>>
>>>> ##------
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>> Dear Uwe,
>>
>> The code doesn't do what I want, because I want to replace ' by ''
>> only when the character ' is enclosed in "......"
>> So :
>> 1. " becomes '
>> 2. ".....'......" becomes '......''......'
>> 3. but '.......' has to stay '.......' and not ''......''
>>
>> Did I miss something ?
>>
>>
>>
>>
>>
>
>
>
Yes, Gabor gave me the end of the solution. Thank you Uwe and Gabor. For those who are interested in the solution, I recapitulate the different steps below.

The objective was to replace, in a lot of files, each double-quoted string with a single-quoted string in which double quotes take the place of any single quotes it contains. We also have to allow for the fact that such a string can be written over several lines in the files.

The steps to do that are:

  1. read the lines of every file into a vector of strings (readLines)
  2. collapse this vector into a single string with embedded newlines (paste(, collapse="\n"))
  3. use a regular expression on this string to do the replacement
  4. split the string back into a vector of strings (strsplit) to recover the lines of the altered file
  5. write the new file
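
Steps 2 to 4 can be tried on an in-memory vector before touching any file. The following is only a sketch that uses base R (gregexpr and regmatches) instead of gsubfn; the sample vector lines_in is made up for illustration and stands in for the result of readLines.

```r
# Hypothetical sample standing in for the result of readLines() (step 1)
lines_in <- c("a = \"it's", "here\"", "b = 'kept'")

# Step 2: collapse into one string so that a match can span line breaks
txt <- paste(lines_in, collapse = "\n")

# Step 3: locate every double-quoted span and swap ' and " inside it
m <- gregexpr('"[^"]*"', txt)
regmatches(txt, m) <- lapply(regmatches(txt, m),
                             function(s) chartr("'\"", "\"'", s))

# Step 4: back to one element per line
lines_out <- unlist(strsplit(txt, "\n", fixed = TRUE))
print(lines_out)
```

The string that was already single-quoted is left alone because the pattern only matches spans delimited by double quotes.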

##--

library(gsubfn)

squote <- "'"  # a single quote, written between double quotes
dquote <- '"'  # a double quote, written between single quotes

# swap single and double quotes in x
f <- function(x) chartr(paste(squote, dquote, sep = ""), paste(dquote, squote, sep = ""), x)

dirdata <- getwd() # not necessary
# select only the files with the .PRC extension, for example
fichnames <- list.files(file.path(dirdata, "initial"), pattern = "\\.PRC$")

for (i in fichnames) {
   Lines <- readLines(file.path(dirdata, "initial", i))
   # the whole file is collapsed into one string so a quoted string may span several lines
   temp <- gsubfn('["][^"]*["]', f, paste(Lines, collapse = "\n"))
   Lines <- unlist(strsplit(temp, "\n")) # strsplit returns a list, not a character vector
   writeLines(Lines, con = file.path(dirdata, "result", i))
}

##--
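
Note that f swaps the two kinds of quote, so ".....'......" becomes '....."......' rather than the '.....''......' asked for at the start of the thread. If the doubled single quote is what is needed, a variant along the following lines might do. It is only a sketch in base R; to_doubled and convert are hypothetical names and come neither from gsubfn nor from the replies above.

```r
# Hypothetical helper: turn double-quoted spans into single-quoted ones,
# doubling any single quote found inside
to_doubled <- function(s) {
  if (length(s) == 0) return(s)               # no match in this element
  inner <- substr(s, 2, nchar(s) - 1)         # drop the surrounding "
  paste("'", gsub("'", "''", inner, fixed = TRUE), "'", sep = "")
}

# Apply the helper to every double-quoted span of a string
convert <- function(txt) {
  m <- gregexpr('"[^"]*"', txt)
  regmatches(txt, m) <- lapply(regmatches(txt, m), to_doubled)
  txt
}

x <- "\"ab'cd\" and 'ef' and \"gh\""
cat(convert(x), "\n")
```

In the loop, convert(paste(Lines, collapse = "\n")) would take the place of the gsubfn call; the rest stays the same.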

Thank you very much again.



Received on Mon Apr 09 06:48:27 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sun 08 Apr 2007 - 21:31:39 GMT.
