Re: [R] Pulling strings from a Flat file

From: David Winsemius <>
Date: Tue, 05 Apr 2011 22:59:47 -0400

On Apr 5, 2011, at 7:48 PM, Kalicin, Sarah wrote:

> Hi,
> I have a flat file that contains a bunch of strings that look like
> this. The file was originally in Unix and brought over into Windows:
> E123456E234567E345678E456789E567891E678910E. . . .
> Basically the string starts with E and is followed with 6 numbers.
> One string=E123456, length=7 characters. This file contains 10,000's
> of these strings. I want to separate them into one vector the length
> of the number of strings in the flat file, where each string is its
> own unique value.
>> cc<-c(7,7,7,7,7,7,7)
>> aa<- file("Master","r", raw=TRUE)
>> readChar(aa, cc, useBytes = FALSE)
> [1] "E123456" "\nE23456" "7\nE3456" "78\nE456" "789\nE56"
> "7891\nE6" "78910\nE"
>> close(aa)
>> unlink("Master")

 > txt <- "E123456E234567E345678E456789E567891E678910E"
 # You could use readLines to bring in from the file
 # and assign to a character vector for work in R.

 > gsub("(E[[:digit:]]{6})", "\\1\n", txt)
[1] "E123456\nE234567\nE345678\nE456789\nE567891\nE678910\nE"
 # Seems to be "working" properly

 > ?scan

 > scan(textConnection(gsub("(E[[:digit:]]{6})", "\\1\n", txt)), what="character")
Read 7 items
[1] "E123456" "E234567" "E345678" "E456789" "E567891" "E678910" "E"
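
Note that the scan() result above still carries the trailing lone "E" from the example string. As a rough sketch of an alternative (not the only way to do it, and assuming a version of R that provides regmatches()), one could extract only the complete "E plus six digits" tokens directly, which drops any incomplete fragment:

```r
# Same example string as above, including the stray trailing "E".
txt <- "E123456E234567E345678E456789E567891E678910E"

# gregexpr() locates every complete token (an "E" followed by six digits);
# regmatches() pulls those matches out as a character vector.
tokens <- regmatches(txt, gregexpr("E[[:digit:]]{6}", txt))[[1]]

tokens
length(tokens)   # one element per complete token
```

With the example string this gives six elements, since the final "E" has no digits after it and so never matches.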

You might be able to use read.table or variants.
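
A minimal sketch of what that could look like, assuming (from the "\n" showing up in the readChar() output) that the file really holds one token per line. The temporary file and its three values are hypothetical stand-ins for "Master":

```r
# Hypothetical stand-in for the "Master" file: one token per line.
tmp <- tempfile()
writeLines(c("E123456", "E234567", "E345678"), tmp)

# Each of these treats the line break as a separator rather than as data,
# so no "\n" ends up inside the strings.
v1 <- readLines(tmp)
v2 <- scan(tmp, what = "character", quiet = TRUE)
# colClasses = "character" keeps read.table() from trying to convert the column.
v3 <- read.table(tmp, colClasses = "character")$V1

unlink(tmp)
```

All three should return the same character vector, one element per line of the file.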

> The biggest issue is I am getting \n added into the string, which I
> am not sure where it is coming from, and splices the strings. Any
> suggestions on getting rid of the \n and creating an infinite sequence
> of 7's for the string length for the cc vector? Is there a better
> way to do this?
> Sarah
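
The "\n" looks like the line terminator already present in the file: readChar() counts it as one more character, so reading 7 at a time drifts by one position per record. If you did want to stay with readChar(), a rough sketch would be to read 8 characters per record and strip the terminator, building the length vector with rep() rather than typing out the 7's. This assumes every record is exactly 7 characters plus a single "\n" (no "\r"); the temporary file is a hypothetical stand-in for "Master":

```r
# Hypothetical stand-in for "Master", written through a binary connection
# so the line terminator is "\n" on every platform.
tmp <- tempfile()
con <- file(tmp, "wb")
writeLines(c("E123456", "E234567", "E345678"), con, sep = "\n")
close(con)

# Each record occupies 8 bytes: 7 characters plus the newline.
n <- file.info(tmp)$size %/% 8

con <- file(tmp, "rb")
recs <- readChar(con, rep(8, n))   # rep() builds the vector of lengths
close(con)

# Drop the trailing newline from each record.
tokens <- sub("\n$", "", recs)

unlink(tmp)
```

Here rep(8, n) plays the role of the "infinite sequence": its length is derived from the file size instead of being written out by hand.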

David Winsemius, MD
West Hartford, CT

Received on Wed 06 Apr 2011 - 03:02:42 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 06 Apr 2011 - 06:40:27 GMT.
