[R] extracting data from a list of unformatted text files

From: ravi <rv15i_at_yahoo.se>
Date: Thu, 20 Nov 2008 10:18:29 +0000 (GMT)


Hi,
I want to extract information from a number of text files in a folder. The files have names like 82534.txt, 82555.txt, 8282787.txt, etc.

Below is a sample of the kind of information found in each text file:
########
#(a lot of preceding text)

2008-10-01      06:30:12                2 of 3 page

#(some lines of text - varies from file to file)
sekvens    890
# lines of text

sNo     start            stop            direction        value
1        70                85                up                60.2
3        60                90                down            71.5
#########

In each of the files that I choose, I first want to go to the appropriate page. The page is identified by the date line in the sample above, where the page number is 2 (from "2 of 3"). The date and time preceding the page number vary from file to file, but this line always contains the word "page". After that, I am interested in the number following the word "sekvens", and also in the table underneath.
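One possible way to approach this step, as a minimal sketch rather than a finished solution. It assumes the lines of one file are already in a character vector (readLines() reads the whole file by default), that a page runs until the next "N of M page" line, and that the table rows are the consecutive lines starting with a digit right after the "sNo" header. The helper name extract_page and the sample vector txt are made up for illustration.

```r
# Stand-in for the result of readLines() on one file (contents made up to match the sample)
txt <- c(
  "(a lot of preceding text)",
  "",
  "2008-10-01      06:30:12                2 of 3 page",
  "",
  "(some lines of text)",
  "sekvens    890",
  "more text",
  "",
  "sNo     start            stop            direction        value",
  "1        70                85                up                60.2",
  "3        60                90                down            71.5"
)

# Hypothetical helper: return sekvens plus the table found on the given page
extract_page <- function(lines, page = 2) {
  any_header  <- "\\d+\\s+of\\s+\\d+\\s+page"
  this_header <- paste0("\\b", page, "\\s+of\\s+\\d+\\s+page")

  # Locate the header line of the wanted page
  start <- grep(this_header, lines, perl = TRUE)[1]
  if (is.na(start)) return(NULL)

  # Keep what follows, up to the next page header (assumption about page layout)
  body <- lines[-seq_len(start)]
  nxt  <- grep(any_header, body, perl = TRUE)
  if (length(nxt) > 0) body <- body[seq_len(nxt[1] - 1)]

  # Number following the word "sekvens"
  sek_line <- grep("^\\s*sekvens\\s+\\d+", body, value = TRUE, perl = TRUE)[1]
  sekvens  <- as.integer(sub("^\\s*sekvens\\s+(\\d+).*$", "\\1", sek_line, perl = TRUE))

  # Table: the "sNo" header line plus the run of digit-leading lines after it
  hdr <- grep("^\\s*sNo\\s", body, perl = TRUE)[1]
  if (is.na(hdr)) return(NULL)
  after  <- body[-seq_len(hdr)]
  is_row <- grepl("^\\s*\\d", after, perl = TRUE)
  n_rows <- if (all(is_row)) length(after) else which(!is_row)[1] - 1
  if (n_rows == 0) return(NULL)

  tab <- read.table(text = c(body[hdr], after[seq_len(n_rows)]),
                    header = TRUE, stringsAsFactors = FALSE)
  cbind(sekvens = sekvens, tab)
}

res <- extract_page(txt, page = 2)
print(res)
```

For a real file, txt would be replaced by readLines() on that file's path.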

Finally, I want to collect all the data in a data frame with the following structure:

fileno    sekvens    sNo    start    stop    direction    value
82534    890            1        70       85    up            60.2
82534    890            3        60        90    down        71.5
82555     ..               ..        ..        ..        ..            ..
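A minimal sketch of how such a data frame might be assembled, assuming each file has already been reduced to a small data frame with the columns sekvens, sNo, start, stop, direction and value. The list per_file is hypothetical, and the second entry ("99999.txt") uses made-up placeholder values purely to show the stacking.

```r
# Hypothetical per-file results, named by file; the second entry is a placeholder
per_file <- list(
  "82534.txt" = data.frame(sekvens = 890L, sNo = c(1L, 3L),
                           start = c(70L, 60L), stop = c(85L, 90L),
                           direction = c("up", "down"),
                           value = c(60.2, 71.5),
                           stringsAsFactors = FALSE),
  "99999.txt" = data.frame(sekvens = 1L, sNo = 2L,
                           start = 10L, stop = 20L,
                           direction = "up", value = 33.1,
                           stringsAsFactors = FALSE)
)

# Prepend the file number (file name without ".txt") to each piece
with_id <- Map(
  function(df, f) cbind(fileno = as.integer(sub("\\.txt$", "", f)), df),
  per_file, names(per_file)
)

# Stack all pieces into one data frame and drop the generated row names
result <- do.call(rbind, with_id)
rownames(result) <- NULL
print(result)
```

Files for which nothing could be extracted would need to be dropped from the list before the rbind() step.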

There are a number of topics involved here with which I have almost no familiarity. First, the use of regular expressions to specify the files that I want from a folder. Next, how do I locate the particular section (or page) of the text file that I am interested in? Should these files be read in their entirety first, or is it possible to go directly to the section with the relevant text? Finally, how do I extract the data in the form that I want?
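For the first of these points, a minimal sketch of selecting files by a regular expression with list.files(). It assumes the wanted files are exactly those whose names consist of digits followed by ".txt"; the temporary folder and the extra file names ("notes.txt", "82534.bak") are made up so that the example can run on its own.

```r
# Throwaway folder with a few empty files, only so there is something to list
folder <- tempfile("demo")
dir.create(folder)
invisible(file.create(file.path(
  folder, c("82534.txt", "82555.txt", "8282787.txt", "notes.txt", "82534.bak")
)))

# Keep only names made of digits followed by ".txt"
files <- list.files(folder, pattern = "^[0-9]+\\.txt$", full.names = TRUE)

# File number = base name without the extension
fileno <- sub("\\.txt$", "", basename(files))
print(fileno)
```

The pattern argument is a regular expression, not a shell wildcard, so the dot has to be escaped and the anchors ^ and $ are needed to exclude names such as "82534.bak".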

I have identified the following commands that would be useful for me here: list.files(), readLines(), strsplit(). I would appreciate some help in getting started, and would certainly benefit from a few hints. I would also appreciate links to references with examples showing how similar problems are tackled.

Thanking you,
Ravi



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 20 Nov 2008 - 10:22:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 20 Nov 2008 - 14:00:26 GMT.