Re: [R] Reading fixed column format

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Wed 13 Sep 2006 - 11:06:36 GMT

Anupam Tyagi wrote:
> Barry Rowlingson <B.Rowlingson <at> lancaster.ac.uk> writes:
>
>
>>> None of these seem to read non-contiguous variables from columns; or
>>> maybe I am missing something. "read.fwf" is not meant for large
>>> files according to a post in the archives. Thanks for the pointers. I
>>> have read the R data input and output. Anupam.
>>>
>> First up, how 'large' is your 'large ASCII file'? How many rows and
>> columns?
>>
>
> There are 356,112 records, and 326 variables. It has a fixed record length of
> 1283 positions, therefore "cut -b" can not be used.
>
>
>> Secondly, what are 'non-contiguous' variables?
>>
>
> When I do not want to read all columns. For example, I would like to read the
> following:
>
> StartingColumn  VariableName  FieldLength
>              1  STATE                   2
>             24  INTVID                  3
>             27  DISPCODE                3
>             30  PSU                    10
>

read.fwf() can handle the skipped columns: give a negative width for each run of columns you want to skip (see the man page). It reads the file in blocks of lines rather than all at once, so the large size of the original file shouldn't be a problem.
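
As a rough sketch of the negative-width idea applied to the four fields listed above: the two records and the temporary file below are made up for illustration, and the buffersize value is arbitrary. Anything on a line past the last width is simply not read.

```r
# Layout from the post: STATE at 1 (2 wide), INTVID at 24 (3),
# DISPCODE at 27 (3), PSU at 30 (10). Columns 3-23 are skipped with -21.
widths    <- c(2, -21, 3, 3, 10)
col_names <- c("STATE", "INTVID", "DISPCODE", "PSU")

# Two invented records standing in for the real 1283-character lines.
filler  <- strrep("x", 21)
records <- c(
  paste0("01", filler, "123", "110", "0000000042", "rest-of-line"),
  paste0("36", filler, "456", "120", "0000000099", "rest-of-line")
)
path <- tempfile(fileext = ".txt")
writeLines(records, path)

# colClasses = "character" keeps leading zeros; buffersize is the number
# of lines processed per block.
dat <- read.fwf(path, widths = widths, col.names = col_names,
                colClasses = "character", buffersize = 1000)
print(dat)
```

Reading everything as character first is a cautious choice here; the columns can be converted with as.numeric() or factor() afterwards.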

Duncan Murdoch

> Sometimes I would also like to format the data after it has been read. For
> example, the ASCII file has price in columns 100 to 105 written as 005999. I
> want to read this and format it as 59.99 (omitting leading zeros in the price).
>
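
One possible way to do the price conversion after reading, assuming the field always carries two implied decimal places as in the 005999 example; the other two values are invented for illustration.

```r
# Price field as read from columns 100-105, kept as character.
raw_price <- c("005999", "000250", "120000")

# as.numeric() drops the leading zeros; dividing by 100 restores the
# two implied decimal places.
price <- as.numeric(raw_price) / 100

# For display with exactly two decimals:
formatted <- sprintf("%.2f", price)
print(formatted)
```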
>
>> Perhaps if you posted the first few lines and columns of the file then
>> we might get an idea of how to read it in.
>>
>
> I have not even downloaded the data onto my computer yet, because I am not sure
> I can read it in. The zipped file is 67MB. Using similar data a few years ago, I
> recall the unzipped file to be about 350--400 MB. I had used MySQL then, but it
> took some doing to get it in, and there were things that did not seem to work as
> I wanted them to---I could not figure out how to label the variables. I usually
> do not have to work with a dataframe of more than 10-30 MB at a time.
>
> It would be good to have a facility in R which defines the meta-data: labelling
> and structure of the dataset: positions of variables, their names, their labels,
> their levels (e.g. for ordered choice or group variables: yes, sometimes, no
> type responses). This can be saved as a separate object and passed to a function
> that gets the named variables from the ASCII file (names of variables to get can
> be given as arguments), attaches the meta-data and creates a dataframe with
> all the meta-data attached. The meta-data of the dataframe could include notes
> at dataframe and variable level, and other information. This information is
> passed on to the plotting functions and used when formatting the output of
> statistical procedures.
>
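
A minimal sketch of what such a layout object might look like using only base R. The names fwf_spec and read_with_meta are hypothetical helpers, not existing R functions, the label texts are placeholders, and the sketch assumes the fields do not overlap.

```r
# Layout table: positions and names from the post, labels invented.
layout <- data.frame(
  start  = c(1, 24, 27, 30),
  name   = c("STATE", "INTVID", "DISPCODE", "PSU"),
  length = c(2, 3, 3, 10),
  label  = c("label for STATE", "label for INTVID",
             "label for DISPCODE", "label for PSU"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the layout into read.fwf() widths for the
# requested variables, inserting a negative width for every gap.
fwf_spec <- function(layout, keep) {
  sel <- layout[layout$name %in% keep, ]
  sel <- sel[order(sel$start), ]
  widths <- numeric(0)
  pos <- 1                                  # next unread column
  for (i in seq_len(nrow(sel))) {
    gap <- sel$start[i] - pos
    if (gap > 0) widths <- c(widths, -gap)  # skip unwanted columns
    widths <- c(widths, sel$length[i])
    pos <- sel$start[i] + sel$length[i]
  }
  list(widths = widths, names = sel$name, labels = sel$label)
}

# Hypothetical helper: read the chosen variables and attach each label
# as an attribute on its column.
read_with_meta <- function(path, layout, keep) {
  spec <- fwf_spec(layout, keep)
  dat <- read.fwf(path, widths = spec$widths, col.names = spec$names,
                  colClasses = "character")
  for (j in seq_along(spec$names)) {
    attr(dat[[spec$names[j]]], "label") <- spec$labels[j]
  }
  dat
}

spec <- fwf_spec(layout, keep = c("STATE", "PSU"))
print(spec$widths)
```

Plotting and modelling functions in base R will not use a "label" attribute on their own, so this only covers the storage half of the idea.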
> I agree with Michael Kobovy that this is a very helpful list, and people do
> not owe less than what one paid for the software :)
>
> Anupam.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Wed Sep 13 22:15:46 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 13 Sep 2006 - 12:30:05 GMT.