Re: [R] Reading fixed column format

From: Duncan Murdoch <>
Date: Wed 13 Sep 2006 - 11:06:36 GMT

Anupam Tyagi wrote:
> Barry Rowlingson <B.Rowlingson <at>> writes:
>>> None of these seem to read non-contiguous variables from columns; or
>>> maybe I am missing something. "read.fwf" is not meant for large
>>> files according to a post in the archives. Thanks for the pointers. I
>>> have read the R Data Import/Export manual. Anupam.
>> First up, how 'large' is your 'large ASCII file'? How many rows and
>> columns?
> There are 356,112 records and 326 variables. Each record has a fixed length of
> 1283 positions, therefore "cut -b" cannot be used.
>> Secondly, what are 'non-contiguous' variables?
> I mean picking out only some variables when I do not want to read all the
> columns. For example, I would like to read only the following:
> StartingColumn  VariableName  FieldLength
> 1               STATE         2
> 24              INTVID        3
> 30              PSU           10

read.fwf() can handle the skipped columns: give negative values in the widths argument for the columns to skip (see ?read.fwf). It reads the file in blocks of lines (controlled by its buffersize argument), so the large size of the original file shouldn't be a problem.
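A minimal sketch of the negative-widths approach for the three variables in the table above. The record contents are made up for illustration; only the layout (STATE in columns 1-2, INTVID in 24-26, PSU in 30-39, 1283 characters per record) comes from the thread. Using colClasses = "character" assumes these are codes whose leading zeros should be kept, and the buffersize value is an arbitrary choice.

```r
# Build a tiny fixed-width file with the layout from the table above.
# The field contents are invented; only the column positions matter.
rec <- paste0(
  "06",                   # columns 1-2:   STATE
  strrep("x", 21),        # columns 3-23:  skipped
  "123",                  # columns 24-26: INTVID
  "yyy",                  # columns 27-29: skipped
  "0000012345",           # columns 30-39: PSU
  strrep("z", 1283 - 39)  # rest of the 1283-character record, never read
)
f <- tempfile(fileext = ".dat")
writeLines(c(rec, rec), f)

# Negative widths skip columns; anything after the last width is ignored.
dat <- read.fwf(f,
                widths     = c(2, -21, 3, -3, 10),
                col.names  = c("STATE", "INTVID", "PSU"),
                colClasses = "character",  # passed on to read.table; keeps "06" as "06"
                buffersize = 10000)        # maximum lines read per block
str(dat)
```

For the real file, f would be the path to the unzipped data, with widths and col.names extended to cover the other wanted variables.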

Duncan Murdoch

> Sometimes I would also like to format the data after it has been read. For
> example, the ASCII file has the price in columns 100 to 105, written as 005999. I
> want to read this and format it as 59.99 (omitting the leading zeros).
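A minimal sketch for the price field, assuming the six digits always carry two implied decimal places (as 005999 meaning 59.99 suggests). The filler characters around the field are made up. Reading the field as a number drops the leading zeros, and dividing by 100 restores the decimal point; sprintf() is only needed when a fixed two-decimal text form is wanted for printing.

```r
# Hypothetical record: 99 filler characters, then the price in columns 100-105.
rec <- paste0(strrep("x", 99), "005999", strrep("x", 20))
f <- tempfile(fileext = ".dat")
writeLines(rec, f)

# Skip columns 1-99 and read the six price characters; read.table turns
# "005999" into the number 5999, which drops the leading zeros.
px <- read.fwf(f, widths = c(-99, 6), col.names = "PRICE")
px$PRICE <- px$PRICE / 100   # assumes two implied decimal places

px$PRICE                     # numeric: 59.99
sprintf("%.2f", px$PRICE)    # character for printing: "59.99"
```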
>> Perhaps if you posted the first few lines and columns of the file then
>> we might get an idea of how to read it in.
> I have not even downloaded the data onto my computer yet, because I am not sure
> I can read it in. The zipped file is 67 MB. Using similar data a few years ago, I
> recall the unzipped file being about 350-400 MB. I had used MySQL then, but it
> took some doing to get the data in, and some things did not work as I wanted
> them to: for example, I could not figure out how to label the variables. I
> usually do not have to work with a dataframe of more than 10-30 MB at a time.
> It would be good to have a facility in R which defines the meta-data, i.e. the
> labelling and structure of the dataset: positions of variables, their names,
> their labels, their levels (e.g. for ordered choice or group variables: yes,
> sometimes, no type responses). This could be saved as a separate object and
> passed to a function that gets the named variables from the ASCII file (names
> of variables to get can be given as arguments or as ...), attaches the
> meta-data and creates a dataframe with all the meta-data attached. The
> meta-data of the dataframe could include notes at dataframe and variable level,
> and other information. This information would be passed on to the plotting
> functions and used when formatting the output of statistical procedures.
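A minimal sketch of the metadata idea using only base R. Everything here is hypothetical and for illustration: the layout of the metadata data frame, the helper name read_fwf_meta(), and keeping each label in a "label" attribute. It is not an existing R facility. The helper turns start/width pairs into read.fwf widths (negative values for the gaps), so only the requested variables are read. The "label" attribute is just a place to keep the text; plotting and summary functions would still have to be told to use it.

```r
# Hypothetical metadata table: one row per variable in the file layout.
meta <- data.frame(
  name  = c("STATE", "INTVID", "PSU"),
  start = c(1, 24, 30),
  width = c(2, 3, 10),
  label = c("State code", "Interviewer ID", "Primary sampling unit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: read only the variables named in `vars`, computing
# read.fwf widths from the metadata. Extra arguments go on to read.fwf.
read_fwf_meta <- function(file, meta, vars = meta$name, ...) {
  m <- meta[meta$name %in% vars, , drop = FALSE]
  if (nrow(m) == 0) stop("none of 'vars' found in metadata")
  m <- m[order(m$start), , drop = FALSE]
  # Last column of the previous wanted field (0 before the first one).
  prev_end <- c(0, m$start[-nrow(m)] + m$width[-nrow(m)] - 1)
  gaps <- m$start - prev_end - 1
  if (any(gaps < 0)) stop("overlapping fields in metadata")
  # Interleave skips and fields: -gap1, width1, -gap2, width2, ...
  widths <- as.vector(rbind(-gaps, m$width))
  widths <- widths[widths != 0]  # drop zero-length gaps
  dat <- read.fwf(file, widths = widths, col.names = m$name, ...)
  # Attach each variable's label as an attribute of its column.
  for (i in seq_len(nrow(m))) attr(dat[[m$name[i]]], "label") <- m$label[i]
  dat
}
```

A call such as read_fwf_meta("survey.dat", meta, vars = c("STATE", "PSU"), colClasses = "character") would then read just those two fields; "survey.dat" is a placeholder name.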
> I agree with Michael Kubovy that this is a very helpful list, and people owe
> nothing more than what one paid for the software :)
> Anupam.
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.
Received on Wed Sep 13 22:15:46 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 13 Sep 2006 - 12:30:05 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.