Re: [R] Seeking a more efficient way to read in a file

From: Charilaos Skiadas
Date: Wed, 2 Jan 2008 20:42:21 -0500

On Jan 2, 2008, at 6:05 PM, Talbot Katz wrote:

> Hi.
> I have a matrix stored in a large, tab-delimited flat file. The
> first row contains column names. Because the matrix is symmetric,
> the file has lower triangular format, so the second row contains
> one number, the third row two numbers, etc. In general, row k+1
> contains k numbers; the matrix has 3000 rows, so the file has 3001
> rows. The file has variable length records, so each row ends with
> its last piece of data. I read in the file and produced the full
> symmetric matrix as follows:
> > mana01 <- scan( file = "C:/mat.dat", sep = "\t", nlines = 1, what = "character" )
> Read 3000 items
> > nco <- length( mana01 )
> > malt <- matrix( 0, nrow = nco, ncol = nco )
> > colnames( malt ) <- mana01
> > rownames( malt ) <- mana01
> > for ( i in 1:3000 ) { malt[ i, (1:i) ] <- scan( file = "C:/mat.dat", skip = i, n = i, quiet = TRUE ) }
> > mat <- malt + t( malt ) - diag( diag( malt ) )
> The for loop took a couple of hours to complete. I suspect there's
> a much faster way to do this. Any suggestions? Thanks!

I saw Jim's reply just after writing a solution of my own, so here is my take on it. The key thing, as Jim mentioned, is not to call scan once per row, since each call reopens the file and skips past all the earlier rows, but to read the whole thing in once and then process it. I read the lines, used strsplit to split each line into its fields, and then used sapply to convert each row to numeric and pad it with the right number of zeros.

Not sure which of the two is faster.

nms <- scan("~/Desktop/testing.txt", sep="\t", nlines=1, what=character(0))
x <- scan("~/Desktop/testing.txt", sep="\n", skip=1, what=character(0)) # read as a vector of lines
splt <- strsplit(x, "\t") # split at the tabs
nr <- length(nms)
splt <- sapply(splt, function(x) c(as.numeric(x), rep(0, nr-length(x)))) # extend each row by the right number of zeros
# sapply returns one column per file row, so splt is the upper triangular (transposed) matrix
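The snippet above stops once the rows are padded. Here is a rough sketch of the remaining step, run end to end on a made-up 4 x 4 test file (the temporary file, its contents, and the names rows and upper are only for illustration). It reuses the symmetrization from Talbot's original post and assumes every data row holds only numbers.

```r
# Write a small lower-triangular, tab-delimited test file (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc\td",
             "1",
             "2\t3",
             "4\t5\t6",
             "7\t8\t9\t10"), path)

# Read the header, then the remaining lines, each as one string.
nms <- scan(path, sep = "\t", nlines = 1, what = character(0), quiet = TRUE)
x <- scan(path, sep = "\n", skip = 1, what = character(0), quiet = TRUE)

nr <- length(nms)
rows <- strsplit(x, "\t")  # one character vector per file row

# Pad each row with zeros to length nr. sapply binds the results as
# columns, so file row k becomes column k and 'upper' is upper triangular.
upper <- sapply(rows, function(r) c(as.numeric(r), rep(0, nr - length(r))))

# Mirror across the diagonal, subtracting the diagonal that is counted twice.
mat <- upper + t(upper) - diag(diag(upper))
dimnames(mat) <- list(nms, nms)
print(mat)
```

Because the result is symmetric, it does not matter that sapply hands back the transpose of the lower triangle stored in the file.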

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

Received on Thu 03 Jan 2008 - 01:44:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jan 2008 - 03:30:05 GMT.
