From: Charilaos Skiadas <cskiadas_at_gmail.com>

Date: Wed, 2 Jan 2008 20:42:21 -0500

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 03 Jan 2008 - 01:44:38 GMT

On Jan 2, 2008, at 6:05 PM, Talbot Katz wrote:

> Hi.
>
> I have a matrix stored in a large, tab-delimited flat file. The
> first row contains column names. Because the matrix is symmetric,
> the file has lower triangular format, so the second row contains
> one number, the third row two numbers, etc. In general, row k+1
> contains k numbers; the matrix has 3000 rows, so the file has 3001
> rows. The file has variable length records, so each row ends with
> its last piece of data. I read in the file and produced the full
> symmetric matrix as follows:
>
> > mana01 <- scan( file = "C:/mat.dat", sep = "\t", nlines = 1, what = "character" )
> Read 3000 items
> > nco <- length( mana01 )
> > malt <- matrix(0, nrow = nco, ncol = nco )
> > colnames( malt ) <- mana01
> > rownames( malt ) <- mana01
> > for ( i in 1:3000 ) { malt[ i, (1:i) ] <- scan( file="C:/mat.dat", skip = i, n = i, quiet = TRUE ) }
> > mat <- malt + t( malt ) - diag( diag( malt ) )
>
> The for loop took a couple of hours to complete. I suspect there's
> a much faster way to do this. Any suggestions? Thanks!

I saw Jim's reply just after writing a solution of my own, so here is my take on it. The key point, as Jim mentioned, is not to call scan once per row (each call with skip = i has to read through the file from the top again), but to read the whole file in once and then process it. I read the lines, used strsplit to break each line into its fields, and then used sapply to pad each row with the right number of zeros.

Not sure which of the two is faster.

nms <- scan("~/Desktop/testing.txt", sep="\t", nlines=1, what=character(0))

x <- scan("~/Desktop/testing.txt", sep="\n", skip=1, what=character(0)) # read as a vector of lines

splt <- strsplit(x,"\t") # split at the tabs
nr <- length(nms)

splt <- sapply(splt, function(x) c(as.numeric(x), rep(0,nr-length(x)))) # extend each row by the right number of zeros
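The code above stops at the padded rows. What follows is only a sketch of how one might finish the job, not something from the original thread beyond the "add the transpose, subtract the diagonal" step that Talbot already used. The 4x4 data, the temporary file, and the names tmp and upper are made up for illustration. Note that sapply binds the padded rows together as columns, so the result holds the triangle in its upper half rather than the lower one; for a symmetric matrix that makes no difference once the transpose is added.

```r
# Hypothetical 4x4 example, written to a temporary file in the
# lower-triangular, tab-delimited layout described in the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc\td",
             "1",
             "2\t3",
             "4\t5\t6",
             "7\t8\t9\t10"), tmp)

nms <- scan(tmp, sep = "\t", nlines = 1, what = character(0), quiet = TRUE)
x   <- scan(tmp, sep = "\n", skip = 1, what = character(0), quiet = TRUE)
nr  <- length(nms)

# Pad each row with zeros. sapply() returns the padded rows as COLUMNS,
# so `upper` is an upper-triangular matrix.
upper <- sapply(strsplit(x, "\t"),
                function(row) c(as.numeric(row), rep(0, nr - length(row))))

# Mirror across the diagonal; the diagonal is counted twice, so subtract it once.
mat <- upper + t(upper) - diag(diag(upper))
dimnames(mat) <- list(nms, nms)
mat
```

On the real file one would replace tmp with the actual path; everything else should carry over unchanged as long as every row after the header has exactly one more entry than the row before it.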

Haris Skiadas

Department of Mathematics and Computer Science
Hanover College

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 03 Jan 2008 - 03:30:05 GMT.