From: Zhuanshi He <zhuanshi.he_at_gmail.com>

Date: Wed 28 Jun 2006 - 03:03:30 EST

Maybe this link is useful:

http://www.bic.mni.mcgill.ca/users/jason/cortex/stats-manuals/mni.read.glim.file.html

Also, see section 2.3 of http://cran.r-project.org/doc/manuals/R-data.html, quoted here:

2.3 Using scan directly

Both read.table and read.fwf use scan to read the file, and then process the results of scan. They are very convenient, but sometimes it is better to use scan directly.

Function scan has many arguments, most of which we have already covered under read.table. The most crucial argument is what, which specifies a list of modes of variables to be read from the file. If the list is named, the names are used for the components of the returned list. Modes can be numeric, character or complex, and are usually specified by an example, e.g. 0, "" or 0i. For example

cat("2 3 5 7", "11 13 17 19", file="ex.dat", sep="\n")
scan(file="ex.dat", what=list(x=0, y="", z=0), flush=TRUE)

returns a list with three components and discards the fourth column in the file.
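(An aside, not from the manual.) To see what that call actually returns, here is a small sketch. It writes the same two lines to a temporary file rather than ex.dat; the names f and res are just mine for illustration.

```r
# Sketch only: write the manual's two-line example to a temporary file
# so that nothing in the working directory is touched.
f <- tempfile(fileext = ".dat")
cat("2 3 5 7", "11 13 17 19", file = f, sep = "\n")

# 'what' is a named list of examples: 0 means numeric, "" means character.
# flush = TRUE discards whatever follows the last requested field on a line,
# which is why the fourth column (7 and 19) never shows up.
res <- scan(file = f, what = list(x = 0, y = "", z = 0),
            flush = TRUE, quiet = TRUE)

str(res)   # a list with components x (numeric), y (character), z (numeric)
unlink(f)
```

Each component holds one column of the file, so res$x is the first column, res$y the second (kept as character strings) and res$z the third.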

There is a function readLines which will be more convenient if all you want is to read whole lines into R for further processing.
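(Again an aside, not from the manual.) If you do want to stay with readLines, the pieces from strsplit have to be flattened and converted to numbers before they go into matrix(). A minimal sketch, assuming a small comma-separated file; the file contents and the names f, lines, fields and m are made up for illustration:

```r
# Sketch only: a made-up 3 x 3 comma-separated file in a temporary location.
f <- tempfile(fileext = ".txt")
writeLines(c("0,.11,.22", ".11,0,.5", ".22,.5,0"), f)

lines  <- readLines(f)                        # one character string per line
fields <- strsplit(lines, ",", fixed = TRUE)  # a list: one character vector per line

# strsplit() returns a list, so matrix(c(fields), ...) would give a matrix of
# list elements. unlist() flattens it row by row and as.numeric() converts
# the strings, hence byrow = TRUE when filling the matrix.
m <- matrix(as.numeric(unlist(fields)), nrow = length(lines), byrow = TRUE)

m
unlink(f)
```

as.numeric() ignores leading blanks, so entries such as " 2" from a file written as "1, 2, 3" should convert without extra trimming.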

One common use of scan is to read in a large matrix. Suppose file matrix.dat just contains the numbers for a 200 x 2000 matrix. Then we can use

A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)

On one test this took 1 second (under Linux, 3 seconds under Windows on the same machine) whereas

A <- as.matrix(read.table("matrix.dat"))

took 10 seconds (and more memory), and

A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200,
                          comment.char = "", colClasses = "numeric"))

took 7 seconds. The difference is almost entirely due to the overhead of reading 2000 separate short columns: were they of length 2000, scan took 9 seconds whereas read.table took 18 if used efficiently (in particular, specifying colClasses) and 125 if used naively.
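(Not from the manual.) If you want to repeat this comparison on your own machine, system.time() is probably the simplest way. The sketch below uses a much smaller matrix (50 x 200, my choice, so that it runs quickly), and the names nr, nc, M and the t_* variables are only for illustration. The timings you get will certainly differ from the figures quoted above.

```r
# Sketch only: build a small random matrix and write it row by row.
set.seed(1)
nr <- 50; nc <- 200                 # deliberately small; the manual used 200 x 2000
M  <- matrix(round(runif(nr * nc), 4), nr, nc)
f  <- tempfile(fileext = ".dat")
write(t(M), f, ncolumns = nc)       # write() goes column-wise, hence t(M)

# 1. scan() directly, filling the matrix by row
t_scan <- system.time(
  A1 <- matrix(scan(f, n = nr * nc, quiet = TRUE), nr, nc, byrow = TRUE)
)

# 2. read.table() with defaults
t_naive <- system.time(
  A2 <- as.matrix(read.table(f))
)

# 3. read.table() with the hints the manual suggests
t_tuned <- system.time(
  A3 <- as.matrix(read.table(f, header = FALSE, nrows = nr,
                             comment.char = "", colClasses = "numeric"))
)

# elapsed seconds for each approach
print(c(scan  = t_scan[["elapsed"]],
        naive = t_naive[["elapsed"]],
        tuned = t_tuned[["elapsed"]]))
unlink(f)
```

All three approaches should give the same numbers; only the time and memory used differ.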

Note that timings can depend on the type read and the data. Consider reading a million distinct integers:

writeLines(as.character((1+1e6):2e6), "ints.dat")
xi <- scan("ints.dat", what=integer(0), n=1e6)   # 0.77s
xn <- scan("ints.dat", what=numeric(0), n=1e6)   # 0.93s
xc <- scan("ints.dat", what=character(0), n=1e6) # 0.85s
xf <- as.factor(xc)                              # 2.2s
DF <- read.table("ints.dat")                     # 4.5s

and a million examples of a small set of codes:

code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
writeLines(sample(code, 1e6, replace=TRUE), "code.dat")
y <- scan("code.dat", what=character(0), n=1e6)  # 0.44s
yf <- as.factor(y)                               # 0.21s
DF <- read.table("code.dat")                     # 4.9s
DF <- read.table("code.dat", nrows=1e6)          # 3.6s

Note that these timings depend heavily on the operating system (the basic reads in Windows take at least twice as long as these Linux times) and on the precise state of the garbage collector.
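For the comma-separated file in your question, something along these lines might work. It is only a sketch: I have made up a fourth row so that the example matrix is square, the file goes to a temporary location instead of mTest.txt, and the names f, nc, vals and net are mine.

```r
# Sketch only: the sample rows from the question plus one invented row,
# written to a temporary file.
f <- tempfile(fileext = ".txt")
writeLines(c("0,.11,.22,.4",
             ".11,0,.5,.3",
             ".22,.5,0,.7",
             ".4,.3,.7,0"), f)

# Number of fields on the first line = number of columns
nc <- count.fields(f, sep = ",")[1]

# scan() reads all the numbers as one long numeric vector, line by line,
# so the matrix has to be filled by row.
vals <- scan(f, sep = ",", quiet = TRUE)
net  <- matrix(vals, ncol = nc, byrow = TRUE)

net
unlink(f)
```

as.matrix(read.table(file, sep = ",")) should give the same numbers (with column names V1, V2, ... added), so for a small file either approach is fine.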

Hope this works.

Z. He

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
On 6/28/06, Cuau <cuauv@yahoo.com> wrote:

> Hello everyone,

>
> I'm writing a little script that will read a matrix from a file
>
> i.e.
>
> 0,.11,.22,.4
> .11,0,.5,.3
> .22,.5,0,.7
> and so on
>
> and will then calculate some standard stats for nets (i.e. centralization, degree, etc).
>
> So far I have opened the file and read the contents; however, I'm using readLines(filename)
> to read the file and it returns it as one big string with no divisions. I tried using
> strsplit(String)
> to split it, but even though it is working I'm not able to put the output of the above into a matrix.
>
> Below is an example of what I have done
>
>
> > INfile<-file("mTest.txt", "r")
> > readLines(INfile)->matrix
> > matrix
> [1] "1, 2, 3"
> > strsplit(matrix, ",")->splitLine
> > splitLine
> [[1]]
> [1] "1" " 2" " 3"
>
> > netMatrix <-matrix(c(splitLine), nrow=1,ncol=3)
> > netMatrix
> [,1] [,2] [,3]
> [1,] Character,3 Character,3 Character,3
>
>
> Does anyone have an idea how I can read a matrix and store it in the form of a matrix?
>
> thks
>
> -Cuau Vital
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Zhuanshi He / Z. He (PhD)
ADvanced Environmental Monitoring Research Center (ADEMRC)
Gwangju Institute of Science and Technology
1 Oryong-dong, Buk-gu, Gwangju 500-712, Republic of Korea.
Tel. +82-62-970-3406
Fax. +82-62-970-3404
Email: Zhuanshi.He@gmail.com, Zhuanshi_He@msn.com, Zhuanshi_He@yahoo.com.cn
Web: http://atm1.gist.ac.kr/~hzs/
BBS: http://atm1.gist.ac.kr/~hzs/phpBB2/

Received on Wed Jun 28 03:06:38 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 28 Jun 2006 - 04:13:10 EST.
