Re: [R] Loading matrices and other things

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Wed 01 Jun 2005 - 11:59:26 EST

On 5/31/05, Mike Schuler <schulerm@bc.edu> wrote:
> Hi all,
>
> I'm new to R, so needless to say I have a couple questions (which I hope
> I haven't missed through the documentation).
> I have several files in lower triangular matrix form. I want to perform
> some form of hierarchical clustering on each of these matrices and
> capture the output of the clustering.
>
> The first problem I run into is actually loading the matrix file into R.
> I've attempted using the read.table function but to no avail. What is
> the best way to read in a matrix?
> Note: matrices are in a form like so, a space between each value,
> then a newline. There is also a diagonal of 0's stripped out. (Matrices
> are the output of RNAdistance, if that's helpful.)
> Let's say it's stored in a file called 'rtest':
> 21
> 34 55
> 55 34 21
> 27 10 61 44
> 59 42 25 8 40
> 61 44 27 10 34 6
> 73 64 57 48 66 44 50
> 78 69 62 53 71 49 55 5
> 77 68 103 94 70 94 96 88 89
> 77 68 103 94 70 90 96 84 85 10
> 31 24 53 46 30 50 52 72 73 74 74
>
> Second, I've searched through the web and it seems hclust
> <http://www.maths.lth.se/help/R/.R/library/mva/html/hclust.html> is the
> appropriate function. From what I can tell from here
> <http://stat.ethz.ch/R-manual/R-devel/library/stats/html/dist.html> the
> above matrix should be a valid format (even without the 0s), but
> confirmation would be nice. And with hclust, does this produce a tree
> with the output, or would that be the plclust function? I haven't been
> able to experiment with this because I haven't managed to get past
> the previous step.

Here is something to try:

# get number of entries and read in
n <- max(count.fields("myfile.dat")) + 1
x <- scan("myfile.dat")

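# create matrix from x
# (scan() returns the lower triangle row by row, which is the same order
# in which R fills the upper triangle column by column)
x.mat <- matrix(0,n,n)
x.mat[upper.tri(x.mat)] <- x
x.mat <- x.mat + t(x.mat)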

# convert to distance matrix
x.dist <- as.dist(x.mat)

# run hclust
x.hclust <- hclust(x.dist)

# plot
plot(x.hclust, cex = 0.6)
rect.hclust(x.hclust,k=5,border="red")
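
As a self-contained check of the steps above, here is a sketch that writes the example matrix from the question to a temporary file (the temporary file and the names txt, f and groups are my own assumptions, not part of the original post) and confirms that the entries land where expected. It also shows one way to capture the clustering as data rather than as a plot: hclust returns the tree itself, and cutree turns it into cluster labels.

```r
# Assumption: the example data from the question, written to a temporary
# file so that the snippet runs on its own.
txt <- c("21",
         "34 55",
         "55 34 21",
         "27 10 61 44",
         "59 42 25 8 40",
         "61 44 27 10 34 6",
         "73 64 57 48 66 44 50",
         "78 69 62 53 71 49 55 5",
         "77 68 103 94 70 94 96 88 89",
         "77 68 103 94 70 90 96 84 85 10",
         "31 24 53 46 30 50 52 72 73 74 74")
f <- tempfile(fileext = ".dat")
writeLines(txt, f)

n <- max(count.fields(f)) + 1            # longest line has n - 1 entries
x <- scan(f, quiet = TRUE)
stopifnot(length(x) == n * (n - 1) / 2)  # a complete lower triangle was read

x.mat <- matrix(0, n, n)
x.mat[upper.tri(x.mat)] <- x
x.mat <- x.mat + t(x.mat)

# Spot checks: the second line of the file, "34 55", holds d(3,1) and d(3,2)
stopifnot(x.mat[3, 1] == 34, x.mat[3, 2] == 55, isSymmetric(x.mat))

x.hclust <- hclust(as.dist(x.mat))       # default method is "complete"
groups <- cutree(x.hclust, k = 5)        # one cluster label per item
print(groups)
```

The object returned by hclust is the tree (its merge and height components describe it), and plot() draws it as a dendrogram. As far as I know, plclust did much the same drawing job and is no longer available in current versions of R, so plot() is the one to use.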

> And last, I want to be able to run R on many different files of the same
> matrix type. Is it possible to write a (Python) script run through the
> appropriate tasks and save the visual output as a postscript file?

You don't need another language. It can all be done from R. Suppose we want to read in each .dat file in the current directory, plot it and save the plot:

for (f in dir(pattern = "[.]dat$")) {
  x <- read.table(f)
  plot(x)
  savePlot(f, "ps")
}

savePlot, used above, is specific to Windows. See ?dev.print if you are not on Windows.
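
For a version that does not depend on Windows, one possibility is to open a postscript device directly instead of saving from the screen. The following is only a sketch: read_lower_tri and cluster_to_ps are hypothetical helper names of my own, and it assumes that every .dat file in the directory is a lower triangular matrix in the format described in the question.

```r
# Hypothetical helper: read one lower-triangular file into a "dist" object,
# using the same steps as earlier in this message.
read_lower_tri <- function(file) {
  n <- max(count.fields(file)) + 1
  x <- scan(file, quiet = TRUE)
  m <- matrix(0, n, n)
  m[upper.tri(m)] <- x
  as.dist(m + t(m))
}

# Hypothetical helper: cluster every .dat file in `dir` and write one
# PostScript dendrogram per input file (foo.dat -> foo.ps).
cluster_to_ps <- function(dir = ".") {
  files <- list.files(dir, pattern = "[.]dat$", full.names = TRUE)
  out <- sub("[.]dat$", ".ps", files)
  for (i in seq_along(files)) {
    postscript(out[i])
    # close the device even if reading or plotting fails
    tryCatch(
      plot(hclust(read_lower_tri(files[i])), cex = 0.6,
           main = basename(files[i])),
      finally = dev.off()
    )
  }
  invisible(out)
}
```

Calling cluster_to_ps() with no arguments would process the current directory. If the two functions and that call are saved in a script file (say cluster_all.R, a made-up name), it should be possible to run the whole job from a shell with Rscript cluster_all.R, so no Python wrapper is required.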



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jun 01 12:04:07 2005
