[R] Formatted Data File Question for Clustering -Quickie Project

From: <ngottlieb_at_marinercapital.com>
Date: Wed, 13 Jun 2007 11:46:56 -0400


I am trying to learn how to format Ascii data files for scan or read into R.

Precisely for a quickie project, I found some code (at end of this email) to do exactly what I need:
To cluster and graph a dendrogram from package (stats).

I am stuck on how to format a text file to run the script. I looked at the dataset USArrests (which would be replaced by my data and labels) using UltraEdit. That data appears to be in binary format and I would simply like a readable ASCII text file.

How can I:
A) format this data to a file for the script below? B) I would like to use squared Euclidean distance, can hclust support this?

Thanks,
Neil Gottlieb

Here is sub-set example of my data set, return series to cluster: 13 cases by 36 observations):
Month Convertible Arbitrage Dedicated Short Bias Emerging Markets

1/31/1994	0.004	-0.016	0.105	
2/28/1994	0.002	0.020	-0.011	
3/31/1994	-0.010	0.072	-0.046	
4/30/1994	-0.025	0.013	-0.084	
5/31/1994	-0.010	0.023	-0.007	
6/30/1994	0.002	0.064	0.005	
7/31/1994	0.001	-0.012	0.058	
8/31/1994	0.000	-0.057	0.164	
9/30/1994	-0.012	0.016	0.052	
10/31/1994	-0.014	-0.004	-0.035	
11/30/1994	-0.002	0.030	-0.014	
12/31/1994	-0.019	-0.002	-0.042	
1/31/1995	-0.006	0.013	-0.100	
2/28/1995	0.012	-0.022	-0.079	
3/31/1995	0.013	0.004	-0.055	
4/30/1995	0.023	-0.004	0.073	
5/31/1995	0.017	-0.013	0.013	
6/30/1995	0.019	-0.069	0.008	
7/31/1995	0.009	-0.059	0.022	
8/31/1995	0.008	0.008	0.010	
9/30/1995	0.011	-0.029	0.019	
10/31/1995	0.013	0.064	-0.057	
11/30/1995	0.023	-0.010	-0.031	
12/31/1995	0.014	0.049	0.007	
1/31/1996	0.021	0.006	0.079	
2/29/1996	0.012	-0.056	-0.006	
3/31/1996	0.015	-0.009	-0.009	
4/30/1996	0.013	-0.066	0.051	
5/31/1996	0.016	0.000	0.045	
6/30/1996	0.015	0.051	0.054	
7/31/1996	0.014	0.098	-0.027	
8/31/1996	0.013	-0.034	0.036	
9/30/1996	0.011	-0.059	0.016	
10/31/1996	0.014	0.043	0.017	
11/30/1996	0.014	-0.029	0.026	

Code Example from Help files:

hc <- hclust(dist(USArrests), "ave")
(dend1 <- as.dendrogram(hc)) # "print()" method
str(dend1)          # "str()" method

str(dend1, max = 2) # only the first two sub-levels

op <- par(mfrow= c(2,2), mar = c(5,2,1,4)) plot(dend1)
## "triangle" type and show inner nodes:
plot(dend1, nodePar=list(pch = c(1,NA), cex=0.8, lab.cex = 0.8),

      type = "t", center=TRUE)
plot(dend1, edgePar=list(col = 1:2, lty = 2:3), dLeaf=1, edge.root = TRUE)
plot(dend1, nodePar=list(pch = 2:1,cex=.4*2:1, col = 2:3), horiz=TRUE)

dend2 <- cut(dend1, h=70)
plot(dend2$upper)
## leafs are wrong horizontally:

plot(dend2$upper, nodePar=list(pch = c(1,7), col = 2:1))
## dend2$lower is *NOT* a dendrogram, but a list of .. :
plot(dend2$lower[[3]], nodePar=list(col=4), horiz = TRUE, type = "tr")
## "inner" and "leaf" edges in different type & color :
plot(dend2$lower[[2]], nodePar=list(col=1),# non empty list

     edgePar = list(lty=1:2, col=2:1), edge.root=TRUE) par(op)
str(d3 <- dend2$lower[[2]][[2]][[1]])

nP <- list(col=3:2, cex=c(2.0, 0.75), pch= 21:22, bg= c("light blue", "pink"),

           lab.cex = 0.75, lab.col = "tomato") plot(d3, nodePar= nP, edgePar = list(col="gray", lwd=2), horiz = TRUE) addE <- function(n) {

      if(!is.leaf(n)) {
        attr(n, "edgePar") <- list(p.col="plum")
        attr(n, "edgetext") <- paste(attr(n,"members"),"members")
      }
      n

}
d3e <- dendrapply(d3, addE)
plot(d3e, nodePar= nP)
plot(d3e, nodePar= nP, leaflab = "textlike")

This information is being sent at the recipient's request or...{{dropped}}



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 13 Jun 2007 - 16:43:17 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 14 Jun 2007 - 08:31:57 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.