From: Huntsinger, Reid <reid_huntsinger_at_merck.com>

Date: Wed 06 Apr 2005 - 05:39:23 EST


To get the neighborhoods of radius r of each point in your data set, given
distances calculated already in the matrix d, you could do (but note below)

$ A <- (d <= r)

then rows (or columns) of A are indicator vectors for the neighborhoods. "Unique" will work on these vectors, as "unique.array", to give the unique rows, which would be the unique neighborhood lists:

$ unique(A)
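
As a rough illustration of the two steps above, here is a minimal sketch on a made-up data set. The points `x`, the radius `r`, and the helper name `clusters` are assumptions for the example and do not come from the original post. The last step turns each unique indicator row back into a vector of member indices.

```r
# Hypothetical data: five points on a line, radius chosen for illustration
x <- c(0, 1, 2, 10, 11)
r <- 1.5

d <- as.matrix(dist(x))   # full n x n distance matrix
A <- (d <= r)             # A[i, j] is TRUE when point j is within r of point i

U <- unique(A)            # drop duplicated rows (identical neighborhoods)

# Turn each remaining indicator row into a vector of member indices
clusters <- lapply(seq_len(nrow(U)), function(i) unname(which(U[i, ])))
```

Because the diagonal of `d` is zero, every point is counted as a member of its own neighborhood, which matches the way the function in the quoted message seeds each cluster with its own index.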

Your question about why "unique" applied to a distance matrix ignores zeros points to a possible problem: the object you get from dist() is not a matrix. The "upper" and "diag" options only control printing. If you check length() you'll see you only have n(n-1)/2 elements, the lower triangle of the distance matrix. (To answer the question: unique() sees only these; there's not a method for objects of class dist.) So you need to do

$ d <- as.matrix(distmat)

to get a matrix.
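
A small sketch of the difference, assuming five random points as in the quoted example (the seed is arbitrary and only there to make the run repeatable):

```r
set.seed(1)                      # arbitrary seed, an assumption for repeatability
x <- rnorm(5)
distmat <- dist(x, diag = TRUE, upper = TRUE)

class(distmat)                   # "dist", not "matrix"
length(distmat)                  # 10, i.e. 5 * 4 / 2 stored distances
dim(distmat)                     # NULL: no row/column structure

d <- as.matrix(distmat)          # full symmetric matrix with a zero diagonal
dim(d)                           # 5 5
```

Since the zero diagonal is never stored in the dist object, unique() has no zeros to report until the object is converted.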

Reid Huntsinger

-----Original Message-----

From: r-help-bounces@stat.math.ethz.ch [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Paul Johnson

Sent: Tuesday, April 05, 2005 1:36 PM

To: r-help@stat.math.ethz.ch

Subject: [R] lists: removing elements, iterating over elements,

http://www.cc.gatech.edu/~tucker/index2.html

While I work on this, I realize (again) that I'm a C programmer masquerading in R, and it's really tricky working with R lists. Here are things that surprise me; I wonder what your experience/advice is.

I need to calculate overlapping U-diametric clusters of a given radius.

(Again, I apologize this looks so much like C.)

## Returns a list of all U-diametric clusters of a given radius
## Give an R distance matrix
## Clusters may overlap. Clusters may be identical (redundant)
getUDClusters <- function(distmat, radius){
  mem <- list()
  nItems <- dim(distmat)[1]
  for ( i in 1:nItems ){
    mem[[i]] <- c(i)
  }
  for ( m in 1:nItems ){
    for ( n in 1:nItems ){
      if (m != n & (distmat[m,n] <= radius)){
        ##item is within radius, so add to collection m
        mem[[m]] <- sort(c( mem[[m]],n))
      }
    }
  }
  return(mem)
}

That generates the list, like this:

[[1]]
[1] 1 3 4 5 6 7 8 9 10

[[2]]
[1] 2 3 4 10

[[3]]
[1] 1 2 3 4 5 6 7 8 10

[[4]]
[1] 1 2 3 4 10

[[5]]
[1] 1 3 5 6 7 8 9 10

[[6]]
[1] 1 3 5 6 7 8 9 10

[[7]]
[1] 1 3 5 6 7 8 9 10

[[8]]
[1] 1 3 5 6 7 8 9 10

[[9]]
[1] 1 5 6 7 8 9 10

[[10]]
[1] 1 2 3 4 5 6 7 8 9 10

The next task is to eliminate the redundant elements. unique() does not apply to lists, so I have to scan one by one.

cluslist <- getUDClusters(distmat,radius)

##find redundant (same) clusters
redundantCluster <- c()
for (m in 1:(length(cluslist)-1)) {
  for ( n in (m+1):length(cluslist) ){
    if ( m != n && length(cluslist[[m]]) == length(cluslist[[n]]) ){
      ## same length and every element equal: cluster n repeats cluster m
      if ( all(cluslist[[m]] == cluslist[[n]]) ){
        redundantCluster <- c( redundantCluster,n)
      }
    }
  }
}

##make sure they are sorted in reverse order
if (length(redundantCluster)>0)
{
  redundantCluster <- unique(sort(redundantCluster, decreasing=TRUE))
  ## remove redundant clusters (must do in reverse order to preserve index of cluslist)
  for (i in redundantCluster) cluslist[[i]] <- NULL
}

Question: am I deleting the list elements properly?

I do not find explicit documentation for R on how to remove elements from lists, but trial and error tells me

myList[[5]] <- NULL

will remove the 5th element and then "close up" the hole caused by deletion of that element. That shuffles the index values, so I have to be careful in dropping elements. I must work from the back of the list to the front.

Is there an easier or faster way to remove the redundant clusters?
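
One possible shortcut, sketched here on a made-up list (the contents of `cluslist` below are hypothetical): in current versions of R, duplicated() and unique() accept lists, so all repeated clusters can be dropped in a single subsetting step, with no index shifting to worry about. This is offered as an assumption-laden sketch and not as the approach used in the original thread.

```r
# Hypothetical cluster list with two repeated entries
cluslist <- list(c(1, 3, 5), c(2, 4), c(1, 3, 5), c(2, 4), c(1, 2, 3, 4, 5))

dup <- duplicated(cluslist)   # TRUE for each later copy of an earlier element
deduped <- cluslist[!dup]     # keep first occurrences only, in one step

# Dropping by position also works in one step with negative indices.
# Caution: if which(dup) is empty, x[-integer(0)] returns an empty list,
# so the logical form above is the safer of the two.
same <- cluslist[-which(dup)]
```

Subsetting with single brackets returns a new list, so there is no need to delete from the back to the front.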

Now, the next question. After eliminating the redundant sets from the list, I need to calculate the total number of items present in the whole list, figure how many are in each subset--each list item--and do some calculations.

I expected this would iterate over the members of the list--one step for each subcollection

for (i in cluslist){

}

but it does not. It iterates over the items within the subsets of the list "cluslist." I mean, if cluslist has 5 sets, each with 10 elements, this for loop takes 50 steps, one for each individual item.

I find this does what I want

for (i in 1:length(cluslist))

But I found out the hard way :)
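
A short sketch of the index-based loop on a hypothetical three-cluster list. seq_along() is used in place of 1:length() because the latter yields c(1, 0) when the list is empty. For what it is worth, in current versions of R a for loop over a list visits each top-level element once; the per-item behavior described above is what one would expect if the list had been flattened first, for example with unlist(). The last two lines count the steps so this can be checked directly.

```r
# Hypothetical list of three clusters
cluslist <- list(c(1, 3, 5), c(2, 4), c(1, 2, 3, 4, 5))

# Index-based loop: one step per cluster, with access to the position i
sizes <- integer(length(cluslist))
for (i in seq_along(cluslist)) {
  sizes[i] <- length(cluslist[[i]])
}

# The same sizes without an explicit loop
sizes2 <- sapply(cluslist, length)

total <- sum(sizes)                             # items counted with repeats
nDistinct <- length(unique(unlist(cluslist)))   # distinct items across clusters

# Looping over the list itself: count the steps taken
steps <- 0
for (cl in cluslist) steps <- steps + 1
```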

Oh, one more quirk that fooled me. Why does unique() applied to a distance matrix throw away the 0's???? I think that's really bad!

> x <- rnorm(5)
> myDist <- dist(x,diag=T,upper=T)
> myDist
          1         2         3         4         5
1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
3 1.6658710 0.3728735 0.0000000 0.9989292 1.1163793
4 2.6648003 1.3718027 0.9989292 0.0000000 2.1153085
5 0.5494918 0.7435058 1.1163793 2.1153085 0.0000000
> unique(myDist)
 [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
 [8] 0.9989292 1.1163793 2.1153085

--
Paul E. Johnson                       email: pauljohn@ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Wed Apr 06 05:45:56 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:02 EST