From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Wed 06 Apr 2005 - 06:17:35 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Wed Apr 06 06:22:14 2005

On Apr 5, 2005 1:36 PM, Paul Johnson <pauljohn@ku.edu> wrote:

> I'm writing R code to calculate Hierarchical Social Entropy, a diversity
> index that Tucker Balch proposed. One article on this was published in
> Autonomous Robots in 2000. You can find that and others through his web
> page at Georgia Tech.
>
> http://www.cc.gatech.edu/~tucker/index2.html
>
> While I work on this, I realize (again) that I'm a C programmer
> masquerading in R, and it's really tricky working with R lists. Here are
> things that surprise me; I wonder what your experience/advice is.
>
> I need to calculate overlapping U-diametric clusters of a given radius.
> (Again, I apologize this looks so much like C.)
>
> ## Returns a list of all U-diametric clusters of a given radius
> ## Give an R distance matrix
> ## Clusters may overlap. Clusters may be identical (redundant)
> getUDClusters <- function(distmat, radius) {
>   mem <- list()
>
>   nItems <- dim(distmat)[1]
>   for (i in 1:nItems) {
>     mem[[i]] <- c(i)
>   }
>
>   for (m in 1:nItems) {
>     for (n in 1:nItems) {
>       if (m != n && (distmat[m, n] <= radius)) {
>         ## item is within radius, so add to collection m
>         mem[[m]] <- sort(c(mem[[m]], n))
>       }
>     }
>   }
>
>   return(mem)
> }
>
> That generates the list, like this:
>
> [[1]]
> [1] 1 3 4 5 6 7 8 9 10
>
> [[2]]
> [1] 2 3 4 10
>
> [[3]]
> [1] 1 2 3 4 5 6 7 8 10
>
> [[4]]
> [1] 1 2 3 4 10
>
> [[5]]
> [1] 1 3 5 6 7 8 9 10
>
> [[6]]
> [1] 1 3 5 6 7 8 9 10
>
> [[7]]
> [1] 1 3 5 6 7 8 9 10
>
> [[8]]
> [1] 1 3 5 6 7 8 9 10
>
> [[9]]
> [1] 1 5 6 7 8 9 10
>
> [[10]]
> [1] 1 2 3 4 5 6 7 8 9 10
>
> The next task is to eliminate the redundant elements. unique() does not
> apply to lists, so I have to scan one by one.
>
> cluslist <- getUDClusters(distmat, radius)
>
> ## find redundant (same) clusters
> redundantCluster <- c()
> for (m in 1:(length(cluslist) - 1)) {
>   for (n in (m + 1):length(cluslist)) {
>     if (m != n && length(cluslist[[m]]) == length(cluslist[[n]])) {
>       ## same length, so compare element by element
>       if (all(cluslist[[m]] == cluslist[[n]])) {
>         redundantCluster <- c(redundantCluster, n)
>       }
>     }
>   }
> }
>
> ## make sure they are sorted in reverse order
> if (length(redundantCluster) > 0) {
>   redundantCluster <- unique(sort(redundantCluster, decreasing = TRUE))
>
>   ## remove redundant clusters (must do in reverse order to preserve
>   ## index of cluslist)
>   for (i in redundantCluster) cluslist[[i]] <- NULL
> }
>
> Question: am I deleting the list elements properly?
>
> I do not find explicit documentation for R on how to remove elements
> from lists, but trial and error tells me
>
> myList[[5]] <- NULL
>
> will remove the 5th element and then "close up" the hole caused by
> deletion of that element. That shuffles the index values, so I have to
> be careful in dropping elements. I must work from the back of the list
> to the front.
>
> Is there an easier or faster way to remove the redundant clusters?
>
> Now, the next question. After eliminating the redundant sets from the
> list, I need to calculate the total number of items present in the whole
> list, figure how many are in each subset--each list item--and do some
> calculations.
>
> I expected this would iterate over the members of the list--one step for
> each subcollection
>
> for (i in cluslist){
>
> }
>
> but it does not. It iterates over the items within the subsets of the
> list "cluslist." I mean, if cluslist has 5 sets, each with 10 elements,
> this for loop takes 50 steps, one for each individual item.
>
> I find this does what I want
>
> for (i in 1:length(cluslist))
>
> But I found out the hard way :)
>
> Oh, one more quirk that fooled me. Why does unique() applied to a
> distance matrix throw away the 0's???? I think that's really bad!
>
> > x <- rnorm(5)
> > myDist <- dist(x,diag=T,upper=T)
> > myDist
>           1         2         3         4         5
> 1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
> 2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
> 3 1.6658710 0.3728735 0.0000000 0.9989292 1.1163793
> 4 2.6648003 1.3718027 0.9989292 0.0000000 2.1153085
> 5 0.5494918 0.7435058 1.1163793 2.1153085 0.0000000
> > unique(myDist)
> [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
> [8] 0.9989292 1.1163793 2.1153085
> >
>
> --

If L is our list of vectors then the following gets the unique elements of L.

I have assumed that the individual vectors are sorted (if not, sort them first via lapply(L, sort)) and that each element has a unique name (if not, give it one, e.g. names(L) <- seq(L)).

The first line binds the vectors together as the rows of a matrix. Vectors of different lengths are recycled to a common length, which gives you a warning, but that's OK since you only need to know whether they are the same or not. Now, unique applied to a matrix finds the unique rows, and in the second line we use the row.names from that to pick the corresponding original vectors out of L.

mat <- unique(do.call("rbind", L))

L[row.names(mat)]
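
As a rough, self-contained sketch of the two lines above: the list below is made-up example data, not the poster's clusters, and the expected recycling warning from rbind is suppressed. The last line shows an alternative that, as far as I know, works in current versions of R, where duplicated() accepts a list directly; it is included only for comparison and is not part of the approach described above.

```r
# Hypothetical input: three sorted clusters, the first and third identical.
L <- list(c(1, 3, 4), c(2, 3), c(1, 3, 4))
L <- lapply(L, sort)          # make sure each vector is sorted
names(L) <- seq_along(L)      # give every element a unique name

# rbind recycles the shorter vector and warns; the warning is expected here.
mat <- suppressWarnings(unique(do.call("rbind", L)))
uniq_rbind <- L[row.names(mat)]

# Alternative (an assumption about the reader's R version, see above):
# duplicated() applied to the list itself.
uniq_dup <- L[!duplicated(L)]
```

Recycling could in principle make two different vectors look like the same row, but a sorted vector of distinct indices cannot equal a recycled shorter one, so for clusters of this kind the comparison should be safe.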

Regarding why the diagonal elements of a distance matrix are not part of the result of applying unique to that distance matrix: note that there is no unique.dist method defined in R, so you are getting the default method, which does not know about distance matrices. Distance matrices don't store their diagonal, so it's just giving the unique stored elements. Even if unique did have a dist method, unique applied to a matrix gives unique rows, not unique elements, so I am not so sure that it should really do what you want here anyway. I think it's clearer just to convert it explicitly to a matrix and then a vector so that the action of unique is understood:

unique(c(as.matrix(myDist)))
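
A small sketch of the difference, using made-up data (x below is hypothetical, chosen so the distances are easy to check by hand) rather than the rnorm() values from the question:

```r
# Made-up data: pairwise distances are 1, 3 and 2.
x <- c(0, 1, 3)
myDist <- dist(x, diag = TRUE, upper = TRUE)

# A dist object stores only the n * (n - 1) / 2 off-diagonal distances;
# diag = TRUE and upper = TRUE only affect how it is printed.
stored <- as.vector(myDist)

# Converting to a full matrix first brings the zero diagonal back.
full <- unique(c(as.matrix(myDist)))
```

Here stored holds only the three off-diagonal distances, while full also contains 0.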

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:02 EST