RE: [R] lists: removing elements, iterating over elements,

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Wed 06 Apr 2005 - 05:22:16 EST

> From: Paul Johnson
>
> I'm writing R code to calculate Hierarchical Social Entropy,
> a diversity
> index that Tucker Balch proposed. One article on this was
> published in
> Autonomous Robots in 2000. You can find that and others
> through his web
> page at Georgia Tech.
>
> http://www.cc.gatech.edu/~tucker/index2.html
>
> While I work on this, I realize (again) that I'm a C programmer
> masquerading in R, and it's really tricky working with R
> lists. Here are
>
> I need to calculate overlapping U-diametric clusters of a
> (Again, I apologize this looks so much like C.)
>
>
> ## Returns a list of all U-diametric clusters of a given radius
> ## Give an R distance matrix
> ## Clusters may overlap. Clusters may be identical (redundant)
> mem <- list()
>
> nItems <- dim(distmat)[1]
> for ( i in 1:nItems ){
> mem[[i]] <- c(i)
> }

This loop can be replaced with mem <- as.list(1:nItems)...

> for ( m in 1:nItems ){
> for ( n in 1:nItems ){
> if (m != n & (distmat[m,n] <= radius)){
> mem[[m]] <- sort(c( mem[[m]],n))
> }
> }
> }

If I understood the code correctly, this should do the same:

    neighbors <- which(distmat <= radius, arr.ind=TRUE)
    neighbors <- neighbors[neighbors[, 1] != neighbors[, 2],]
    mem <- split(neighbors[, 2], neighbors[, 1])

What I'm not sure of is whether you intend to include the i-th item in the i-th list (since the distance is presumably 0). Your code seems to indicate no, as you have m != n in the if() condition. The second line above removes such results. However, your list below seems to indicate that you do have such elements in your lists. If such results cannot be in the list, then the list should already be unique, no?
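
To make the comparison concrete, here is a minimal sketch on made-up data (the points, the radius, and the names memLoop and memVec are invented for illustration). It runs the loop version, which seeds each list with its own index, next to a vectorized version that simply keeps the diagonal instead of filtering it out:

```r
# Hypothetical data: four points on a line, so the distances are easy to check.
pts <- c(0, 1, 2, 10)
distmat <- as.matrix(dist(pts))
radius <- 1.5

# Loop version, as in the quoted code: item m starts in its own cluster.
nItems <- nrow(distmat)
memLoop <- as.list(1:nItems)
for (m in 1:nItems) {
  for (n in 1:nItems) {
    if (m != n && distmat[m, n] <= radius) {
      memLoop[[m]] <- sort(c(memLoop[[m]], n))
    }
  }
}

# Vectorized version that keeps the diagonal (distance 0 <= radius),
# so item i ends up in the i-th cluster without a separate seeding step.
neighbors <- which(distmat <= radius, arr.ind = TRUE)
memVec <- split(neighbors[, 2], neighbors[, 1])

# Same clusters, apart from the names that split() and which() attach.
sameClusters <- isTRUE(all.equal(unname(lapply(memVec, as.integer)), memLoop))
```

If the diagonal is filtered out instead, two things are worth watching: an item with no neighbors within the radius disappears from the result of split(), and subsetting the index matrix down to a single row turns it into a plain vector unless drop=FALSE is given.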

For deleting an element of a list, see R FAQ 7.1.
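
As a rough sketch of the alternatives to the reverse-order loop (the cluster list here is invented for illustration): duplicated() accepts a list in current versions of R, so the redundant clusters can be dropped in one indexing step, and negative indices remove several positions at once without any index shuffling.

```r
# Hypothetical cluster list; elements 2 and 4 repeat element 1.
cluslist <- list(c(1, 3, 5), c(1, 3, 5), c(2, 4), c(1, 3, 5))

# duplicated() flags every later copy of an earlier list element.
dups <- duplicated(cluslist)

# Logical indexing keeps only the first occurrence of each cluster.
uniqueClusters <- cluslist[!dups]

# Negative indices delete several positions in one step,
# so the order of deletion no longer matters.
byIndex <- cluslist[-c(2, 4)]

# Assigning NULL still works for a single element and closes the gap.
oneGone <- cluslist
oneGone[[2]] <- NULL
```

The logical form cluslist[!dups] is the safer of the two: cluslist[-which(dups)] returns an empty list when nothing is duplicated, because which() then yields a zero-length index.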

HTH,
Andy

> return(mem)
> }
>
>
> That generates the list, like this:
>
> [[1]]
> [1] 1 3 4 5 6 7 8 9 10
>
> [[2]]
> [1] 2 3 4 10
>
> [[3]]
> [1] 1 2 3 4 5 6 7 8 10
>
> [[4]]
> [1] 1 2 3 4 10
>
> [[5]]
> [1] 1 3 5 6 7 8 9 10
>
> [[6]]
> [1] 1 3 5 6 7 8 9 10
>
> [[7]]
> [1] 1 3 5 6 7 8 9 10
>
> [[8]]
> [1] 1 3 5 6 7 8 9 10
>
> [[9]]
> [1] 1 5 6 7 8 9 10
>
> [[10]]
> [1] 1 2 3 4 5 6 7 8 9 10
>
>
> The next task is to eliminate the redundant elements.
> unique() does not
> apply to lists, so I have to scan one by one.
>
>
>
> ##find redundant (same) clusters
> redundantCluster <- c()
> for (m in 1:(length(cluslist)-1)) {
> for ( n in (m+1): length(cluslist) ){
> if ( m != n & length(cluslist[[m]]) == length(cluslist[[n]]) ){
> if ( all(cluslist[[m]] == cluslist[[n]]) ){
> redundantCluster <- c( redundantCluster,n)
> }
> }
> }
> }
>
>
> ##make sure they are sorted in reverse order
> if (length(redundantCluster)>0)
> {
> redundantCluster <- unique(sort(redundantCluster,
> decreasing=T))
>
> ## remove redundant clusters (must do in reverse order to preserve
> ## index of cluslist)
> for (i in redundantCluster) cluslist[[i]] <- NULL
> }
>
>
> Question: am I deleting the list elements properly?
>
> I do not find explicit documentation for R on how to remove elements
> from lists, but trial and error tells me
>
> myList[[5]] <- NULL
>
> will remove the 5th element and then "close up" the hole caused by
> deletion of that element. That shuffles the index values, so
> I have to
> be careful in dropping elements. I must work from the back of
> the list
> to the front.
>
>
> Is there an easier or faster way to remove the redundant clusters?
>
>
> Now, the next question. After eliminating the redundant sets
> from the
> list, I need to calculate the total number of items present
> in the whole
> list, figure how many are in each subset--each list item--and do some
> calculations.
>
> I expected this would iterate over the members of the
> list--one step for
> each subcollection
>
> for (i in cluslist){
>
> }
>
> but it does not. It iterates over the items within the
> subsets of the
> list "cluslist." I mean, if cluslist has 5 sets, each with
> 10 elements,
> this for loop takes 50 steps, one for each individual item.
>
> I find this does what I want
>
> for (i in 1:length(cluslist))
>
> But I found out the hard way :)
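
For what it's worth, in current versions of R a for() loop over a genuine list takes one step per list element, not per item. If the loop ran once per item, cluslist may have been an atomic vector at that point (for example after unlist()), though that is only a guess. A small sketch with an invented cluster list, which also covers the per-cluster sizes and the total item count:

```r
# Hypothetical cluster list with three clusters.
cluslist <- list(c(1, 3, 5), c(2, 4), c(1, 2, 3, 4))

# Looping over the list itself: cl is a whole cluster on each step.
steps <- 0
for (cl in cluslist) {
  steps <- steps + 1
}

# Size of each cluster, total memberships, and number of distinct items.
sizes <- sapply(cluslist, length)
totalItems <- sum(sizes)
distinctItems <- length(unique(unlist(cluslist)))

# Index-based loop; seq_along() is safe even when the list is empty,
# whereas 1:length(cluslist) would count 1, 0 in that case.
firstItems <- numeric(length(cluslist))
for (i in seq_along(cluslist)) {
  firstItems[i] <- cluslist[[i]][1]
}
```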
>
>
> Oh, one more quirk that fooled me. Why does unique() applied to a
> distance matrix throw away the 0's???? I think that's really bad!
>
> > x <- rnorm(5)
> > myDist <- dist(x,diag=T,upper=T)
> > myDist
> 1 2 3 4 5
> 1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
> 2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
> 3 1.6658710 0.3728735 0.0000000 0.9989292 1.1163793
> 4 2.6648003 1.3718027 0.9989292 0.0000000 2.1153085
> 5 0.5494918 0.7435058 1.1163793 2.1153085 0.0000000
> > unique(myDist)
> [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
> [8] 0.9989292 1.1163793 2.1153085
> >
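
A likely explanation, as far as I can tell from ?dist: a "dist" object stores only the lower triangle as a plain vector, and diag=T and upper=T affect only how it is printed. The zeros are therefore never stored, and unique() has nothing to throw away. A small sketch with fixed values in place of rnorm() (the variable names are invented for illustration):

```r
# Fixed values instead of rnorm(), so the result is reproducible.
x <- c(0, 1, 3, 6)
myDist <- dist(x, diag = TRUE, upper = TRUE)

# Only the 4 * 3 / 2 = 6 lower-triangle distances are stored.
nStored <- length(myDist)
zeroStored <- 0 %in% as.vector(myDist)

# Converting to a full matrix materializes the diagonal and upper triangle.
fullMat <- as.matrix(myDist)
zeroInFull <- 0 %in% unique(as.vector(fullMat))
```

So unique(as.vector(as.matrix(myDist))) is one way to get the distinct values including 0, if that is what is wanted.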
>
> --
> Paul E. Johnson email: pauljohn@ku.edu
> Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
> 1541 Lilac Lane, Rm 504
> University of Kansas Office: (785) 864-9086
> Lawrence, Kansas 66044-3177 FAX: (785) 864-5700
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help