Re: [R] counting the occurrences of vectors

From: Spencer Graves <spencer.graves_at_pdf.com>
Date: Mon 05 Jul 2004 - 10:28:35 EST

      I see a case where "f1" gives the wrong answer:

      b <- array(c("a:b", "a", "c", "b:c"), dim=c(2,2))
      a <- b[c(1,1),]

      For these two matrices, f1(a,b) == c(2,2), while f2(a,b) == c(2,0). The two distinct rows of b, ("a:b", "c") and ("a", "b:c"), both collapse to the same string "a:b:c", so "f1" cannot tell them apart. If b does not contain ":", e.g., if it is numeric, then this pathology cannot occur. However, if "f1" is used with character data that could contain the "collapse" character, it could give an incorrect answer without warning.
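
      One possible workaround, sketched here under assumptions and not taken from the thread: recode every entry as an integer index before pasting, so that the keys can never contain the separator. The name "f1_safe" and its "sep" argument are hypothetical, and the sketch assumes "sep" contains no digits.

```r
# Counterexample from this message
b <- array(c("a:b", "a", "c", "b:c"), dim = c(2, 2))
a <- b[c(1, 1), ]

# f1_safe is a hypothetical name; this is a sketch, not the thread's method.
f1_safe <- function(a, b, sep = ":") {
  # Shared lookup table of all distinct entries in a and b
  lev <- unique(c(a, b))
  # Replace each entry by its integer position in lev (same shape as input)
  ai <- matrix(match(a, lev), nrow = nrow(a))
  bi <- matrix(match(b, lev), nrow = nrow(b))
  # Integer codes contain only digits, so a non-digit sep is unambiguous
  a2 <- apply(ai, 1, paste, collapse = sep)
  b2 <- apply(bi, 1, paste, collapse = sep)
  # Same counting trick as f1: add each distinct b key once, then subtract 1
  unname(c(table(c(a2, unique(b2)))[b2] - 1))
}

f1_safe(a, b)
```

      On the two matrices above this should agree with f2(a,b). The extra match() calls add some cost, so for purely numeric input the original "f1" is probably still the simpler choice.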

      Hope this helps. Spencer Graves

Ravi Varadhan wrote:

>Thanks to Gabor, Marc, and Spencer for their elegant solutions. Gabor's first solution worked the best for me.
>
>Best,
>Ravi.
>
>________________________________
>
>From: r-help-bounces@stat.math.ethz.ch on behalf of Gabor Grothendieck
>Sent: Sat 7/3/2004 12:12 PM
>To: r-help@stat.math.ethz.ch
>Subject: Re: [R] counting the occurrences of vectors
>
>
>
>Ravi Varadhan <rvaradha <at> jhsph.edu> writes:
>
>
>
>>Hi:
>>
>>I have two matrices, A and B, where A is n x k, and B is m x k, where n >> m >> k. Is there a computationally fast way to count the number of times each row (a k-vector) of B occurs in A? Thanks for any suggestions.
>>
>>Best,
>>Ravi.
>>
>
>Here are two approaches. On the test below, the first one is about three
>times faster than the second.
>
>R> # test matrices
>R> set.seed(1)
>R> a <- matrix(sample(3,1000,replace=TRUE),ncol=5)
>R> b <- matrix(sample(3,100,replace=TRUE),ncol=5)
>
>R> f1 <- function(a,b) {
>+ a2 <- apply(a, 1, paste, collapse=":")
>+ b2 <- apply(b, 1, paste, collapse=":")
>+ c(table(c(a2,unique(b2)))[b2] - 1)
>+ }
>
>R> f2 <- function(a,b) {
>+ ta <- t(a)
>+ apply(b,1,function(x)sum(apply(ta == x,2,all)))
>+ }
>
>R> gc(); system.time(ans1 <- f1(a,b))
>         used (Mb) gc trigger (Mb)
>Ncells 458311 12.3     818163 21.9
>Vcells 124264  1.0     786432  6.0
>[1] 0.03 0.00 0.03 NA NA
>
>R> gc(); system.time(ans2 <- f2(a,b))
>         used (Mb) gc trigger (Mb)
>Ncells 458312 12.3     818163 21.9
>Vcells 124270  1.0     786432  6.0
>[1] 0.1 0.0 0.1 NA NA
>
>R> all.equal(ans1, ans2)
>[1] TRUE
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Received on Mon Jul 05 10:32:59 2004

This archive was generated by hypermail 2.1.8 : Fri 18 Mar 2005 - 09:29:07 EST