# Re: [R] counting the occurrences of vectors

From: Gabor Grothendieck <ggrothendieck_at_myway.com>
Date: Tue 06 Jul 2004 - 14:22:22 EST

Marc Schwartz <MSchwartz <at> MedAnalytics.com> writes:

> the likely overhead involved in paste()ing together the rows
> to create objects

I thought I would check this, and it seems that in my original f1 function it's not really the paste itself that's the bottleneck but calling paste once per row through apply. If we use do.call rather than apply, as shown in f1a below, then f1a runs faster than row.match.count (which in turn was faster than f1):

```
f1a <- function(a,b,sep=":") {
	f <- function(...) paste(..., sep=sep)
	a2 <- do.call("f", as.data.frame(a))
	b2 <- do.call("f", as.data.frame(b))
	c(table(c(b2,unique(a2)))[a2] - 1)
}
```
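To make the two ideas in f1a easier to follow, here is a minimal sketch on toy data. The matrix `m` and the key vectors `a2_demo` and `b2_demo` are made up for illustration and are not from the thread. The first part checks that a single vectorised paste via do.call builds the same row keys as a row-by-row apply; the second part walks through the table trick used for counting.

```r
# Toy matrix (hypothetical data, only for illustration)
m <- matrix(c(1, 2, 3,
              1, 2, 3,
              3, 1, 2), ncol = 3, byrow = TRUE)

# Row by row: paste is called once for every row of m
keys_apply <- apply(m, 1, paste, collapse = ":")

# Column-wise: paste is called once, with each column as one argument,
# so the work is vectorised over all rows
keys_docall <- do.call(paste, c(as.data.frame(m), sep = ":"))

# Both approaches should give the same keys
same_keys <- identical(keys_apply, keys_docall)

# Counting trick: append unique(a2_demo) to b2_demo so that every key we
# look up is guaranteed to appear in the table, then subtract the 1 we added
b2_demo <- keys_docall            # keys of the rows being searched
a2_demo <- c("1:2:3", "2:2:2")    # keys to look up; the second never occurs
counts <- c(table(c(b2_demo, unique(a2_demo)))[a2_demo] - 1)
```

Here `counts` holds, for each key in `a2_demo`, how many times it occurs in `b2_demo`. Appending `unique(a2_demo)` is what keeps the lookup from returning NA for keys that are absent.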

```
> set.seed(1)
> # note that we have increased the size of the matrices from last post
> # to better show the speed difference
> a <- matrix(sample(3,10000,rep=T),nc=5)
> b <- matrix(sample(3,1000,rep=T),nc=5)

> # row.match.count taken from Marc's post in this thread
> # have put a c(...) around row.match.count to make it comparable to f1a
> gc(); system.time(ans <- c(row.match.count(b,a)))
         used (Mb) gc trigger (Mb)
Ncells 436079 11.7     741108 19.8
Vcells 130663  1.0     786432  6.0
[1] 0.11 0.00 0.11 NA NA

> gc(); system.time(ansf1a <- f1a(b,a))
         used (Mb) gc trigger (Mb)
Ncells 436080 11.7     741108 19.8
Vcells 130669  1.0     786432  6.0
[1] 0.04 0.00 0.04 NA NA

> all.equal(ansf1a,ans)
[1] TRUE
```

R-help@stat.math.ethz.ch mailing list