[Rd] Very slow subsetting by name

From: Hervé Pagès <hpages_at_fhcrc.org>
Date: Thu, 15 Jul 2010 01:12:10 -0700


Hi,

I'm subsetting a named vector using character indices. My vector of indices (or keys) is 10x longer than the vector I'm subsetting. All my keys are distinct and only 10% of them are valid (i.e. match a name of the vector being subsetted). It is surprisingly slow:

x1 <- 1:1000
names(x1) <- paste("a", x1, sep="")
keys <- sample(c(names(x1), paste("b", 1:9000, sep="")))
> system.time(y1 <- x1[keys])

   user  system elapsed
  0.410   0.000   0.416

x2 <- 1:2000
names(x2) <- paste("a", x2, sep="")
keys <- sample(c(names(x2), paste("b", 1:18000, sep="")))
> system.time(y2 <- x2[keys])

   user  system elapsed
  1.730   0.000   1.736

x3 <- 1:4000
names(x3) <- paste("a", x3, sep="")
keys <- sample(c(names(x3), paste("b", 1:36000, sep="")))
> system.time(y3 <- x3[keys])

   user  system elapsed
  8.900   0.010   9.227

x4 <- 1:8000
names(x4) <- paste("a", x4, sep="")
keys <- sample(c(names(x4), paste("b", 1:72000, sep="")))
> system.time(y4 <- x4[keys])

   user  system elapsed
130.390   0.000 132.316

And it's apparently worse than quadratic in time!
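The growth rate can be read directly from the elapsed times above. Each run doubles length(x), so a quadratic cost would make each timing about 4 times the previous one. The following is a minimal R sketch that uses only the reported numbers. It computes the ratio between consecutive timings and the exponent k implied by time ~ n^k. The variable names are illustrative only.

```r
# Elapsed times (seconds) reported above for length(x) = 1000, 2000, 4000, 8000,
# with 10 * length(x) distinct keys in each case.
n <- c(1000, 2000, 4000, 8000)
elapsed <- c(0.416, 1.736, 9.227, 132.316)

# Ratio of each timing to the previous one (each step doubles n).
ratios <- elapsed[-1] / elapsed[-length(elapsed)]

# Exponent k in time ~ n^k implied by each step (k = 2 would be quadratic).
k <- log(ratios) / log(n[-1] / n[-length(n)])

round(ratios, 2)  # about 4.17, 5.32 and 14.34
round(k, 2)
```

A quadratic algorithm would give ratios near 4 and k near 2. Here k climbs well above 2 at the largest size.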

I'm wondering why subsetting by name is so slow, since it could apparently be implemented as x4[match(keys, names(x4))], which is very fast: only 0.012s!
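The following is a minimal sketch of the match()-based alternative, using the smallest case from the benchmarks. It assumes that keys contains no NA and no empty strings. Under that assumption, both forms should return the same named vector. For a key that matches no name, each form returns NA with an NA name. The set.seed() call and the variable names y_by_name and y_by_match are additions for reproducibility and illustration.

```r
# Smallest case from the benchmarks above: 1000 named elements, 10000 keys,
# of which only the 1000 "a..." keys match a name.
set.seed(1)  # fixed seed only so the example is reproducible
x1 <- 1:1000
names(x1) <- paste("a", x1, sep = "")
keys <- sample(c(names(x1), paste("b", 1:9000, sep = "")))

# Subsetting by name, as in the post.
y_by_name <- x1[keys]

# Alternative: find each key's position with match(), then subset by position.
# Unmatched keys give NA positions, and indexing with NA returns NA with an NA
# name, which is also what x1[keys] returns for a key that matches no name.
y_by_match <- x1[match(keys, names(x1))]

stopifnot(identical(y_by_name, y_by_match))

# Compare the two timings on your own R version.
system.time(x1[keys])
system.time(x1[match(keys, names(x1))])
```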

This is with R-2.11.0 and R-2.12.0.

Thanks,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages_at_fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 15 Jul 2010 - 08:17:33 GMT
