[R] which(df$name=="A") takes ~1 second! (df is very large), but can it be sped up?

From: Emmanuel Levy <emmanuel.levy_at_gmail.com>
Date: Tue, 12 Aug 2008 19:35:22 -0400

Dear All,

I have a large data frame (2,700,000 rows and 14 columns), and I would like to extract information from it in the particular way illustrated below:

Given a data frame "df":

> col1=sample(c(0,1),10, replace=TRUE)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
   names col1
1      A    1
2      A    0
3      A    1
4      A    0
5      A    1
6      B    0
7      B    0
8      B    1
9      B    0
10     B    0

I would like to transform it into the following form, i.e. a list holding the col1 values for each level of names:

> index = c("A","B")
> col1 = list()   # col1 is still a numeric vector here, so start from an empty list
> col1[[1]]=df$col1[which(df$names=="A")]
> col1[[2]]=df$col1[which(df$names=="B")]
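
A minimal sketch of one alternative, not part of the original post: base R's split() groups a vector by a factor in a single pass and returns a list named by level, which is the same structure the two which() lines build by hand. The seed and the rebuilt toy data are assumptions made only so the example runs on its own.

```r
# Small stand-in for the example data; the seed and values are arbitrary
set.seed(1)
df <- data.frame(
  names = factor(rep(c("A", "B"), each = 5)),
  col1  = sample(c(0, 1), 10, replace = TRUE)
)

# split() groups col1 by the levels of names in one pass and returns a
# list named by level, i.e. the same contents as col1[[1]] and col1[[2]]
col1_by_level <- split(df$col1, df$names)

col1_by_level[["A"]]   # same values as df$col1[which(df$names == "A")]
col1_by_level$B        # same values as df$col1[which(df$names == "B")]
```

Because the list is already named by level, a separate index vector is not needed to look up each group.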

My problem is that the command *** which(df$names=="A") *** takes about 1 second, because df is so big.
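
Some context on where the time goes, offered as a hedged sketch rather than a profile: df$names == "A" compares every row, and for a factor the == operator (base R's Ops.factor method, as far as I know) first turns the integer codes into a character vector, so each level costs a full pass over all rows. The code below builds a hypothetical data frame with the same row count as in the post but made-up contents, and compares that approach with matching on the integer codes and with computing the row indices of every level at once via split(). Timings depend on the machine, so none are quoted here.

```r
# Hypothetical data frame with the row count from the post (2.7 million);
# the level names and column values are made up for illustration
set.seed(42)
n <- 2.7e6
df <- data.frame(
  names = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  col1  = sample(c(0, 1), n, replace = TRUE)
)

# Approach from the post: one full scan (plus a string comparison) per level
idx_A_slow <- which(df$names == "A")

# Alternative 1: compare the factor's integer codes instead of strings
code_A <- match("A", levels(df$names))
idx_A_fast <- which(as.integer(df$names) == code_A)

# Alternative 2: row indices of every level, computed in a single pass
idx_by_level <- split(seq_len(nrow(df)), df$names)

# Rough timing comparison; results vary from machine to machine
print(system.time(which(df$names == "A")))
print(system.time(which(as.integer(df$names) == code_A)))
print(system.time(split(seq_len(nrow(df)), df$names)))
```

All three give the same row indices for level "A"; the split() version pays its cost once and then serves every level from the resulting list.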

I was thinking that the rows belonging to a given factor "level" could perhaps be accessed directly, without scanning the whole column, but I am not sure how to do it.
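
One way to read the idea of accessing a level "instantly", sketched here as an assumption rather than anything from the post: sort the data frame by names once, so that each level occupies a contiguous block of rows, and record where each block starts and ends. After that one-time cost, the rows of a level are a plain index range and no comparison against the whole column is needed. The toy data and the helper rows_for_level() are hypothetical.

```r
# Hypothetical toy data in unsorted order
df <- data.frame(
  names = factor(c("B", "A", "B", "A", "A", "B")),
  col1  = c(0, 1, 1, 0, 1, 0)
)

# Sort once so that each level occupies a contiguous block of rows
df <- df[order(df$names), ]

# Per-level row counts from the factor's integer codes, then block boundaries
counts <- tabulate(as.integer(df$names), nbins = nlevels(df$names))
names(counts) <- levels(df$names)
ends   <- cumsum(counts)
starts <- ends - counts + 1L

# Hypothetical helper: rows of a level as a range, with no scan of the column
rows_for_level <- function(level) {
  if (counts[[level]] == 0L) return(integer(0))  # empty level: no rows
  seq.int(starts[[level]], ends[[level]])
}

df$col1[rows_for_level("A")]
df$col1[rows_for_level("B")]
```

The trade-off is the initial order() call; this only pays off if the same data frame is queried for many levels afterwards.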

I would be very grateful for any advice that would allow me to speed this up.

Best wishes,


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 12 Aug 2008 - 23:44:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 13 Aug 2008 - 03:33:38 GMT.
