From: Dennis Murphy <djmuser_at_gmail.com>

Date: Thu, 10 Jun 2010 04:02:42 -0700

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 10 Jun 2010 - 11:06:29 GMT


Hi:

I had Harold's idea (matrix indexing), but I was curious to see which of
these ran fastest. I simulated 1000 rows and three columns of binary data,
along with a fourth column that sampled the values 1:3 1000 times. Here are
the timings:

> f <- as.data.frame(matrix(rbinom(3000, 1, 0.4), nrow = 1000))
> names(f) <- LETTERS[1:3]
> f$D <- sample(1:3, 1000, replace = TRUE)
> # E1: matrix indexing, one (row, column) pair per row
> system.time(E1 <- f[cbind(1:nrow(f), f$D)])
   user  system elapsed
      0       0       0
> # E2: row-wise apply
> system.time(E2 <- apply(f, 1, function(x) x[eval(x)["D"]]))
   user  system elapsed
   0.03    0.00    0.03
> # E3: select one column per row, then take the diagonal
> system.time(E3 <- diag(as.matrix(f[f$D])))
   user  system elapsed
   0.26    0.03    0.30
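
To make it easier to see what the three expressions compute, here is a small sketch that runs each of them on the four-row data frame from the original question and compares the results with the nested ifelse. The names f4, idx, e_matrix, e_apply, e_diag and e_ifelse are made up for this illustration, and the redundant eval() is dropped from the apply version.

```r
# The four-row data frame from the original question (called f4 here).
f4 <- data.frame(A = c(0, 0, 1, 1), B = c(0, 1, 0, 1),
                 C = c(1, 1, 0, 1), D = c(3, 1, 2, 3))

# Matrix indexing: a two-column matrix holds one (row, column) pair per row.
idx <- cbind(seq_len(nrow(f4)), f4$D)
e_matrix <- f4[idx]

# Row-wise apply: each row x is a named vector, and x["D"] is the position to pick.
e_apply <- unname(apply(f4, 1, function(x) x[x["D"]]))

# diag: f4[f4$D] builds one column per row, and the diagonal keeps element (i, i).
e_diag <- diag(as.matrix(f4[f4$D]))

# Nested ifelse from the original question, for comparison.
e_ifelse <- ifelse(f4$D == 3, f4$C, ifelse(f4$D == 2, f4$B, f4$A))

# Each of the four results is 1 0 0 1, matching column E in the question.
print(rbind(e_matrix, e_apply, e_diag, e_ifelse))
```

Note that the diag version builds a data frame with as many columns as there are rows, which is presumably why it was the slowest of the three above.
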

> identical(E1, E2)
[1] TRUE

> identical(E2, E3)
[1] TRUE
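
With 1000 rows the matrix-indexing timing rounds to zero, so it says little about how the approaches scale. A rough sketch of the same comparison on more rows follows; the size n = 1e5, the seed, and the names g, G1, G2, t_matrix and t_apply are assumptions for illustration, not part of the timings above. The diag version is left out because f[f$D] would need n columns of n rows each.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 1e5                         # assumed size; adjust to taste

# Same simulation as above, just with more rows.
g <- as.data.frame(matrix(rbinom(3 * n, 1, 0.4), nrow = n))
names(g) <- LETTERS[1:3]
g$D <- sample(1:3, n, replace = TRUE)

# Time matrix indexing and row-wise apply on the larger data frame.
t_matrix <- system.time(G1 <- g[cbind(seq_len(nrow(g)), g$D)])
t_apply  <- system.time(G2 <- unname(apply(g, 1, function(x) x[x["D"]])))

# Elapsed seconds for each approach; actual values depend on the machine.
print(c(matrix_index = t_matrix[["elapsed"]], apply = t_apply[["elapsed"]]))

# Both approaches should select the same values.
stopifnot(isTRUE(all.equal(G1, G2)))
```
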
HTH,

Dennis

On Wed, Jun 9, 2010 at 7:03 AM, Malcolm Fairbrother <m.fairbrother_at_bristol.ac.uk> wrote:

> Dear all,
>
> I have a data frame f, with four variables:
>
> f <- data.frame(A=c(0,0,1,1), B=c(0,1,0,1), C=c(1,1,0,1), D=c(3,1,2,3))
> f
>   A B C D
> 1 0 0 1 3
> 2 0 1 1 1
> 3 1 0 0 2
> 4 1 1 1 3
>
> I want to create a new variable (f$E), such that each of its elements is
> drawn from either f$A, f$B, or f$C, according to the value (for each row) of
> f$D (values of which range from 1 to 3).
>
> In the first row, D is 3, so I want the value from the third variable (C),
> which for the first row is 1. In the second row, D is 1, so I want the value
> from the first variable (A), which for the second row is 0. And so forth,
> such that in the end my new data frame looks like:
>
>   A B C D E
> 1 0 0 1 3 1
> 2 0 1 1 1 0
> 3 1 0 0 2 0
> 4 1 1 1 3 1
>
> My question is: How do I do this for a much larger dataset, where my "index
> variable" (f$D in this example) actually indexes a much larger number of
> variables (not just three)?
>
> I know that in principle I could do this with a long series of nested
> ifelse statements (as below), but I assume there is some less cumbersome
> option, and I'd like to know what it is. Any help would be much appreciated.
> Apologies if I'm missing something obvious.
>
> f$E <- ifelse(f$D==3, f$C, ifelse(f$D==2, f$B, f$A))
>
> Thanks,
> Malcolm
>
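
For the larger case described in the question, where the index variable points into many columns, one possible sketch is to restrict the matrix indexing to the candidate columns by name, so that unrelated columns (possibly of other types) do not affect the positions or coerce the matrix to character. Everything here is hypothetical: the data frame w, the columns V1 to V10, id and idx, and the sizes k and n are invented for illustration.

```r
# Hypothetical wider data: ten candidate columns V1..V10 plus other columns.
set.seed(42)                                  # arbitrary seed
k <- 10
n <- 20
w <- as.data.frame(matrix(rnorm(n * k), nrow = n))   # columns named V1..V10
w$id  <- paste0("row", seq_len(n))            # unrelated character column
w$idx <- sample(seq_len(k), n, replace = TRUE)  # which candidate to pick per row

# Candidate columns, listed in the order the index variable refers to them.
cols <- paste0("V", seq_len(k))

# Convert only the candidates to a numeric matrix, then index by (row, column).
m <- as.matrix(w[cols])
w$E <- m[cbind(seq_len(n), w$idx)]

head(w[c("idx", "E")])
```

The same pattern works for any number of candidate columns, as long as the values of the index variable line up with the order of cols.
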



Archive maintained by Robert King, hosted by the discipline of statistics at
the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 10 Jun 2010 - 11:10:28 GMT.
