From: Doran, Harold <HDoran_at_air.org>

Date: Thu, 03 Jun 2010 13:21:11 -0400

R-help_at_r-project.org<mailto:R-help_at_r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Date: Thu, 03 Jun 2010 13:21:11 -0400

Actually, this is clever. I modified your advice and vectorized this as:

1/ (1 + exp(b_vector - t(matrix(theta, nrow= nrow(dat) , ncol= ncol(dat)))))

Instead of using the apply() function as I did before. In terms of speed, this new solution is immensely faster, as you also noted. Now, whether it works on my real world problem is TBD. It is running now, slowly.

From: Joris Meys [mailto:jorismeys_at_gmail.com]
Sent: Wednesday, June 02, 2010 7:35 PM

To: Doran, Harold

Cc: r-help_at_r-project.org

Subject: Re: [R] Use apply only on non-missing values

Not really a direct answer on your question, but:

> system.time(replicate(10000,apply(as.matrix(theta), 1, rasch, b_vector)))

user system elapsed

4.51 0.03 4.55

> system.time(replicate(10000,theta%*%t(b_vector)))

user system elapsed

0.25 0.00 0.25

It does make a difference on large datasets...
Cheers

Joris

On Wed, Jun 2, 2010 at 4:44 PM, Doran, Harold <HDoran_at_air.org<mailto:HDoran_at_air.org>> wrote:
I have a function that I am currently using very inefficiently. The following are needed to illustrate the problem:

set.seed(12345)

dat <- matrix(sample(c(0,1), 110, replace = TRUE), nrow = 11, ncol=10)
mis <- sample(1:110, 5)

dat[mis] <- NA

theta <- rnorm(11)

b_vector <- runif(10, -4,4)

empty <- which(is.na<http://is.na>(t(dat)))

So, I have a matrix (dat) with some values within the matrix missing. In my real world problem, the matrix is huge, and most values are missing. The function in question is called derivs() and is below. But, let me step through the inefficient portions.

First, I create a matrix of some predicted probabilities as:

rasch <- function(theta,b) 1/ (1 + exp(b-theta)) mat <- apply(as.matrix(theta), 1, rasch, b_vector)

However, I only need those predicted probabilities in places where the data are not missing. So, the next step in the function is

mat[empty] <- NA

which manually places NAs in places where the data are missing (notice the matrix 'mat' is the transpose of the data matrix and so I get the empty positions from the transpose of dat).

Afterwards, the function computes the gradient and hessians needed to complete the MLE estimation.

All of this works in the sense that it yields the correct answers for my problem. But, the glaring problem is that I create predicted probabilities for every cell in 'mat' when in many cases they are not needed. I end up replacing those values with NAs. In my real world problem, this is horribly inefficient and slow.

My question is then is there a way to use apply such that is computes the necessary predicted probabilities only when the data are not missing to yield the matrix 'mat'. My desired end result is the matrix 'mat' created after the manually placing the NAs in the appropriate cells.

Thanks

Harold

derivs <- function(dat, b_vector, theta){

mat <- apply(as.matrix(theta), 1, rasch, b_vector) mat[empty] <- NA gradient <- -(colSums(dat, na.rm = TRUE) - rowSums(mat, na.rm = TRUE)) hessian <- -(rowSums(mat * (1-mat), na.rm = TRUE)) list('gradient' = gradient, 'hessian' = hessian) }

*> sessionInfo()
*

R version 2.10.1 (2009-12-14)

i386-pc-mingw32

locale:

[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252

[4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.10.1

*>
*

[[alternative HTML version deleted]]

R-help_at_r-project.org<mailto:R-help_at_r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 Joris.Meys_at_Ugent.be ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Thu 03 Jun 2010 - 17:42:18 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Thu 03 Jun 2010 - 18:20:26 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*