Re: [R] increasing speed for permutations of glm

From: jim holtman <jholtman_at_gmail.com>
Date: Fri, 25 Jan 2008 06:07:51 -0500

Use Rprof on your code to see where time is being spend and then focus on those areas for improvement.

On Jan 25, 2008 2:21 AM, Juliet Hannah <juliet.hannah_at_gmail.com> wrote:
> Dear R Programmers,
> I am trying to run a Poisson regression on all pairs of variables in a data
> set and
> obtain the permutation distribution. The number of pairs is around 100000.
> It seems my code will take weeks to run, unless I try something else.
> Could you give me any suggestions on how to improve the speed of the
> code below, or any general suggestions on how I may accomplish this task.
> Thanks for your time,
> Juliet
>
> To run this code, first enter the model matrices (matrices given at bottom):
>
> # X_red <- as.matrix(read.table("clipboard",header=F),nrow=18,byrow=T)
> # X_full <- as.matrix(read.table("clipboard",header=F),nrow=18,byrow=T)
>
>
> library(combinat)
> # make some example data; the actual data is 700x800
> myData <- matrix(sample(c(1:3),500,replace=TRUE),nrow=100,ncol=5)
> # the response is binary
> response <- c(rep(1,50),rep(0,50))
> # initalize permutation of response 'labels'.
> perm.response <- response
>
> counts <- rep(1,18)
>
> # Number of permutations
> nperm <- 5
>
> # matrix of all pairs of indices
> all.pairs <- combn2(1:ncol(myData))
> # initalize results
> pmatrix <- matrix(-1,nrow=nperm,ncol=nrow(all.pairs))
>
> getLRTpval <- function(index)
> {
>
> # A contingency table is formed from two columns of the data and the
> response (3 way table) and made into a vector
> counts <- as.vector(table(myData[,index[1]],myData[,index[2]],
> perm.response));
>
> # Add 1 to any count that = 0.
> counts[counts == 0] <- 1
> reduced_model <- glm.fit(X_red,counts,family=poisson(link="log"))
> full_model <- glm.fit(X_full,counts,family=poisson(link="log"))
> pval <- pchisq(reduced_model$deviance - full_model$deviance,
> reduced_model$df.residual - full_model$df.residual, lower.tail= FALSE)
> }
>
>
> for (perm in 1:nperm)
> {
> # Permute the labels
> perm.response <- sample(response,100,replace=TRUE)
> pmatrix[perm,] <- apply(all.pairs, 1, getLRTpval)
> }
>
> #X_red
> 1 1 0 1 0 1 1 0 0 0
> 1 1 0 0 1 1 0 1 0 0
> 1 1 0 -1 -1 1 -1 -1 0 0
> 1 0 1 1 0 1 0 0 1 0
> 1 0 1 0 1 1 0 0 0 1
> 1 0 1 -1 -1 1 0 0 -1 -1
> 1 -1 -1 1 0 1 -1 0 -1 0
> 1 -1 -1 0 1 1 0 -1 0 -1
> 1 -1 -1 -1 -1 1 1 1 1 1
> 1 1 0 1 0 -1 1 0 0 0
> 1 1 0 0 1 -1 0 1 0 0
> 1 1 0 -1 -1 -1 -1 -1 0 0
> 1 0 1 1 0 -1 0 0 1 0
> 1 0 1 0 1 -1 0 0 0 1
> 1 0 1 -1 -1 -1 0 0 -1 -1
> 1 -1 -1 1 0 -1 -1 0 -1 0
> 1 -1 -1 0 1 -1 0 -1 0 -1
> 1 -1 -1 -1 -1 -1 1 1 1 1
>
>
> # X_full
>
> 1 1 0 1 0 1 1 0 0 0 1 0 1 0
> 1 1 0 0 1 1 0 1 0 0 1 0 0 1
> 1 1 0 -1 -1 1 -1 -1 0 0 1 0 -1 -1
> 1 0 1 1 0 1 0 0 1 0 0 1 1 0
> 1 0 1 0 1 1 0 0 0 1 0 1 0 1
> 1 0 1 -1 -1 1 0 0 -1 -1 0 1 -1 -1
> 1 -1 -1 1 0 1 -1 0 -1 0 -1 -1 1 0
> 1 -1 -1 0 1 1 0 -1 0 -1 -1 -1 0 1
> 1 -1 -1 -1 -1 1 1 1 1 1 -1 -1 -1 -1
> 1 1 0 1 0 -1 1 0 0 0 -1 0 -1 0
> 1 1 0 0 1 -1 0 1 0 0 -1 0 0 -1
> 1 1 0 -1 -1 -1 -1 -1 0 0 -1 0 1 1
> 1 0 1 1 0 -1 0 0 1 0 0 -1 -1 0
> 1 0 1 0 1 -1 0 0 0 1 0 -1 0 -1
> 1 0 1 -1 -1 -1 0 0 -1 -1 0 -1 1 1
> 1 -1 -1 1 0 -1 -1 0 -1 0 1 1 -1 0
> 1 -1 -1 0 1 -1 0 -1 0 -1 1 1 0 -1
> 1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 25 Jan 2008 - 11:18:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 25 Jan 2008 - 12:30:08 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive