[R] increasing speed for permutations of glm

From: Juliet Hannah <juliet.hannah_at_gmail.com>
Date: Fri, 25 Jan 2008 02:21:03 -0500


Dear R Programmers,
I am trying to run a Poisson regression on all pairs of variables in a data set and
obtain the permutation distribution. The number of pairs is around 100000. It seems my code will take weeks to run, unless I try something else. Could you give me any suggestions on how to improve the speed of the code below, or any general suggestions on how I may accomplish this task. Thanks for your time,
Juliet

To run this code, first enter the model matrices (matrices given at bottom):

# X_red <- as.matrix(read.table("clipboard",header=F),nrow=18,byrow=T)
# X_full <- as.matrix(read.table("clipboard",header=F),nrow=18,byrow=T)

library(combinat)
# make some example data; the actual data is 700x800
myData <- matrix(sample(c(1:3),500,replace=TRUE),nrow=100,ncol=5)
# the response is binary

response <- c(rep(1,50),rep(0,50))
# initalize permutation of response 'labels'.
perm.response <- response

counts <- rep(1,18)

# Number of permutations

nperm <- 5

# matrix of all pairs of indices

all.pairs <- combn2(1:ncol(myData))
# initalize results

pmatrix <- matrix(-1,nrow=nperm,ncol=nrow(all.pairs))

getLRTpval <- function(index)
{

  # A contingency table is formed from two columns of the data and the response (3 way table) and made into a vector   counts <- as.vector(table(myData[,index[1]],myData[,index[2]], perm.response));

  # Add 1 to any count that = 0.
  counts[counts == 0] <- 1
  reduced_model <- glm.fit(X_red,counts,family=poisson(link="log"))   full_model <- glm.fit(X_full,counts,family=poisson(link="log"))   pval <- pchisq(reduced_model$deviance - full_model$deviance, reduced_model$df.residual - full_model$df.residual, lower.tail= FALSE) }

for (perm in 1:nperm)
{
  # Permute the labels
  perm.response <- sample(response,100,replace=TRUE)   pmatrix[perm,] <- apply(all.pairs, 1, getLRTpval) }

#X_red
1 1 0 1 0 1 1 0 0 0
1 1 0 0 1 1 0 1 0 0
1 1 0 -1 -1 1 -1 -1 0 0
1 0 1 1 0 1 0 0 1 0
1 0 1 0 1 1 0 0 0 1
1 0 1 -1 -1 1 0 0 -1 -1

1 -1 -1 1 0 1 -1 0 -1 0
1 -1 -1 0 1 1 0 -1 0 -1
1 -1 -1 -1 -1 1 1 1 1 1

1 1 0 1 0 -1 1 0 0 0
1 1 0 0 1 -1 0 1 0 0
1 1 0 -1 -1 -1 -1 -1 0 0
1 0 1 1 0 -1 0 0 1 0
1 0 1 0 1 -1 0 0 0 1
1 0 1 -1 -1 -1 0 0 -1 -1
1 -1 -1 1 0 -1 -1 0 -1 0
1 -1 -1 0 1 -1 0 -1 0 -1

1 -1 -1 -1 -1 -1 1 1 1 1

# X_full

1 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 -1 -1 1 -1 -1 0 0 1 0 -1 -1 1 0 1 1 0 1 0 0 1 0 0 1 1 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1 0 1 -1 -1 1 0 0 -1 -1 0 1 -1 -1

1 -1 -1  1  0  1 -1  0 -1  0 -1 -1  1  0
1 -1 -1  0  1  1  0 -1  0 -1 -1 -1  0  1
1 -1 -1 -1 -1  1  1  1  1  1 -1 -1 -1 -1
1  1  0  1  0 -1  1  0  0  0 -1  0 -1  0
1  1  0  0  1 -1  0  1  0  0 -1  0  0 -1
1  1  0 -1 -1 -1 -1 -1  0  0 -1  0  1  1
1  0  1  1  0 -1  0  0  1  0  0 -1 -1  0
1  0  1  0  1 -1  0  0  0  1  0 -1  0 -1
1  0  1 -1 -1 -1  0  0 -1 -1  0 -1  1  1
1 -1 -1  1  0 -1 -1  0 -1  0  1  1 -1  0
1 -1 -1 0 1 -1 0 -1 0 -1 1 1 0 -1 1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 25 Jan 2008 - 07:22:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 25 Jan 2008 - 11:30:08 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive