[R] qr with missing dependent variables

From: Richard Mott <Richard.Mott_at_well.ox.ac.uk>
Date: Fri 09 Dec 2005 - 03:52:29 EST

Dear R-help

We have a regression problem which could be solved elegantly if we could figure out how to get the R residuals() function to accept a dependent variable that contains missing values.

We have ~20000 gene-expression vectors y, each being measured on the same set of individuals, but each having a small random number of missing values.

For each expression vector we wish to search across the genome for quantitative trait loci, i.e. chromosomal regions g where the local genetic structure, represented by the design matrix X(g), gives a significant linear regression relationship. Depending on the complexity of the genetic model being investigated, X(g) typically has either 7 or 32 columns, i.e. it is of non-trivial size. The number of loci g to be investigated is ~13000, so we have to do 13000*20000 = 260,000,000 multiple regressions. Computational efficiency is therefore important.

We thought of one way to do this: for each locus g, compute the qr decomposition of the design matrix X(g) once, then work out the residual sum of squares for each of the expression phenotypes using residuals() on the qr object applied to the expression vector. That way we would only need to do the hard part of the linear regression once per locus.
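To make the idea concrete, here is a minimal sketch on simulated data with no missing values. It assumes that qr.resid() is the function meant by residuals() above, and that the expression vectors are stored as the columns of a matrix Y; the names n, p, m, X and Y are illustrative only.

```r
set.seed(1)
n <- 40   # individuals (toy size)
p <- 7    # columns of the design matrix at one locus
m <- 5    # expression vectors (toy size; ~20000 in practice)

X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # simulated X(g)
Y <- matrix(rnorm(n * m), n, m)                      # one phenotype per column

# Factor the design matrix once for this locus ...
qrX <- qr(X)

# ... then reuse the factorisation for every phenotype at once.
# qr.resid() accepts a matrix of right-hand sides.
res <- qr.resid(qrX, Y)
rss <- colSums(res^2)

# Sanity check against a separate least-squares fit per phenotype.
rss_check <- sapply(seq_len(m), function(j) sum(lm.fit(X, Y[, j])$residuals^2))
stopifnot(isTRUE(all.equal(rss, rss_check)))
```

With complete data this gives one residual sum of squares per expression vector from a single factorisation of X(g).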

The problem with this approach is the missing values, which are not allowed by residuals(). Unfortunately we can't just eliminate all rows containing a missing value, because we would throw away too much data.

Is there a way round this? Can we set the missing values to 0 and then sort out the discrepancies in the residual SS? More generally, is it consistent to compute a qr decomposition including rows for which there are no dependent observations?
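One possible shape for the zero-filling idea is sketched below on simulated data; it is an illustration under stated assumptions, not a tested solution for the real data. It relies on a standard least-squares identity: fitting on the observed rows only gives the same residual SS as fitting on all rows with one extra indicator column per missing row. Writing r for the residuals of the zero-filled vector, H = QQ' for the hat matrix and M for the set of missing rows, this gives RSS_obs = r'r - r_M' [(I - H)_MM]^{-1} r_M. The sketch assumes that X still has full column rank once the rows in M are dropped; the names miss, Qm and A are illustrative.

```r
set.seed(1)
n <- 50
p <- 7
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # simulated X(g)
y <- rnorm(n)
miss <- c(3, 17, 42)      # rows with no dependent observation
y[miss] <- NA

# Done once per locus, using every row of X.
qrX <- qr(X)
Q <- qr.Q(qrX)[, seq_len(qrX$rank), drop = FALSE]    # orthonormal basis of col(X)

# Done per expression vector: zero-fill, take residuals, then correct.
y0 <- y
y0[miss] <- 0
r0 <- qr.resid(qrX, y0)
rss_zero <- sum(r0^2)                                # uncorrected residual SS

Qm <- Q[miss, , drop = FALSE]
A <- diag(length(miss)) - tcrossprod(Qm)             # (I - H)[miss, miss]
rss_corrected <- rss_zero - drop(crossprod(r0[miss], solve(A, r0[miss])))

# Reference value: refit from scratch on the observed rows only.
rss_direct <- sum(lm.fit(X[-miss, , drop = FALSE], y[-miss])$residuals^2)
stopifnot(isTRUE(all.equal(rss_corrected, rss_direct)))
```

The correction only requires solving a system whose size is the number of missing values in that vector, so it stays cheap when few values are missing. The residual degrees of freedom would be the number of observed rows minus the rank of X, not n minus the rank.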

As far as I can see, this problem has not been addressed on R-help, but my apologies if it has!


Richard Mott

Richard Mott       | Wellcome Trust Centre
tel 01865 287588   | for Human Genetics
fax 01865 287697   | Roosevelt Drive, Oxford OX3 7BN

Received on Fri Dec 09 05:23:03 2005
