Re: [R] Need to fit a regression line using orthogonal residuals

From: Henrik Bengtsson <hb_at_stat.berkeley.edu>
Date: Wed 31 Jan 2007 - 02:18:56 GMT

The iwpca() in Bioconductor package aroma.light takes a matrix of column vectors and "fits an R-dimensional hyperplane using iterative re-weighted PCA". From ?iwpca.matrix:

"Arguments:
X N-times-K matrix where N is the number of observations and K is the number of dimensions.

Details:
This method uses weighted principal component analysis (WPCA) to fit a R-dimensional hyperplane through the data with initial internal weights all equal. At each iteration the internal weights are recalculated based on the "residuals". If method=="L1", the internal weights are 1 / sum(abs(r) + reps). This is the same as method=function(r) 1/sum(abs(r)+reps). The "residuals" are orthogonal Euclidean distance of the principal components R,R+1,...,K. In each iteration before doing WPCA, the internal weighted are multiplied by the weights given by argument w, if specified."

Thus, in your case you want to do:

X <- cbind(x,y)
fit <- iwpca(X)

There are different ways of robustifying the estimate, cf. argument 'method'. For heteroscedastic noise, fitting in L1 is convenient since there is no bandwidth parameter.

Hope this helps

Henrik

On 1/30/07, Bill.Venables@csiro.au <Bill.Venables@csiro.au> wrote:
> Jonathon Kopecky asks:
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Jonathon Kopecky
> Sent: Tuesday, 30 January 2007 5:52 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Need to fit a regression line using orthogonal residuals
>
> I'm trying to fit a simple linear regression of just Y ~ X, but both X
> and Y are noisy. Thus instead of fitting a standard linear model
> minimizing vertical residuals, I would like to minimize
> orthogonal/perpendicular residuals. I have tried searching the
> R-packages, but have not found anything that seems suitable. I'm not
> sure what these types of residuals are typically called (they seem to
> have many different names), so that may be my trouble. I do not want to
> use Principal Components Analysis (as was answered to a previous
> questioner a few years ago), I just want to minimize the combined noise
> of my two variables. Is there a way for me to do this in R?

> [WNV] There's always a way if you are prepared to program it. Your
> question is a bit like asking "Is there a way to do this in Fortran?"
> The most direct way to do it is to define a function that gives you the
> sum of the perpendicular distances and minimise it using, say, optim().
> E.g.
> ppdis <- function(b, x, y) sum((y - b[1] - b[2]*x)^2/(1+b[2]^2))
> b0 <- lsfit(x, y)$coef # initial value
> op <- optim(b0, ppdis, method = "BFGS", x=x, y=y)
> op # now to check the results
> plot(x, y, asp = 1) # why 'asp = 1'?? exercise
> abline(b0, col = "red")
> abline(op$par, col = "blue")
> There are a couple of things about this you should be aware of, though
> First, this is just a fiddly way of finding the first principal
> component, so your desire not to use Principal Component Analysis is
> somewhat thwarted, as it must be.
> Second, the result is sensitive to scale - if you change the scales of
> either x or y, e.g. from lbs to kilograms, the answer is different.
> This also means that unless your measurement units for x and y are
> comparable it's hard to see how the result can make much sense. A
> related issue is that you have to take some care when plotting the
> result or orthogonal distances will not appear to be orthogonal.
> Third, the resulting line is not optimal for either predicting y for a
> new x or x from a new y. It's hard to see why it is ever of much
> interest.

> Bill Venables.
>
>
> Jonathon Kopecky
> University of Michigan
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Jan 31 13:23:47 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 31 Jan 2007 - 03:30:27 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.