# Re: [R] Total least squares linear regression

From: Terry Therneau <therneau_at_mayo.edu>
Date: Thu, 11 Feb 2010 08:05:04 -0600

This is also called "Deming regression" and perhaps many other things. It is a lively topic in the validation of competing assay methods in the laboratory. I have a function 'deming.R' that does a generalized form of this, based on the (very nice) article below, the code is attached. (The attachments will be stripped by R-help, but the original requestor will get them.) If someone thinks this to be of general enough interest to package up on CRAN I'm happy donate the code to them -- I won't have the time for some while.

Terry T.

BD Ripley and M Thompson, Regression techniques for the detection of analytical bias, Analyst 112:377-383, 1987.

• begin included message --- After a thorough research, I still find myself unable to find a function that does linear regression of 2 vectors of data using the "total least squares", also called "orthogonal regression" (see : http://en.wikipedia.org/wiki/Total_least_squares) instead of the "ordinary least squares" method. Indeed, the "lm" function has a "method" argument but the manual says that there is only one option so far. However, since the samples I am studying have the type of relationship that requires orthogonal regression, I am bound to use it.

x se.x y se.y
871 192 735 207
701 156 792 223
328 76 340 096
560 126 544 153
155 39 207 59
175 43 229 65
73 22 66 19
366 84 343 97
90 25 125 36
939 207 658 185
439 100 331 93
369 84 272 77
34 13 232 66
194 47 150 43
207 50 350 99
138 36 117 33
181 45 231 66
127 33 188 54
82 23 44 13
188 46 137 40
566 127 704 198
0 06 0 1
0 6 49 15
40 15 129 37
0 6 37 12
198 48 216 62
1021 224 1253 351
464 105 390 110
566 127 466 131
1925 418 1586 445

\name{deming}
\alias{deming}
\title{Fit a generalized Deming regression}
\description{

Find the MLE line relating x and y when both are measured with error. When the variances are constant and equal, this is the special case of Deming regression.
For laboratory analytes this is rarely true, however. }
\usage{

ripleyfit(x, y, xstd, ystd, jackknife = TRUE, dfbeta = FALSE, scale=T) }
\arguments{

  \item{x}{A numeric vector}
\item{y}{A numeric vector}
\item{xstd}{Either a numeric vector of the same length as \code{x}

giving the standard error for each of the elements, or a vector of length 2 giving the error formula.}
\item{ystd}{Either a numeric vector of the same length as \code{y} giving the standard error for each of the elements, or a vector of length 2 giving the error formula.}
  \item{jackknife}{Produce jackknife estimates of standard error.}
\item{dfbeta}{Return the dfbeta matrix}
\item{scale}{Compute an estimate of residual variance or scale.

If FALSE, the estimates of variance \code{xstd} and \code{ystd} are   assumed to be perfectly calibrated.}
}
\details{

The \code{xstd} specification can be a pair of values a, b; if so then the standard deviation of \code{x} is assumed to be \code{a + b*x}; similarly for \code{ystd}. An assumption of constant variance (homoscedasticity) correponds to \code{b=0}. If \code{b} is 0 for both \code{x} and \code{y}, then the result depends only on the
ratio of the \code{a} values, which is the ratio of the variances. To fit a Deming regression for instance use \code{c(1,0)} for both specifications. (Use of (k,0) for both would give the same answer for any value k).

When \code{a} is zero this is a model assuming constant coefficient of variation.
Values of stdx= (0,k) and stdy = (0,1) correspond to the case of contant proportional errors discussed by Linnet.

The most realistic case is where both \code{a} and \code{b} are non-zero and have been estimated from prior data. }
\value{

If \code{jackknife} is FALSE the result is a vector containing the intercept and the slope, otherwise it is a list with components:   \item{coefficient}{The coefficient vector, containing the intercept and slope.}
\item{variance}{The jackknife estimate of variance}   \item{dfbeta}{Optionally, the dfbeta residuals. A 2 column matrix, each row is the change in the coefficient vector if that observation is removed from the data.}
}
\details{

The standard printout includes test of intercept=0 and of slope=1.}
\references{

BD Ripley and M Thompson, Regression techniques for the detection of analytical bias, Analyst 112:377-383, 1987.

K Linnet, Estimation of the linear relationship between the measurements of two methods with proportional errors. Statistics in Medicine 9:1463-1473, 1990. }
\author{Terry Therneau}
\examples{

# Data from Ripley
arsenic <- data.frame(

      x=c(871, 701, 328, 560, 155, 175, 73, 366, 90, 939, 439, 369, 34, 194,
207, 138, 181, 127, 82, 188, 566, 0, 0, 40, 0, 198, 1021, 464, 566,
1925)/100,
y=c(735, 792, 340, 544, 207, 229, 66, 343, 125, 658, 331, 272, 232, 150,
350, 117, 231, 188, 44, 137, 704, 0, 49, 129, 37, 216, 1253, 390,
466, 1586)/100,
se.x=c(192, 156, 76, 126, 39, 43, 22, 84, 25, 207, 100, 84, 13, 47,
50, 36, 45, 33, 23, 46, 127, 6, 6, 15, 6, 48, 224, 105, 127, 418)
/100,
se.y=c(207, 223, 96, 153, 59, 65, 19, 97, 36, 185, 93, 77, 66, 43, 99,
33, 66, 54, 13, 40, 198, 1, 15, 37, 12, 62, 351, 110, 131, 445)/
100)



fit <- deming(arsenic$x, arsenic$y, arsenic$se.x, arsenic$se.y, dfbeta=T) print(fit)
\dontrun{

Coef se(coef) z p Intercept 0.1064478 0.2477054 0.54552512 0.3101551 Slope 0.9729999 0.1429776 -0.07341776 0.3874562

Scale= 1.358379
}
plot(1:30, fit$dfbeta[,1]) #subject 22 has a major effect on the fit # Standard Deming regression. The plot is not at all horizontal, which shows # that this is an inappropriate model, however plot(arsenic$x, arsenic$se.x) fit <- deming(arsenic$x, arsenic$y, xstd=c(1,0), ystd=c(1,0)) } } \keyword{regression} # Generalized Deming regression, based on Ripley, Analyst, 1987:377-383. # deming <- function(x, y, xstd, ystd, jackknife=TRUE, dfbeta=FALSE,  scale=TRUE) {  Call <- match.call() n <- length(x) if (length(y) !=n) stop("x and y must be the same length") if (length(xstd) != length(ystd)) stop("xstd and ystd must be the same length") # Do missing value processing nafun <- get(options()$na.action)
if (length(xstd)==n) {

        tdata <- nafun(data.frame(x=x, y=y, xstd=xstd, ystd=ystd))
x <- tdata$x y <- tdata$y
xstd <- tdata$xstd ystd <- tdata$ystd
}
else {
tdata <- nafun(data.frame(x=x, y=y))
x <- tdata$x y <- tdata$y
if (length(xstd) !=2) stop("Wrong length for std specification")
xstd <- xstd[1] + xstd[2]*x
ystd <- ystd[1] + ystd[2] * y
}



if (any(xstd <=0) || any(ystd <=0)) stop("Std must be positive")

minfun <- function(beta, x, y, xv, yv) {

        w <- 1/(yv + beta^2*xv)
alphahat <- sum(w * (y - beta*x))/ sum(w)
sum(w*(y-(alphahat + beta*x))^2)
}

minfun0 <- function(beta, x, y, xv, yv) {
w <- 1/(yv + beta^2*xv)
alphahat <- 0  #constrain to zero
sum(w*(y-(alphahat + beta*x))^2)
}

afun <-function(beta, x, y, xv, yv) {
w <- 1/(yv + beta^2*xv)
sum(w * (y - beta*x))/ sum(w)
}



fit <- optimize(minfun, c(.1, 10), x=x, y=y, xv=xstd^2, yv=ystd^2)     coef = c(intercept=afun(fit$minimum, x, y, xstd^2, ystd^2), slope=fit$minimum)
fit0 <- optimize(minfun0, coef[2]*c(.5, 1.5), x=x, y=y,

                     xv=xstd^2, yv=ystd^2)



w <- 1/(ystd^2 + (coef[2]*xstd)^2) #weights     u <- w*(ystd^2*x + xstd^2*coef[2]*(y-coef[1])) #imputed "true" value     if (is.logical(scale) && scale) {

        err1 <- (x-u)/ xstd
err2 <- (y - (coef[1] + coef[2]*u))/ystd
sigma <- sum(err1^2 + err2^2)/(n-2)
# Ripley's paper has err = [y - (a + b*x)] * sqrt(w); gives the same SS
}


else sigma <- scale^2

test1 <- (coef[2] -1)*sqrt(sum(w *(x-u)^2)/sigma) #test for beta=1     test2 <- coef[1]*sqrt(sum(w*x^2)/sum(w*(x-u)^2) /sigma) #test for a=0

rlist <- list(coefficient=coef, test1=test1, test0=test2, scale=sigma,

                  err1=err1, err2=err2, u=u)

if (jackknife) {
delta <- matrix(0., nrow=n, ncol=2)
for (i in 1:n) {
fit <- optimize(minfun, c(.5, 1.5)*coef[2],
x=x[-i], y=y[-i], xv=xstd[-i]^2, yv=ystd[-i]^2)
ahat <- afun(fit$minimum, x[-i], y[-i], xstd[-i]^2, ystd[-i]^2) delta[i,] <- coef - c(ahat, fit$minimum)
}
rlist$variance <- t(delta) %*% delta if (dfbeta) rlist$dfbeta <- delta
}



rlist$call <- Call class(rlist) <- 'deming' rlist } print.deming <- function(x, ...) { cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")     if (is.null(x$variance)) {  table <- matrix(0., nrow=2, ncol=3) table[,1] <- x$coefficient
table[,2] <- c(x$test0, x$test1)
table[,3] <- pnorm(-2*abs(table[,2]))
dimnames(table) <- list(c("Intercept", "Slope"),
c("Coef", "z", "p"))
}
else {
table <- matrix(0., nrow=2, ncol=4)
table[,1] <- x$coefficient table[,2] <- sqrt(diag(x$variance))
table[,3] <- c(x$test0, x$test1)
table[,4] <- pnorm(-2*abs(table[,3]))
dimnames(table) <- list(c("Intercept", "Slope"),
c("Coef", "se(coef)", "z", "p"))
}


print(table, ...)
cat("\n Scale=", format(x\$scale, ...), "\n")     invisible(x)
}

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 11 Feb 2010 - 14:07:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 11 Feb 2010 - 14:10:19 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.