RE: [R] Help with predict.lm

From: Ted Harding <>
Date: Wed 20 Apr 2005 - 00:20:26 EST

On 19-Apr-05 Mike White wrote:
> Hi
> I have measured the UV absorbance (abs) of 10 solutions
> of a substance at known concentrations (conc) and have
> used a linear model to plot a calibration graph with
> confidence limits. I now want to predict the concentration
> of solutions with UV absorbance results given in the
> new.abs data.frame, however predict.lm only appears to work
> for new "conc" variables not new "abs" variables.
> I have search the help files and did find a similar problem
> in June 2000, but unfortunately no solution was offered.
> Any help and how to use predict.lm with the new "abs" data
> to predict "conc" with confidence limits would be appreciated.
> conc<-seq(100, 280, 20) # mg/l
> abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744,
> 1.852, 1.936,2.046) # absorbance units
> lm.calibration<-lm(abs ~ conc)
> pred.w.plim <- predict(lm.calibration, interval="prediction")
> pred.w.clim <- predict(lm.calibration, interval="confidence")
> matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
> lty=c(1,2,2,3,3), type="l", ylab="abs", xlab= "conc mg/l")
> points(conc, abs, pch=21, col="blue")
> new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> predict(calibration.lm, new.abs) # does not work

Apart from the apparent typo "calibration.lm" which Matthias pointed out, this will not work anyway since "predict" predicts in the same direction as the model was fitted in the first place.

You have fitted "abs ~ conc", so the lm object lm.calibration contains a representation of the model equivalent to

  abs = a + b*conc

When you use predict(calibration.lm, new.abs) it will expect the dataframe "new.abs" to contain values in a list named "conc". Since you have supplied a list named "abs", this is apparently ignored, and as a result the function uses the default dataframe which is the data encapsulated in the calibration.lm object.

You can verify this by comparing the outputs of

  predict(lm.calibration, new.abs)



You will see that they are identical!

Either way, "predict" predicts "ans" from "conc" by using the above equation. It does not attempt to invert the equation even if you call the "new" data "abs".

Given the apparently extremely close fit, in your data, to a straight line, you are likely to get perfectly adequate results simply by doing the regression the other way round, i.e.

> lm.calibration2<-lm(conc ~ abs)
> new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> predict(lm.calibration2,new.abs)

       1 2 3
131.3813 144.7453 168.1782

which looks about right!

Strictly speaking, this approach is somewhat invalid, since under the model

   abs = a + b*conc + error

the "inverse regression"

   conc = (abs - a)/b = A + B*abs

does not have the standard statistical properties because of the theoretical possiblity (P > 0) that the estimated b can be arbitrarily close to 0 (with the result that, theoretically, the estimate of B = 1/b has neither expectation nor variance), and the estimates of A and B could be a long way out.

Also, the confidence intervals you would get by using the residuals from this "inverse regression" in the usual way would not be valid, since they would be finite for all values of the confidence level. In reality, for a sufficiently high confidence level (but still short of 100%) you will get a confidence "interval" consisting of several parts with gaps between them, or an infinite confidence interval.

This takes place at confidence levels corresponding to the significance level at which you can no longer reject the hypothesis "b = 0" in the first euqation. In your case this significance level would be extremely small (I get 2.71e-12, so you can get confidence intervals up to at least 99.999999999% confidence level before you need to worry much about the confidence interval!

In your case where there seems to be a very tight linear relationship, the possible misrepresentation of the inverse relationship due to this effect is very unlikely to have occurred.

The correct approach has in the past been the subject of at times quite controversial discussion, under the title indeed of "The Calibration Problem". Nowadays this problem would be approached by making the concentrations to be "predicted" additional unknown parameters, and evaluating likelihood ratios for possible values of these.

I don't have time at the moment to go into this approach, but will try to write something later.

It seems there is nothing in R at present for this kind of general (and basically simple) use, though maybe there is something usable in the package mscalib (which I have not studied); however, by the look of the contents, the routines therein may be too elaborately specialised towards a specific type of application to be convenient for general use in conjunction with lm.

Till later,

 it might be
worth spelling it out for
people who would like to implement it themselves. Later.

E-Mail: (Ted Harding) <> Fax-to-email: +44 (0)870 094 0861
Date: 19-Apr-05                                       Time: 15:20:26
------------------------------ XFMail ------------------------------

______________________________________________ mailing list PLEASE do read the posting guide! Received on Wed Apr 20 00:44:00 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:16 EST