[Rd] loess prediction algorithm

From: <apjaworski_at_mmm.com>
Date: Wed, 25 Jul 2007 17:56:54 -0500

Hello,

I need help with the details of loess prediction algorithm. I would like to get it implemented as a part of a measurement system programmed in LabView. My job is provide a detailed description of the algorithm. This is a simple one-dimensional problem - smoothing an (x, y) data set.

I found quite a detailed description of the fitting procedure in the "white book". It is also described in great detail at the NIST site in the Engineering Statistics Handbook. It provides an example of Loess computations. I managed to reproduce their example exactly in R. At each data point I compute a weighted local linear fit using the number of points based of span. Then I predict the values from these local fits. This matches R "loess" predictions exactly.

The problem starts when I try to predict at x values not in the data set. The "white book" does not talk about predictions at all. In the NIST handbook in the "Final note on Loess Computations" they mention this type of predictions but just say that the same steps are used for predictions as for fitting.

When I try to use "the same steps" I get predictions that are quite different that the predictions obtained by fitting R loess model to a data set and running predict(<model object>, newdata=<grid of x values>). They match quite well at the lowest and highest ends of the x grid but in the middle are different and much less smooth. I can provide details but basically what I do to create the predictions at x0 is this: 1. I append c(x0, NA) to the data frame of (x, y) data. 2. I calculate abs(xi-x0), i.e., absolute deviations of the x values in the data set and a given x0 value.
3. I sort the data set according to these deviations. This way the first row has the (x0, NA) value.
4. I drop the first row.
5. I divide all the deviations by the m-th one, where m is the number of points used in local fitting - m = floor(n*span) where n is the number of points.
6. I calculate the "tricube" weights and assign 0's to the negative ones. This eliminates all the points except the m points of interest. 7. I fit a linear weighted regression using lm. 8. I predict y(x0) from this linear model. This is basically the same procedure I use to predict at the x values from the data set, except for point 4.

I got the R sources for loess but it looks to me like most of the work is done in a bunch of Fortran modules. They are very difficult to read and understand, especially since they handle multiple x values. A couple of things that worry me are parameters in loess.control such as surface and cell. They seem to have something to do with predictions but I do not account for them in my simple procedure above.

Could anyone shed a light on this problem? Any comment will be appreciated.

I apologize in advance if this should have been posted in r-help. I figured that I have a better chance asking people who read the r-devel group, since they are likely to know more details about inner workings of R.

Thanks in advance,

Andy



Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory

E-mail: apjaworski_at_mmm.com
Tel: (651) 733-6092
Fax: (651) 736-3122

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed 25 Jul 2007 - 23:01:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 26 Jul 2007 - 16:36:41 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.