[R] Fun and games with lowess()

From: Ross Clement <R.P.Clement_at_westminster.ac.uk>
Date: Wed 02 Mar 2005 - 08:52:55 EST

This is not a "help" request, just an experience with R that I found amusing.

I teach a machine learning module which was originally all symbolic, but has a slowly growing numeric/statistical component.

Today I taught a two-part lecture on instance-based methods for learning, with the first half being simple KNN, then kernel methods for regression. The second half was using case-based reasoning to learn machine translation from human training.

Anyhow, here's the relevant bit. In the tutorial I got one of the students to fire up R so that we could use a normal pdf as the kernel function for a worked example. I ended that part of the tutorial by showing them the board data (just three points) plotted on a graph, and then plotting the predicted y value (according to the board calculations) for the "unseen" x value from the board.
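For readers who want to try the worked example, here is a minimal sketch of kernel regression with a normal pdf as the kernel, written in Python rather than R. The three data points, the query point and the bandwidth are made-up stand-ins, since the actual board values are not given in this post, and the function names are just illustrative.

```python
import math

def normal_pdf(u, sd=1.0):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kernel_predict(x_new, xs, ys, bandwidth=1.0):
    """Weighted average of ys, where each weight is the normal pdf
    evaluated at the distance between x_new and the training x."""
    weights = [normal_pdf(x_new - x, bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical stand-in for the three board points (real values unknown).
xs = [1.0, 2.0, 4.0]
ys = [1.0, 4.0, 16.0]

# Predicted y for an "unseen" x value (also hypothetical).
prediction = kernel_predict(3.0, xs, ys, bandwidth=1.0)
print(prediction)
```

The prediction is always a weighted average of the observed y values, so it can never fall outside their range; a smaller bandwidth makes the nearest point dominate.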

Then, since the tutorial had ended, I thought I'd get a little bit fancy. I created some sample data where the x values were uniform random between 0 and 8, and the y value was x^2 plus some Gaussian noise. I then plotted the raw data, and a lowess() curve, explaining to the students that the lowess() curve was a more sophisticated method than we'd covered in class (I had mentioned locally weighted regression during the lecture). Finally I plotted the true y=x^2 line so that we could compare the lowess() curve to the true curve. The students were all very impressed at the close fit.
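The experiment can be approximated with the sketch below. It is not a port of R's lowess(): it is a plain locally weighted linear fit with tricube weights and no robustness iterations, and the sample size, noise standard deviation, seed and the name local_linear_fit are all assumptions made for illustration.

```python
import random

def local_linear_fit(x0, xs, ys, frac=2/3):
    """Tricube-weighted straight-line fit to the points nearest x0,
    evaluated at x0. Assumes the x values are not all identical."""
    k = max(2, round(frac * len(xs)))            # neighbours in the window
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # distance to k-th nearest x
    # Tricube weights: close to 1 near x0, falling to 0 at distance h.
    ws = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
          for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

rng = random.Random(1)                              # fixed seed, arbitrary
xs = [rng.uniform(0, 8) for _ in range(200)]        # x uniform on [0, 8]
ys = [x ** 2 + rng.gauss(0, 4) for x in xs]         # y = x^2 + Gaussian noise

# Compare the smoothed value with the true curve at a few x values.
for x0 in range(1, 8):
    print(x0, round(local_linear_fit(x0, xs, ys), 2), x0 ** 2)
```

Raising the noise standard deviation in the rng.gauss call is the quickest way to repeat the "what if the noise gets really large" step.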

So, I then decided to show what happens when the noise gets really large. I recreated my x^2 + noise sample with large amounts of noise. Plot the lowess() curve and ... it's still pretty well bang on. So, increase the noise even further, plot the lowess() curve and the true curve and ... still very accurate. OK, quick explanation that the random Gaussian noise is more or less evenly distributed either side of the true line. So, I created some Gaussian noise with a mean of 0, squared it, subtracted the first constant I thought of, and added it to the x^2 values to create the fake "observed" y values. Plot the data, and you can see the lower part of the curve clearly, but really random-looking noise above it. Plot the lowess() curve, plot the true curve and ... pretty well bang on.
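A rough way to look at that skewed noise numerically: the square of a zero-mean Gaussian has a mean equal to its variance, so after subtracting a constant the noise averages to (variance - constant). If the constant happens to be close to the variance, the noise still roughly averages out around the true curve, lopsided as it looks. The standard deviation and the constant below are invented for illustration; the post gives neither.

```python
import random
import statistics

rng = random.Random(2)   # fixed seed, arbitrary
sd = 3.0                 # hypothetical noise standard deviation
constant = 5.0           # hypothetical "first constant I thought of"

# Squared zero-mean Gaussian noise, shifted down by the constant.
noise = [rng.gauss(0, sd) ** 2 - constant for _ in range(10000)]

mean_noise = statistics.mean(noise)
median_noise = statistics.median(noise)

print(mean_noise, sd ** 2 - constant)   # sample mean vs. theoretical mean
print(median_noise)                     # median sits below the mean (right skew)
```

No noise value can be smaller than minus the constant, which would match the description of a clearly visible lower edge with scattered points above it.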

At this point I'd run out of time, and hence couldn't think of even tougher tests. The students were all fairly impressed by this. I saw several of them copying R (the whole Windows directory) off the hard drive to take home. Others asked if I could write them some notes on lowess() etc.

Once again I learn not to do things on the fly in lectures :-)



Received on Wed Mar 02 06:03:40 2005
