Re: [R] Problems with mars in R in the case of nonlinear functions

From: Stephen Milborrow <milbo_at_sonic.net>
Date: Sun, 13 Jul 2008 14:33:58 +0200


| I'm trying to use the mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis functions and it underfits very badly.

Try the "earth" package, which extends the mars function in the mda package.

Your example becomes

library(earth)  # instead of library(mda)

f <- function(x, y) { x^2 - y^2 }
x <- seq(-1, 1, length.out=10)
x <- outer(x*0, x, FUN="+")  # 10x10 matrix, x varies along the columns
y <- t(x)                    # y varies along the rows
X <- cbind(as.vector(x), as.vector(y))
z <- f(x, y)

fit <- earth(X, as.vector(z))
summary(fit)
plotmo(fit)  # note the better fit; your original plotting code could be used too

For this kind of data, you could also try the minspan parameter. By default, MARS does not allow every observation to be used as a knot in the generated basis functions. This strategy increases resistance to runs of correlated noise in the data. For non-noisy data, you can set minspan=1 to allow MARS to consider every observation as a potential knot. If your data were noisy, minspan=1 could overfit. With earth, you can use trace=2 to see the calculated minspan value.
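The automatically calculated minspan can be illustrated with the knot-spacing rule from Friedman's 1991 MARS paper. The sketch below is base R; the helper name mars_minspan is hypothetical, alpha = 0.05 is an assumed setting, and earth's own calculation may differ in details such as rounding or which observations are counted.

```r
# Hypothetical helper: Friedman's (1991) minimum knot span,
#   L(alpha) = -log2(-(1 / (p * N)) * log(1 - alpha)) / 2.5
# where p = number of predictors and N = number of observations.
# Assumption: earth's automatic minspan is based on this rule; its exact
# implementation (rounding, cases counted per parent term) may differ.
mars_minspan <- function(n_cases, n_preds, alpha = 0.05) {
  -log2(-(1 / (n_preds * n_cases)) * log(1 - alpha)) / 2.5
}

mars_minspan(100, 2)  # 10x10 grid, 2 predictors: about 4.8
mars_minspan(400, 2)  # 20x20 grid, 2 predictors: about 5.6
```

Under this formula the minimum spacing between knots is several observations, and it grows slowly with the number of observations; minspan=1 removes that restriction.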

If you run the above example with the earth parameter trace=1, you will see that the stopping condition for the forward pass is:

Reached delta RSq threshold (DeltaRSq 0.00030214 < 0.001)
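The "delta RSq threshold" in that message can be illustrated with a small base-R sketch. It is a simplified, hypothetical model of the forward pass: the function forward_stop_step and the RSq values are made up (not earth output), and it checks only the two stopping conditions discussed in this thread, namely the RSq gain of the latest step falling below thresh and the number of terms reaching nk.

```r
# Hypothetical, simplified sketch of two forward-pass stopping rules:
# stop when the RSq gain of the latest step drops below `thresh`,
# or when the number of terms reaches `nk`.
# rsq[i] is the (made-up) RSq of the model after step i.
forward_stop_step <- function(rsq, thresh = 0.001, nk = 200) {
  for (i in seq_along(rsq)) {
    if (i >= nk) {
      return(list(step = i, reason = "reached nk"))
    }
    if (i > 1 && rsq[i] - rsq[i - 1] < thresh) {
      return(list(step = i, reason = "delta RSq below thresh"))
    }
  }
  list(step = length(rsq), reason = "no more candidate terms")
}

rsq <- c(0, 0.60, 0.85, 0.95, 0.9504, 0.9510, 0.9512)  # made-up values
forward_stop_step(rsq)                          # step 5: gain 0.0004 < 0.001
forward_stop_step(rsq, thresh = 1e-6)           # runs through all 7 steps
forward_stop_step(rsq, thresh = 1e-6, nk = 4)   # step 4: nk reached
```

Lowering thresh only moves the first rule out of the way; the pass then runs until nk (or some other condition) stops it.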

To make the forward pass continue further, change the "delta RSq threshold" by using the thresh parameter:

fit <- earth(X, as.vector(z), thresh=1e-6)

The resulting model "looks" better when plotted, but note that lowering thresh here makes almost no change to the GRSq. That is, with the lower threshold the model is more complicated (has more terms) but does not have greater predictive power. The threshold is just one of the reasons that the forward pass can terminate (reaching the maximum number of terms nk is another). AFAIK Friedman's code (which you ran from Matlab) does not use the threshold but instead just continues forward stepping until nk is reached. In this case the Matlab model is arguably more complicated than it needs to be. I believe the forward threshold for MARS was an innovation of Hastie and Tibshirani, but I could be wrong.
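To see how a model can gain terms without gaining predictive power, here is a base-R sketch of GCV and GRSq. The formulas are my reading of the earth documentation rather than something stated in this thread, the default penalty of 2 is an assumption (it matches penalty=2 in the original mars call), and the RSS values are made up for illustration.

```r
# Sketch of GCV and GRSq, assuming the definitions in the earth
# documentation (not taken from this thread):
#   effective parameters = nterms + penalty * (nterms - 1) / 2
#   GCV  = RSS / (N * (1 - effective parameters / N)^2)
#   GRSq = 1 - GCV / GCV of the intercept-only model
gcv <- function(rss, n, nterms, penalty = 2) {
  enp <- nterms + penalty * (nterms - 1) / 2  # terms plus knot penalty
  rss / (n * (1 - enp / n)^2)
}

grsq <- function(rss, rss_null, n, nterms, penalty = 2) {
  # The intercept-only model has one term, so its knot penalty is zero
  1 - gcv(rss, n, nterms, penalty) / gcv(rss_null, n, 1, penalty)
}

# Made-up numbers for N = 100 cases: the larger model has a lower RSS
# but pays for its extra terms in the GCV penalty.
grsq_small <- grsq(rss = 5, rss_null = 100, n = 100, nterms = 5)   # about 0.941
grsq_big   <- grsq(rss = 4, rss_null = 100, n = 100, nterms = 15)  # about 0.922
```

Because GCV penalizes each extra term, a lower RSS from a bigger model can leave GRSq flat or even reduce it.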

To reduce mailing list traffic, let's continue this discussion offline, i.e. by direct mail to each other, and if necessary I will summarize the results of our discussion in the earth documentation.

Regards
Steve

| Message: 76
| Date: Thu, 12 Jun 2008 13:35:35 -0700
| From: Janne Huttunen <jmhuttun_at_stat.berkeley.edu>
| Subject: [R] Problems with mars in R in the case of nonlinear
| functions
| To:
| Message-ID: <48518897.7080804@stat.berkeley.edu>
| Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
| Hi,
|
| I'm trying to use the mars function in R to interpolate nonlinear
| multivariate functions.
| However, it seems that mars gives me a fit which uses only very few
| basis functions and it underfits very badly.
|
| For example, I have tried the following code to test mars:
|
| require("mda")
|
| f <- function(x,y) { x^2-y^2 };
| #f <- function(x,y) { x+2*y };
|
| # Grid
| x <- seq(-1,1,length=10);
| x <- outer(x*0,x,FUN="+"); y <- t(x);
| X <- cbind(as.vector(x),as.vector(y));
|
| # Data
| z <- f(x,y);
|
| fit <- mars(X,as.vector(z),nk=200,penalty=2,thresh=1e-3,degree=2);
|
| # Plotting
| par(mfrow=c(1,2),pty="s")
| lims <- c(min(c(min(z),min(fit$fitted))),max(c(max(z),max(fit$fitted))))
| persp(z=z,ticktype='detailed',col='lightblue',shade=.75,ltheta=50,
| xlab='x',ylab='y',zlab='z',main='true',phi=25,theta=55,zlim=lims)
|
| persp(z=matrix(fit$fitted.values,nrow=nrow(x),byrow=F),ticktype='detailed',
| col='lightblue',
| xlab='x',ylab='y',zlab='z',shade=.75,ltheta=50,main='MARS',
| phi=25,theta=55,zlim=lims)
|
| (the code is also here if someone wants to try it:
| http://venda.uku.fi/~jmhuttun/R/marstest.R)
|
| The results are here: http://venda.uku.fi/~jmhuttun/R/R-10.pdf . The
| fitted model contains only 5 terms, which is not enough in this case.
| Adjusting parameters like nk, thresh, penalty and degree seems to have
| only a minor effect or no effect at all. It's also strange that when I
| increase the number of points in the grid, the results are even worse:
| see e.g. http://venda.uku.fi/~jmhuttun/R/R-20.pdf for a 20x20 grid.
| However, mars seems to work well with linear functions (e.g. with the
| function which is commented out in the above code).
|
| Does anyone know what is wrong in this case? Am I missing something, or
| is there something wrong in my code?
|
| This does not seem to be a problem with the MARS method in general. For
| example, Friedman's MARS implementation (run in Matlab) gives a rather
| good fit: see http://venda.uku.fi/~jmhuttun/R/Matlab.pdf .
|
| Thank you
|
| Janne
|
| --
| Janne Huttunen
| University of California
| Department of Statistics
| 367 Evans Hall Berkeley, CA 94720-3860
| email: jmhuttun_at_stat.berkeley.edu
| phone: +1-510-502-5205
| office room: 449 Evans Hall



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Fri 13 Jun 2008 - 13:34:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 13 Jun 2008 - 14:30:43 GMT.

