From: Mike Dugas <mikedugas77_at_gmail.com>

Date: Thu, 24 Apr 2008 09:28:26 -0400

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Thu 24 Apr 2008 - 14:21:25 GMT

I have a couple of questions about your code.
First, why not use

xs <- seq(min(x1), max(x1), length = 100)

instead of

xs <- with(m, seq(min(x1), max(x1), length = 100))?
Second, what is the function geom_line()? I couldn't find it.
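The scoping difference behind the first question can be seen in a small example. This is a minimal sketch assuming, as in the quoted code, that `x1` exists only as a column of the data frame `m` and not as a free-standing variable; the toy values are made up for illustration.

```r
# Hypothetical toy data frame standing in for the poster's `m`
m <- data.frame(x1 = c(2, 5, 9), x2 = c(1, 0, 1))

# with() evaluates the expression using the columns of m as variables,
# so x1 here refers to m$x1
xs <- with(m, seq(min(x1), max(x1), length.out = 100))

# Without with(), R looks for an object called x1 in the workspace;
# when none exists, evaluation fails with "object 'x1' not found"
res <- tryCatch(seq(min(x1), max(x1), length.out = 100),
                error = function(e) conditionMessage(e))
res
```

The bare `seq(min(x1), max(x1), ...)` form only works if a separate vector named `x1` happens to exist, which is why the data-frame version needs `with()` (or `m$x1`).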

Thanks,

Mike

On 4/23/08, hadley wickham <h.wickham_at_gmail.com> wrote:

> On Wed, Apr 23, 2008 at 8:33 PM, hadley wickham <h.wickham_at_gmail.com> wrote:
>
> > > Sure, I am creating a partial dependence plot (reference Friedman's
> > > stochastic gradient paper from, I want to say, 2001). The idea is to find
> > > the relationship between one of the predictors, say x1, and y by creating
> > > the following plot: take a random sample of actual data points, hold other
> > > predictors fixed (x2-xp), vary x1 across its range, create a string of
> >
> > But your code doesn't have a random component - you're trying to
> > calculate every combination of the new x_n and the existing data?
> > Is that right?
>
> And why are you using so many different values of the x variable?
> Hundreds should be sufficient to get a smooth curve, not thousands. I'd
> also think about displaying not just the mean, but a selection of
> quantiles as well.
>
> Here's one approach:
>
> model <- lm(y ~ poly(x1, 2) + x2, data = m)
>
> xs <- with(m, seq(min(x1), max(x1), length = 100))
>
> library(reshape)
> newdf <- expand.grid.df(data.frame(x1 = xs), m[, "x2", drop = FALSE])
>
> predictions <- predict(model, newdata = newdf)
> avg_pred <- tapply(predictions, newdf$x1, mean)
> low_pred <- tapply(predictions, newdf$x1, quantile, 0.25)
> high_pred <- tapply(predictions, newdf$x1, quantile, 0.75)
>
> library(ggplot2)
> qplot(xs, avg_pred, ymin = low_pred, ymax = high_pred, geom = "ribbon") +
>   geom_line()
>
>
> But following your code, it's exhaustive, not random. This should be
> a little faster because all the predictions are done in one go.
>
> Hadley
>
> --
> http://had.co.nz/
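Hadley notes that his version is exhaustive, while the quoted description asks for a random sample of data points. The following is a minimal base-R sketch of that random-subsample variant of a partial dependence curve: keep the other predictors at their observed values in a sampled set of rows, set x1 to each grid value, predict, and summarise the predictions at each grid value. The simulated data, the helper `partial_dependence()`, and the choice of 50 sampled rows are assumptions for illustration, not code from the thread; stacking the rows directly also avoids the reshape dependency.

```r
set.seed(1)

# Simulated data standing in for the poster's data frame `m` (hypothetical)
n <- 500
m <- data.frame(x1 = runif(n, 0, 10), x2 = rnorm(n))
m$y <- 2 + 0.5 * m$x1 - 0.05 * m$x1^2 + m$x2 + rnorm(n, sd = 0.5)

# Same model form as in the quoted example
model <- lm(y ~ poly(x1, 2) + x2, data = m)

# Hypothetical helper: partial dependence of the predictions on `var`,
# estimated from a random subsample of rows rather than every row
partial_dependence <- function(model, data, var, grid, n_sample = 50,
                               probs = c(0.25, 0.5, 0.75)) {
  # Draw the random subsample once and reuse it for every grid value
  rows <- data[sample(nrow(data), min(n_sample, nrow(data))), , drop = FALSE]
  k <- nrow(rows)
  # Stack one copy of the subsample per grid value, then overwrite `var`
  newdf <- rows[rep(seq_len(k), times = length(grid)), , drop = FALSE]
  newdf[[var]] <- rep(grid, each = k)
  # All predictions in a single predict() call
  pred <- predict(model, newdata = newdf)
  # Group by integer grid index so the output order matches `grid`
  groups <- split(pred, rep(seq_along(grid), each = k))
  stats <- t(sapply(groups, function(p) c(mean = mean(p), quantile(p, probs))))
  data.frame(x = grid, stats, check.names = FALSE)
}

grid <- seq(min(m$x1), max(m$x1), length.out = 100)
pd <- partial_dependence(model, m, "x1", grid)
head(pd)
```

With `n_sample = 50` and 100 grid values this makes 5,000 predictions regardless of how many rows `m` has. The `mean` column corresponds to `avg_pred` in the quoted code, and the `25%` and `75%` columns can be drawn as the ribbon around it.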

