From: Marc Schwartz <marc_schwartz_at_comcast.net>

Date: Fri, 15 Jun 2007 18:21:33 -0500

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 15 Jun 2007 - 23:30:31 GMT


On Fri, 2007-06-15 at 15:34 -0500, Dirk Eddelbuettel wrote:

> Hi Mark,
>
> Thanks for the reply.
>
> On 15 June 2007 at 14:33, Marc Schwartz wrote:
> | On Fri, 2007-06-15 at 10:47 -0500, Dirk Eddelbuettel wrote:
> | > Philipp Benner reported a Debian bug report against r-cran-rpart aka rpart.
> | > In short, the issue has to do with how rpart evaluates a formula and
> | > supporting arguments, in particular 'weights'.
> | >
> | > A simple contrived example is
> | >
> | > -----------------------------------------------------------------------------
> | > library(rpart)
> | >
> | > ## using data from help(rpart), set up simple example
> | > myformula <- formula(Kyphosis ~ Age + Number + Start)
> | > mydata <- kyphosis
> | > myweight <- abs(rnorm(nrow(mydata)))
> | >
> | > goodFunction <- function(mydata, myformula, myweight) {
> | >   hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
> | >   prev <- hyp
> | > }
> | > goodFunction(mydata, myformula, myweight)
> | > cat("Ok\n")
> | >

<snip>

> |
> | However, now let's do this:
> |
> | library(rpart)
> | myformula <- formula(Kyphosis ~ Age + Number + Start)
> | mydata <- kyphosis
> | myweight <- abs(rnorm(nrow(mydata)))
> |
> | goodFunction <- function(mydata, myformula) {
> |   hyp <- rpart(myformula, data=mydata,
> |                weights=myweight, method="class")
> |   prev <- hyp
> | }
> |
> | > goodFunction(mydata, myformula)
> | >
> |
> | It works, because 'myweight' is found in the global environment, which
> | is where the formula is created.
>
> Well, yes, but doesn't this just recreate the working example I showed above?
> It works 'because we get lucky' with the data in the outer global env.

Technically, it is not the same. My point was that 'myweight' does not need to be passed as an argument to the function in order to be located and evaluated successfully inside it.

We don't get lucky here. The behavior is by design and consistent with the documentation: in this case, 'myweight' in the call to rpart() is evaluated in the environment of the formula. The formula is created in the global environment, so 'myweight' is found there, and there is no need to pass it as an argument.
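To make the lookup rule concrete, here is a minimal sketch of the case it rules out. The helper name `fit_local_weights` and the variable `local_w` are hypothetical and not taken from the thread, and the sketch assumes that no object called `local_w` exists in the global environment. The formula is created at top level, so a weight vector that exists only inside the function should not be visible when rpart() builds its model frame:

```r
library(rpart)

# The formula is created here, so its environment is the top-level one.
myformula <- Kyphosis ~ Age + Number + Start

# Hypothetical helper: the weights exist only inside the function body,
# which is neither the data frame nor the formula's environment.
fit_local_weights <- function(mydata, myformula) {
  local_w <- abs(rnorm(nrow(mydata)))
  rpart(myformula, data = mydata, weights = local_w, method = "class")
}

# Capture the error message instead of stopping the script.
result <- tryCatch(fit_local_weights(kyphosis, myformula),
                   error = function(e) conditionMessage(e))
result
```

If the lookup works as described, `result` holds an error message complaining that `local_w` cannot be found, rather than a fitted tree.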

A review of the code for rpart() will reveal code similar to that used in most R modeling functions for evaluating the formula and its associated arguments and for creating the model frame.
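The shared pattern can be sketched roughly as follows. `toy_model` is a hypothetical function written for illustration, not code copied from rpart() or any other package; it mimics the common idiom of rebuilding the call as a call to model.frame() and evaluating it in the caller's frame.

```r
# toy_model() is hypothetical: it only builds the model frame and
# returns the weights it found, to show where they come from.
toy_model <- function(formula, data, weights) {
  mf <- match.call()
  keep <- match(c("formula", "data", "weights"), names(mf), nomatch = 0L)
  mf <- mf[c(1L, keep)]                  # keep only the model-frame arguments
  mf[[1L]] <- quote(stats::model.frame)  # turn the call into model.frame(...)
  mf <- eval(mf, parent.frame())         # evaluate it where toy_model() was called
  model.weights(mf)                      # weights are stored in the model frame
}

d <- data.frame(y = c(1, 2, 3), x = c(3, 1, 2))
w <- c(1, 2, 3)
found <- toy_model(y ~ x, data = d, weights = w)
found
```

Because `weights = w` is passed on as an unevaluated expression, it is model.frame() that decides where `w` is looked up, which is why the environment of the formula matters.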

One exception to the above: some other modeling functions let you omit the formula and pass just a data frame, in which case the first column is taken as the response variable and the remaining columns as the independent terms. I don't see that supported in rpart().
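For reference, the data-frame shortcut relies on the formula() method for data frames in the stats package. A small sketch, using the `kyphosis` data from rpart purely as a convenient data frame (the name `f` is made up for this example):

```r
library(rpart)  # only needed here for the kyphosis data set

# formula() on a data frame treats the first column as the response
# and the remaining columns as additive terms.
f <- formula(kyphosis)
all.vars(f)
```

This only shows how a formula is derived from a data frame; it does not demonstrate that any particular modeling function accepts a data frame in place of a formula.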

>
> | Now, final example, try this:
> |
> | library(rpart)
> | goodFunction <- function() {
> |   myformula <- formula(Kyphosis ~ Age + Number +
> |                        Start)
> |   mydata <- kyphosis
> |   myweight <- abs(rnorm(nrow(mydata)))
> |
> |   hyp <- rpart(myformula, data=mydata,
> |                weights=myweight, method="class")
> |   prev <- hyp
> | }
> |
> | > goodFunction()
> | >
> |
> | It works because the formula is created within the environment of the
> | function and hence, 'myweight', which is created there as well, is
> | found.
>
> That works because we force it to be local. BDR claims that my 'badFunction'
> (derived from Philipp's original bug report) above can be made to work
> provided you use model.frame. I asked about model.frame -- and you were kind
> enough to answer, but you dodged the question.
>
> So let me try again: How can rpart be called inside a function using a
> local weight variable as I do above? Either it can, and BDR is right
> and there is no bug, or one cannot, and then mere mortals like myself must
> consider rpart to be buggy as it does not support all its arguments in at
> least some conceivable calling situations.
>
> Is that a fair question?
>
> Regards, Dirk

Yep, entirely fair.

Without knowing what specific approach Prof. Ripley had in mind, I am envisioning a couple of possibilities, but here is one:

library(rpart)

myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis

badFunction <- function(mydata, myformula) {
  mydata$myweight <- abs(rnorm(nrow(mydata)))
  rpart(myformula, data = mydata, weights = myweight, method = "class")
}

badFunction(mydata, myformula)

Basically, there are 3 places in which 'myweight' could be found:

1. Formula environment
2. Data frame environment
3. Global environment
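The list does not say which location wins when a name exists in more than one of them. A small sketch using model.frame() directly, on the assumption that it behaves the same way when called from inside a modeling function, suggests that a column of the data frame takes precedence over a variable of the same name in the formula's environment. The names `d`, `w`, `mf` and `picked` are made up for this example:

```r
library(rpart)  # for the kyphosis data

w <- rep(2, nrow(kyphosis))   # lives in the environment where the formula is created
d <- kyphosis
d$w <- rep(1, nrow(d))        # a column with the same name inside the data frame

# 'weights = w' is looked up in 'd' first, then in the formula's environment.
mf <- model.frame(Kyphosis ~ Age, data = d, weights = w)
picked <- unique(model.weights(mf))
picked
```

Here `picked` should come out as 1, the value stored in the data frame column, not the 2 stored in the surrounding environment.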

In this case, we add the weights to the 'mydata' data frame as a new column inside the function, so that 'myweight' is found in the call to rpart() via the second location above, the data frame.
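Another possibility, offered only as a sketch and not necessarily what Prof. Ripley had in mind, is to leave the data frame untouched and instead point the formula at the function's own environment, so that a local weight vector is found via the formula environment. The names `fitLocal` and `local_w` are hypothetical:

```r
library(rpart)

myformula <- formula(Kyphosis ~ Age + Number + Start)  # created outside the function

fitLocal <- function(mydata, myformula) {
  local_w <- abs(rnorm(nrow(mydata)))
  # Re-home the local copy of the formula so that the model frame
  # is built with this function's frame as the formula environment.
  environment(myformula) <- environment()
  rpart(myformula, data = mydata, weights = local_w, method = "class")
}

fit <- fitLocal(kyphosis, myformula)
class(fit)
```

The trade-off is that any other variable the formula refers to that is not a column of `mydata` would now be looked up starting from the function's frame rather than from where the formula was originally written. Only the local copy of the formula is changed; the caller's `myformula` keeps its original environment.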

Does that help?

Regards,

Marc


Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 15 Jun 2007 - 23:32:04 GMT.
