From: Marc Schwartz <marc_schwartz_at_comcast.net>

Date: Fri, 15 Jun 2007 14:33:24 -0500

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 15 Jun 2007 - 19:50:07 GMT


On Fri, 2007-06-15 at 10:47 -0500, Dirk Eddelbuettel wrote:

> Philipp Benner reported a Debian bug report against r-cran-rpart aka rpart.
> In short, the issue has to do with how rpart evaluates a formula and
> supporting arguments, in particular 'weights'.
>
> A simple contrived example is
>
> -----------------------------------------------------------------------------
> library(rpart)
>
> ## using data from help(rpart), set up simple example
> myformula <- formula(Kyphosis ~ Age + Number + Start)
> mydata <- kyphosis
> myweight <- abs(rnorm(nrow(mydata)))
>
> goodFunction <- function(mydata, myformula, myweight) {
>   hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
>   prev <- hyp
> }
> goodFunction(mydata, myformula, myweight)
> cat("Ok\n")
>
> ## now remove myweight and try to compute it inside a function
> rm(myweight)
>
> badFunction <- function(mydata, myformula) {
>   myweight <- abs(rnorm(nrow(mydata)))
>   mf <- model.frame(myformula, mydata, myweight)
>   print(head(df))
>   hyp <- rpart(myformula,
>                data=mf,
>                weights=myweight,
>                method="class")
>   prev <- hyp
> }
> badFunction(mydata, myformula)
> cat("Done\n")
> -----------------------------------------------------------------------------
>
> Here goodFunction works, but only because myweight (with useless random
> weights, but that is not the point here) is found from the calling
> environment.
>
> badFunction fails after we remove myweight from there:
>
> :~> cat /tmp/philipp.R | R --slave
> Ok
> Error in eval(expr, envir, enclos) : object "myweight" not found
> Execution halted
> :~>
>
> As I was able to replicate it, I reported this to the package maintainer. It
> turns out that seemingly all is well as this is supposed to work this way,
> and I got a friendly pointer to study model.frame and its help page.
>
> Now I am stuck as I can't make sense of model.frame -- see badFunction
> above. I would greatly appreciate any help in making rpart work with a local
> argument weights so that I can tell Philipp that there is no bug. :)
>
> Regards, Dirk

Dirk,

As you note, the issue is the non-standard evaluation of the arguments in model.frame(). The key section of the Details in ?model.frame is:

All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame. Then the subset expression is evaluated, and it is used as a row index to the data frame. Then the na.action function is applied to the data frame (and may well add attributes). The levels of any factors in the data frame are adjusted according to the drop.unused.levels and xlev arguments.
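The lookup order described there can be seen with model.frame() alone. The following is only a small sketch, not taken from the help page: the data frame 'd', the formula 'f' and the two vectors named 'w' are made up for illustration. It relies on model.frame() storing the weights in a column called "(weights)".

```r
# Illustrative sketch: 'f', 'd' and 'w' are hypothetical names.
f <- y ~ x
d <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6), w = c(10, 20, 30))
w <- c(0.1, 0.2, 0.3)   # same name, defined where the formula was created

# 'w' is a column of 'data', so the column is used
mf_data <- model.frame(f, data = d, weights = w)
mf_data[["(weights)"]]   # 10 20 30

# with no such column, the search falls back to environment(f)
mf_env <- model.frame(f, data = d[c("y", "x")], weights = w)
mf_env[["(weights)"]]    # 0.1 0.2 0.3
```

In neither case is the environment of the function that calls model.frame() consulted, which is the crux of the examples that follow.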

Note that even with your goodFunction(), if 'myweight' is created within the environment of the function and not in the global environment, it still fails:

library(rpart)

myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis

goodFunction <- function(mydata, myformula) {
  myweight <- abs(rnorm(nrow(mydata)))
  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}

> goodFunction(mydata, myformula)

Error in eval(expr, envir, enclos) : object "myweight" not found
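The same failure can be reproduced without rpart, which suggests it comes from model.frame() rather than from rpart itself. This is a minimal sketch under that assumption; 'local_weights' and 'wts_local' are hypothetical names, and tryCatch() is used only so that the error message can be inspected instead of halting the script.

```r
# Illustrative sketch: the formula is created outside the function,
# the weights inside it.
f <- y ~ x
d <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))

local_weights <- function(formula, data) {
  wts_local <- c(1, 2, 3)   # exists only in this function's frame
  tryCatch(
    model.frame(formula, data = data, weights = wts_local),
    error = function(e) conditionMessage(e)
  )
}

# 'wts_local' is neither in 'd' nor in environment(f), so the call fails
# and 'res' holds the error message instead of a model frame
res <- local_weights(f, d)
res
```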

However, now let's do this:

library(rpart)

myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis

myweight <- abs(rnorm(nrow(mydata)))

goodFunction <- function(mydata, myformula) {
  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}

> goodFunction(mydata, myformula)
>

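It works because 'myweight' is found in the global environment, which is where the formula was created and is therefore the environment of the formula.

Likewise, if the formula itself is created inside the function: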

library(rpart)

goodFunction <- function() {
  myformula <- formula(Kyphosis ~ Age + Number + Start)
  mydata <- kyphosis
  myweight <- abs(rnorm(nrow(mydata)))
  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}

> goodFunction()
>

It works because the formula is created within the environment of the function and hence, 'myweight', which is created there as well, is found.
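Following that reasoning, two possible ways to use locally computed weights with a formula that is passed in from outside are sketched below. Neither is confirmed by the rpart maintainer in this thread; the function names and the 'wts' / 'wts_col' variables are hypothetical. The first re-points the environment of the (local copy of the) formula at the function's own frame; the second stores the weights as a column of the data, where they are looked for first.

```r
library(rpart)

# Option 1 (assumption: resetting the formula's environment is acceptable):
# make the function's frame the place where 'wts' is looked up.
fit_with_local_weights <- function(data, formula) {
  wts <- abs(rnorm(nrow(data)))
  environment(formula) <- environment()
  rpart(formula, data = data, weights = wts, method = "class")
}

# Option 2: carry the weights inside 'data'. The formula must not use '.',
# otherwise the extra column would also become a predictor.
fit_with_weight_column <- function(data, formula) {
  data$wts_col <- abs(rnorm(nrow(data)))
  rpart(formula, data = data, weights = wts_col, method = "class")
}

set.seed(1)
myformula <- Kyphosis ~ Age + Number + Start
fit1 <- fit_with_local_weights(kyphosis, myformula)
fit2 <- fit_with_weight_column(kyphosis, myformula)
```

Option 1 changes only the copy of the formula inside the function, so the caller's formula keeps its original environment. Any other variable the formula refers to that is not in 'data' would then be searched for starting from the function's frame.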

There was a (non) bug filed on a related matter dealing with the evaluation of 'subset':

http://bugs.r-project.org/cgi-bin/R/feature%26FAQ?id=3671

and you might find this document on Non-Standard Evaluation helpful:

http://developer.r-project.org/nonstandard-eval.pdf

HTH,

Marc

