Re: [R] Using $ accessor in GAM formula

From: Berwin A Turlach <Berwin.Turlach_at_gmail.com>
Date: Fri, 06 May 2011 17:53:33 +0800

G'day Rolf,

On Fri, 06 May 2011 09:58:50 +1200
Rolf Turner <rolf.turner_at_xtra.co.nz> wrote:

> but it's strange that the dodgey code throws an error with gam(dat1$y
> ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s))

> Something a bit subtle is going on; it would be nice to be able to
> understand it.

Well,

R> traceback()

3: eval(expr, envir, enclos)
2: eval(inp, data, parent.frame())
1: gam(dat$y ~ s(dat$x))

So the lines leading up to the problem seem to be the following from the gam() function:

        vars <- all.vars(gp$fake.formula[-2])
        inp <- parse(text = paste("list(", paste(vars, collapse = ","), 
            ")"))
        if (!is.list(data) && !is.data.frame(data)) 
            data <- as.data.frame(data)
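
As a rough base-R illustration of what the first two of these lines build (a sketch only: it assumes the user's formula can stand in for gp$fake.formula, whose exact contents are not shown here, and make_inp() is a hypothetical helper name):

```r
# Assumption: the user's formulas stand in for gp$fake.formula.
f1 <- dat1$y ~ s(dat1$x)
f2 <- dat2$cf ~ s(dat2$s)

# Hypothetical helper that mirrors the two quoted lines from gam().
make_inp <- function(f) {
  # f[-2] drops the response; all.vars() then returns every symbol that
  # is not in function position, including the name to the right of "$".
  vars <- all.vars(f[-2])
  parse(text = paste("list(", paste(vars, collapse = ","), ")"))
}

vars1 <- all.vars(f1[-2])
vars2 <- all.vars(f2[-2])
inp1 <- make_inp(f1)
inp2 <- make_inp(f2)

print(vars1)
print(vars2)
print(inp1)
print(inp2)
```

Nothing is evaluated at this stage, so the sketch runs even though neither "dat1" nor "dat2" exists; the column name after "$" is simply treated as one more variable name.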
        


Setting

R> options(error=recover)

running the code until the error occurs, and then examining the frame number for the gam() call shows that "inp" is

"expression(list( dat1,x ))" in your first example and
"expression(list( dat2,s ))" in your second example.  In both
examples, "data" is "list()" (not surprisingly).  When

	dl <- eval(inp, data, parent.frame())

is executed, R tries to evaluate "inp". In both cases "dat1" and "dat2", respectively, are found, obviously, in the parent frame. In your first example "x" is (typically) not found and an error is thrown; in your second example an object with name "s" is found in "package:mgcv" and the call to eval() succeeds. "dl" then becomes a list with two components, the first being "dat2" and the second the function "s" itself. (To verify that, you should probably issue the command "debug(gam)" and step through those first few lines of the function until you reach the above command.)
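
The same lookup behaviour can be mimicked without mgcv. The following is only a sketch under stated assumptions: "env" is a hypothetical isolated environment standing in for the parent frame plus search path, the data frames are made up, and the "s" defined in it is a dummy stand-in for mgcv's s(), not the real function.

```r
# Isolated environment; its parent is baseenv(), so no stray "x" or "s"
# from the workspace or from attached packages is visible.
env <- new.env(parent = baseenv())
env$dat1 <- data.frame(x = 1:3, y = c(0.1, 0.2, 0.3))    # made-up data
env$dat2 <- data.frame(s = 1:3, cf = c(0.1, 0.2, 0.3))   # made-up data
env$s <- function(...) NULL   # dummy stand-in for mgcv's s()

# The two expressions reported for "inp"
inp1 <- parse(text = "list( dat1,x )")
inp2 <- parse(text = "list( dat2,s )")

# "data" is an empty list, so every name is looked up in the enclosure.
# First example: "x" is not found, so eval() throws an error.
res1 <- tryCatch(eval(inp1, list(), env),
                 error = function(e) conditionMessage(e))

# Second example: "s" resolves to the function, so eval() succeeds.
dl2 <- eval(inp2, list(), env)

print(res1)
str(dl2, max.level = 1)
```

In neither case is the column inside the data frame used at this point; only the bare names are looked up.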

The corollary is that you can use the name of any object that R will find via the parent frame. If it is another data set, then that data set will become the second component of "dl". E.g.:

R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$min))

Family: gaussian
Link function: identity

Formula:
dat$cf ~ s(dat$min)

Estimated degrees of freedom:
3.8925 total = 4.892488

GCV score: 0.002704789

Or

R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$BOD))

Family: gaussian
Link function: identity

Formula:
dat$cf ~ s(dat$BOD)

Estimated degrees of freedom:
3.9393 total = 4.939297

GCV score: 0.002666985
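
To see in advance whether a column name would be picked up in this way, one can ask R where it finds the name. A small sketch, assuming the default set of attached packages ("candidates" and "where_found" are just illustrative names):

```r
# Column names used in this thread
candidates <- c("min", "BOD", "s")

# utils::find() reports where on the search path an object of that name
# is visible; character(0) means no such object is currently visible.
where_found <- lapply(candidates, find)
names(where_found) <- candidates

print(where_found)
```

With the default packages attached, "min" should show up in "package:base" and "BOD" in "package:datasets"; "s" is only reported once mgcv is attached (or if the workspace contains an object of that name).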

> Just out of pure academic interest. :-)

Hope your academic curiosity is now satisfied. :)

HTH. Cheers,

        Berwin


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 06 May 2011 - 12:27:49 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 May 2011 - 17:20:06 GMT.
