Re: [R] Using $ accessor in GAM formula

From: Gene Leynes <gleynes_at_gmail.com>
Date: Tue, 10 May 2011 09:50:23 -0500

Gavin,

Yes, I thought I was responding to Berwin, whoops!

Since I'm often mixing different data frames, I rarely use this formulation: gam(y ~ s(x), dat)

Actually, I often forget about that syntax, and it's a good reminder. It really is much easier to read. In my "real" model that people actually use, I do use the formulation that you recommend.

Thank you though. I believe that syntax makes a huge difference in readability and long term error reduction.

Gene

On Fri, May 6, 2011 at 12:18 PM, Gavin Simpson <gavin.simpson_at_ucl.ac.uk>wrote:

> On Fri, 2011-05-06 at 11:20 -0500, Gene Leynes wrote:
> > Hmmm....
> >
> > After reading that email four times, I think I see what you mean.
> >
> > Checking for variables within particular scopes is probably one of the
> most
> > challenging things in R, and I would guess in other languages too. In R
> > it's compounded by situations when you're writing a function to accept
> > variables as either a stand alone global variable, or as an element of a
> > data.frame or list.
>
> Dear Gene,
>
> > I think this is a new problem, and I'll switch to the lengthier
> > data.frame[,'x'] syntax in gam for now.
>
> No, No, No, No, No!!!!!
>
> If there is one thing you **should** take from this thread is that there
> is no need to perform subsetting like that in a model formula.
>
> Why would you want (or prefer) to do:
>
> gam(dat$y ~ s(dat$x))
>
> or
>
> gam(dat[, "y"] ~ s(dat[, "x"]))
>
> when
>
> gam(y ~ s(x), dat)
>
> will suffice?
>
> > By the way, about the $ accessors. I can see why some people don't like
> > them, but they are a part of the language. And, I think they're a good
> > part. They make the code much more readable, and I have yet to make a
> > mistake using them (which makes me think that they can be used
> > responsibly). Making code harder to read is its own source of error, and
> is
> > not something that I take lightly!
>
> If you use R's formula notation properly, you'll get cleaner code than
> either of your suggestions.
>
> > Thanks for the replies. And, thank you Rolf for the detailed analysis.
> Do
> > you think that your or I should submit a bug report to the package
> > maintainer? I'm not sure how that works. Very few of my challenges turn
> > out to be actual bugs, but I think this one is.
>
> I would consider this a bug - but in the sense that Simon didn't foresee
> the strange formulas that users of his software might concoct.
>
> By the way, I think you perhaps meant Berwin (re the detailed analysis)?
>
> HTH
>
> G
>
> > On Fri, May 6, 2011 at 4:53 AM, Berwin A Turlach
> > <Berwin.Turlach_at_gmail.com>wrote:
> >
> > > G'day Rolf,
> > >
> > > On Fri, 06 May 2011 09:58:50 +1200
> > > Rolf Turner <rolf.turner_at_xtra.co.nz> wrote:
> > >
> > > > but it's strange that the dodgey code throws an error with gam(dat1$y
> > > > ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s))
> > >
> > > > Something a bit subtle is going on; it would be nice to be able to
> > > > understand it.
> > >
> > > Well,
> > >
> > > R> traceback()
> > > 3: eval(expr, envir, enclos)
> > > 2: eval(inp, data, parent.frame())
> > > 1: gam(dat$y ~ s(dat$x))
> > >
> > > So the lines leading up to the problem seem to be the following from
> > > the gam() function:
> > >
> > > vars <- all.vars(gp$fake.formula[-2])
> > > inp <- parse(text = paste("list(", paste(vars, collapse = ","),
> > > ")"))
> > > if (!is.list(data) && !is.data.frame(data))
> > > data <- as.data.frame(data)
> > >
> > >
> > >
> > > Setting
> > >
> > > R> options(error=recover)
> > >
> > > running the code until the error occurs, and then examining the frame
> > > number for the gam() call shows that "inp" is
> > > "expression(list( dat1,x ))" in your first example and
> > > "expression(list( dat2,s ))" in your second example. In both
> > > examples, "data" is "list()" (not unsurprisingly). When,
> > >
> > > dl <- eval(inp, data, parent.frame())
> > >
> > > is executed, it tries to eval "inp", in both cases "dat1" and "dat2"
> > > are found, obviously, in the parent frame. In your first example "x"
> is
> > > (typically) not found and an error is thrown, in your second example an
> > > object with name "s" is found in "package:mgcv" and the call to eval
> > > succeeds. "dl" becomes a list with two components, the first being,
> > > respectively, "dat1" or "dat2", and the second the body of the function
> > > "s". (To verify that, you should probably issue the command
> > > "debug(gam)" and step through those first few lines of the function
> > > until you reach the above command.)
> > >
> > > The corollary is that you can use the name of any object that R will
> > > find in the parent frame, if it is another data set, then that data
> > > set will become the second component of "inp". E.g.:
> > >
> > > R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
> > > R> gam(dat$cf ~ s(dat$min))
> > >
> > > Family: gaussian
> > > Link function: identity
> > >
> > > Formula:
> > > dat$cf ~ s(dat$min)
> > >
> > > Estimated degrees of freedom:
> > > 3.8925 total = 4.892488
> > >
> > > GCV score: 0.002704789
> > >
> > > Or
> > >
> > > R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
> > > R> gam(dat$cf ~ s(dat$BOD))
> > >
> > > Family: gaussian
> > > Link function: identity
> > >
> > > Formula:
> > > dat$cf ~ s(dat$BOD)
> > >
> > > Estimated degrees of freedom:
> > > 3.9393 total = 4.939297
> > >
> > > GCV score: 0.002666985
> > >
> > > > Just out of pure academic interest. :-)
> > >
> > > Hope your academic curiosity is now satisfied. :)
> > >
> > > HTH.
> > >
> > > Cheers,
> > >
> > > Berwin
> > >
> > > ========================== Full address ============================
> > > A/Prof Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
> > > School of Maths and Stats (M019) +61 (8) 6488 3383 (self)
> > > The University of Western Australia FAX : +61 (8) 6488 1028
> > > 35 Stirling Highway
> > > Crawley WA 6009 e-mail: Berwin.Turlach_at_gmail.com
> > > Australia http://www.maths.uwa.edu.au/~berwin
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 10 May 2011 - 15:52:00 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 10 May 2011 - 16:00:06 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive