Re: [R] Post-hoc tests in MASS using glm.nb

From: <Bill.Venables_at_csiro.au>
Date: Wed, 18 May 2011 09:15:43 +1000

PS. I should have followed the example with one that uses with() for something that would often be done with attach(). Consider:

with(polyData, {
  plot(x, y, pch=".")
  o <- order(x)
  lines(x[o], eta[o], col = "red")
})

I use this kind of dodge a lot, too, but now you can mostly use data= arguments on the individual functions.
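As a rough sketch of the data= route mentioned above: the formula methods of plot() and lines() both accept a data= argument, so the same picture can be drawn without with() or attach(). The data frame "toy" and its columns are made up for illustration and only stand in for polyData.

```r
set.seed(1)                        # toy data (an assumption, not from the thread)
toy <- data.frame(x = runif(50))
toy$eta <- 1 + 2 * toy$x           # a made-up "true" mean curve
toy$y <- toy$eta + rnorm(50, sd = 0.5)
toy <- toy[order(toy$x), ]         # sort once so the line is drawn left to right

# The formula interface looks up y, x and eta in 'toy' via data=
plot(y ~ x, data = toy, pch = ".")
lines(eta ~ x, data = toy, col = "red")
```

Sorting the data frame up front replaces the o <- order(x) step inside the with() block.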

Bill Venables.

-----Original Message-----
From: Venables, Bill (CMIS, Dutton Park)
Sent: Wednesday, 18 May 2011 9:07 AM
To: 'Bert Gunter'; 'Peter Ehlers'
Cc: 'R list'
Subject: RE: [R] Post-hoc tests in MASS using glm.nb

Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html

I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost).

The difference between with() and within() is as follows:

with(data, expr, ...)

allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result. By contrast

within(data, expr, ...)

again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data', and *the modified data set* is returned as the result.
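A minimal sketch of that difference, using a made-up three-row data frame (the names d, s and d_new are purely illustrative):

```r
d <- data.frame(x = c(1, 2, 3))

# with(): the value of the expression comes back
s <- with(d, sum(x^2))

# within(): a modified copy of the data frame comes back
d_new <- within(d, x2 <- x^2)

# The original is untouched unless the result is assigned back to it
names(d)
names(d_new)
```

Here s is a single number, while d_new is a data frame with the extra column x2.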

I use this a lot, especially in the data preparation phase of a project, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project.

Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x*x2
  b <- runif(4)
  eta <- cbind(1,x,x2,x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

check:

> str(polyData)
'data.frame': 500 obs. of 5 variables:
 $ x  : num  0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num  1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num  0.268811 0.034224 0.309802 0.060844 0.000315 ...

>
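To illustrate why transform() cannot do this in one step, here is a small sketch on a made-up data frame (d, sq and cu are hypothetical names). transform() evaluates all of its arguments against the original columns, so a column created in the call is not visible to the later arguments, whereas within() runs its statements in order.

```r
d <- data.frame(x = 1:3)

# 'sq' does not exist yet when 'cu' is evaluated, so this fails;
# tryCatch() keeps the error message instead of stopping the script.
one_step <- tryCatch(
  transform(d, sq = x^2, cu = x * sq),
  error = function(e) conditionMessage(e)
)

# within() evaluates line by line, so 'sq' is available for 'cu'
d2 <- within(d, {
  sq <- x^2
  cu <- x * sq
})
```

With transform() the same result would need two nested or successive calls.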

Bill Venables.

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Bert Gunter
Sent: Wednesday, 18 May 2011 12:08 AM
To: Peter Ehlers
Cc: R list
Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks:

> Only if the user hasn't yet been introduced to the with() function,
> which is linked to on the ?attach page.
>
> Note also this sentence from the ?attach page:
>  ".... attach can lead to confusion."
>
> I can't remember the last time I needed attach().
>
> Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following.

The potential "confusion" comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, "a" in the formula and in the attached data, and another "a" in the global environment. Then the wrong "a" (the one in the global environment, which is searched first) would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.)

Using the "data = " argument when available or the with() function when not avoids this potential confusion and tightly couples the data to be analyzed with the analysis.
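A small sketch of the clash described above, and of the two remedies. The objects a, dat and the m_* names are made up for illustration. By default attach() puts the data frame at position 2 of the search path, behind the workspace, so a workspace object of the same name wins.

```r
a <- 100                                   # an unrelated "a" in the workspace
dat <- data.frame(a = 1:3, y = c(2, 4, 6))

attach(dat)                 # dat goes behind the workspace on the search path
m_attach <- mean(a)         # picks up the workspace "a", not dat$a
detach(dat)

m_with <- with(dat, mean(a))       # dat is searched first
fit <- lm(y ~ a, data = dat)       # data= ties the formula to dat
```

Here m_attach is computed from the workspace value, while m_with and fit use the column in dat. As far as I know, attach() also prints a note that "a" is masked by .GlobalEnv when the clash already exists at the time of attaching, but not if the workspace object is created afterwards.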

I hope this clarifies the previous posters' comments.

Cheers,
Bert

>
> [... non-germane material snipped ...]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

Received on Tue 17 May 2011 - 23:17:45 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 18 May 2011 - 00:00:07 GMT.
