Re: [R] A Tip: lm, glm, and retained cases

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Tue, 26 Aug 2008 19:56:42 -0500

on 08/26/2008 07:31 PM (Ted Harding) wrote:
> On 26-Aug-08 23:49:37, hadley wickham wrote:

>> On Tue, Aug 26, 2008 at 6:45 PM, Ted Harding
>> <Ted.Harding_at_manchester.ac.uk> wrote:
>>> Hi Folks,
>>> This tip is probably lurking somewhere already, but I've just
>>> discovered it the hard way, so it is probably worth passing
>>> on for the benefit of those who might otherwise hack their
>>> way along the same path.
>>>
>>> Say (for example) you want to do a logistic regression of a
>>> binary response Y on variables X1, X2, X3, X4:
>>>
>>>  GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial)
>>>
>>> Say there are 1000 cases in the data. Because of missing values
>>> (NAs) in the variables, the number of complete cases retained
>>> for the regression is, say, 600. glm() does this automatically.
>>>
>>> QUESTION: Which cases are they?
>>>
>>> You can of course find out "by hand" on the lines of
>>>
>>>  ix <- which( (!is.na(Y))&(!is.na(X1))&...&(!is.na(X4)) )
>>>
>>> but one feels that GLM already knows -- so how to get it to talk?
>>>
>>> ANSWER: (e.g.)
>>>
>>>  ix <- as.integer(names(GLM$fitted.values))
>> Alternatively, you can use:
>>
>> attr(GLM$model, "na.action")
>>
>> Hadley

>
> Thanks! I can see that it works -- though understanding how
> requires a deeper knowledge of "R internals". However, since
> you've approached it from that direction, simply
>
> GLM$model
>
> is a dataframe of the retained cases (with corresponding
> row-names), all variables at once, and that is possibly an
> even simpler approach!

Or just use:

   model.frame(ModelObject)

as the extractor function... :-)
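Below is a minimal, self-contained sketch of the extractors mentioned in this thread, run on simulated data. The data frame 'dat', the seed, the 10% missingness per predictor and the names kept_fit, kept_mf, kept_hand and dropped are all invented for illustration. The sketch fits the logistic regression with family = binomial and checks three ways of recovering the retained rows: the names of the fitted values, rownames(model.frame()), and "by hand" with complete.cases(). It then checks that the 'na.action' attribute lists exactly the remaining rows. Converting row names with as.integer() only works here because 'dat' has the default 1..n row names.

```r
# Hypothetical simulated data: binary Y and four predictors, n = 1000
set.seed(1)
n <- 1000
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
dat$Y <- rbinom(n, 1, plogis(0.5 * dat$X1 - 0.5 * dat$X2))

# Knock out 100 values (10%) in each predictor, chosen at random
for (v in c("X1", "X2", "X3", "X4")) {
  dat[[v]][sample(n, 100)] <- NA
}

# Logistic regression; the default na.action (normally na.omit)
# silently drops every row with an NA in any model variable
GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = dat)

# 1. Row names carried by the fitted values (the original tip)
kept_fit <- as.integer(names(fitted(GLM)))

# 2. Row names of the stored model frame (the extractor-function route)
kept_mf <- as.integer(rownames(model.frame(GLM)))

# 3. "By hand": rows with no NA in any of the model variables
kept_hand <- which(complete.cases(dat[, c("Y", "X1", "X2", "X3", "X4")]))

# Rows that were dropped, as recorded in the model frame's attribute
dropped <- as.integer(attr(model.frame(GLM), "na.action"))

# All three agree, and retained + dropped rows cover 1..n exactly once
stopifnot(identical(kept_fit, kept_mf),
          identical(kept_fit, kept_hand),
          identical(sort(c(kept_fit, dropped)), seq_len(n)))
```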

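Another 'a priori' approach would be to use na.omit() or one of its brethren on the data frame before creating the model. Which one to use depends on how 'na.action' is set, since that determines what the model function itself does with incomplete cases (the default, getOption("na.action"), is normally "na.omit").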

The returned data frame, or more specifically its 'na.action' attribute, then records which rows were excluded. This is the same information that Hadley's approach extracts from the fitted model.

For example, using the simple data frame in ?na.omit:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))

> DF
  x  y
1 1  0
2 2 10
3 3 NA

DF.na <- na.omit(DF)

> DF.na
  x  y
1 1  0
2 2 10

> attr(DF.na, "na.action")
3
3
attr(,"class")
[1] "omit"

So you can see that record 3 was removed from the original data frame due to the NA for 'y'.
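The same attribute is attached when the model function applies na.omit() itself. This links the a priori approach to Hadley's. Below is a minimal sketch that re-creates DF from above; the object names fit, fit.ex, a_priori and in_model are illustrative only. It also shows one of na.omit()'s brethren, na.exclude(). Row 3 is still dropped for fitting, but fitted() and residuals() are padded back to the full number of rows with NA, so the excluded row stays visible.

```r
# The small data frame from ?na.omit used above
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
DF.na <- na.omit(DF)

# Let lm() apply na.omit itself; the model frame then carries
# an "na.action" attribute recording the same dropped row
fit <- lm(y ~ x, data = DF)
a_priori <- attr(DF.na, "na.action")
in_model <- attr(model.frame(fit), "na.action")
print(in_model)

# With na.exclude, row 3 is still left out of the fit, but fitted()
# and residuals() are padded back to nrow(DF) with NA in that row
fit.ex <- lm(y ~ x, data = DF, na.action = na.exclude)
print(fitted(fit.ex))
```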

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 27 Aug 2008 - 01:02:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 28 Aug 2008 - 12:34:06 GMT.

