Re: [R] predict nbinomial glm

From: Sundar Dorai-Raj <sundar.dorai-raj_at_pdf.com>
Date: Wed 17 Aug 2005 - 00:33:47 EST

Katharina,

I agree with Prof. Ripley's assessment. But one thing you may have overlooked is that subset.data.frame does not remove unused factor levels. So,

 > subset_of_dataframe = subset(data_frame, (b > 80 & c < 190))
 > levels(subset_of_dataframe$d)
[1] "q" "r" "s" "t"
 > table(subset_of_dataframe$d)
  q  r  s  t
  0 20 50 10

Even though the level "q" no longer appears in the subset, it is still a level of "d". Perhaps you need to do the following after the subset:

subset_of_dataframe[] <- lapply(subset_of_dataframe, "[", drop = TRUE)

which drops all unused levels from factors.
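
As a minimal sketch of the above (the toy data frame `df` and the names `sub`, `sub_a` and `sub_b` are made up for illustration, and droplevels() is an assumption in the sense that it only exists in versions of R newer than the ones discussed in this thread), something like the following shows the unused level surviving subset() and two ways of dropping it:

```r
# Toy data (hypothetical): d is stored explicitly as a factor with four levels.
df <- data.frame(
  b = 1:200,
  d = factor(rep(c("q", "r", "s", "t"), each = 50))
)

# subset() keeps rows 81..160 but leaves the levels of d untouched,
# so "q" is still a level even though no row has it.
sub <- subset(df, b > 80 & b < 161)
levels_before <- levels(sub$d)

# Option 1: the idiom from this message, re-indexing every column
# with drop = TRUE so that factors lose their unused levels.
sub_a <- sub
sub_a[] <- lapply(sub_a, "[", drop = TRUE)

# Option 2: droplevels(), available in later versions of R.
sub_b <- droplevels(sub)

print(levels_before)
print(levels(sub_a$d))
print(levels(sub_b$d))
```

Either option leaves "d" with only the levels that actually occur in the subset, which is what the model fitting and predict() calls then see.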

I'm not sure if your problem is statistical in nature or simply a misunderstanding of the software. I'm only attempting to answer the latter. As Prof. Ripley suggests, discuss any statistical problem (i.e. predicting on missing levels) with your advisor.
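
To make "predicting on missing levels" concrete, here is a rough sketch rather than the original model: it assumes a Poisson glm() in place of glm.nb() so that it runs without MASS, and the names `train`, `fit`, `new_q` and `new_r` are hypothetical. A model fitted without any "q" rows has no coefficient for "q", so predict() refuses new rows that really do contain that level:

```r
set.seed(1)

# Training data (hypothetical) containing only the levels r, s and t.
train <- data.frame(
  y = rpois(60, 3),
  d = factor(rep(c("r", "s", "t"), each = 20))
)
fit <- glm(y ~ d, family = poisson, data = train)

# A level the model has seen: prediction works.
new_r <- data.frame(d = factor("r", levels = c("r", "s", "t")))
pred_r <- predict(fit, newdata = new_r, type = "response")

# A level the model has never seen: predict() stops with an error,
# which is captured here as a character string instead of aborting.
new_q <- data.frame(d = factor("q", levels = c("q", "r", "s", "t")))
msg_q <- tryCatch(
  predict(fit, newdata = new_q, type = "response"),
  error = function(e) conditionMessage(e)
)

print(pred_r)
print(msg_q)
```

Dropping unused levels does not help in that second case; it is the statistical question, not the software one.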

HTH, --sundar

P.S. Also, update R. It's free.

Prof Brian Ripley wrote:

> This seems to be an unstated repeat of much of an earlier and 
> unanswered post
> 
>  	https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html
> 
> entitled
> 
>  	[R] error in predict glm (new levels cause problems)
> 
> It is nothing to do with `nbinomial glm' (sic): all model fitting 
> functions including lm and glm do this.  The reason you did not get at 
> least one reply from your first post is that you seemed not to have done 
> your homework.  (One thing the posting guide does ask is for you to try 
> the current version of R, and yours is three versions old.)
> 
> The code is protecting you from an attempt at statistical nonsense. 
> (Indeed, the check was added to catch such misuses.)  Your email address 
> seems to be that of a student, so please seek the help of your advisor. 
> You seem surprised that you are not allowed to make predictions about 
> levels for which you have supplied no relevant data.
> 
> 
> On Tue, 16 Aug 2005, K. Steinmann wrote:
> 
> 

>>Dear R-helpers,
>>
>>let us assume that I have the following dataset:
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>data_frame <- data.frame(a,b,c,d)
>>
>>In a first step I run a glm.nb (full code is given at the end of this mail) and
>>want to predict my response variable a.
>>In a second step, I would like to run a glm.nb based on a subset of the
>>data_frame. As soon as I want to predict the response variable a, I get the
>>following error message:
>>"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
>>object$xlevels) :
>> factor d has new level(s) q"
>>
>>Does anybody have a solution to this problem?
>>
>>Thank you in advance,
>>K. Steinmann (working with R 2.0.0)
>>
>>
>>Code:
>>
>>library(MASS)
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>
>>data_frame <- data.frame(a,b,c,d)
>>
>>model_1 = glm.nb(a ~ b + d , data = data_frame)
>>
>>pred_model_1 = predict(model_1, newdata = data_frame, type = "response",
>>                       se.fit = FALSE, dispersion = NULL, terms = NULL)
>>
>>subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))
>>
>>model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
>>pred_model_2 = predict(model_2, newdata = subset_of_dataframe, type = "response",
>>                       se.fit = FALSE, dispersion = NULL, terms = NULL)
>
>

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Aug 17 00:38:47 2005