From: Sundar Dorai-Raj <sundar.dorai-raj_at_pdf.com>

Date: Wed 17 Aug 2005 - 00:33:47 EST

>>Dear R-helpers,
>>
>>let us assume that I have the following dataset:
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>data_frame <- data.frame(a,b,c,d)
>>
>>In a first step I run a glm.nb (full code is given at the end of this mail) and
>>want to predict my response variable a.
>>In a second step, I would like to run a glm.nb based on a subset of the
>>data_frame. As soon as I want to predict the response variable a, I get the
>>following error message:
>>"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
>>object$xlevels) :
>> factor d has new level(s) q"
>>
>>Does anybody have a solution to this problem?
>>
>>Thank you in advance,
>>K. Steinmann (working with R 2.0.0)
>>
>>
>>Code:
>>
>>library(MASS)
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>
>>data_frame <- data.frame(a,b,c,d)
>>
>>model_1 = glm.nb(a ~ b + d , data = data_frame)
>>
>>pred_model_1 = predict(model_1, newdata = data_frame, type = "response",
>>                       se.fit = FALSE, dispersion = NULL, terms = NULL)
>>
>>subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))
>>
>>model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
>>pred_model_2 = predict(model_2, newdata = subset_of_dataframe, type = "response",
>>                       se.fit = FALSE, dispersion = NULL, terms = NULL)

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Wed Aug 17 00:38:47 2005

Katharina,

I agree with Prof. Ripley's assessment. But one thing you may have overlooked is that subset.data.frame does not remove unused factor levels. For example:

> subset_of_dataframe = subset(data_frame, (b > 80 & c < 190))
> levels(subset_of_dataframe$d)
[1] "q" "r" "s" "t"
> table(subset_of_dataframe$d)

 q  r  s  t
 0 20 50 10

Even though the level "q" does not appear, it is still a level of "d". Perhaps you need to do the following after the subset:

subset_of_dataframe[] <- lapply(subset_of_dataframe, "[", drop = TRUE)

which drops all unused levels from factors.
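A minimal sketch for trying this in a current R session, assuming R 4.x with the recommended MASS package installed. Two things differ from the 2005 setup: since R 4.0.0, `data.frame()` keeps character columns as character by default, so `d` is built with `factor()` explicitly here, and later versions of base R provide `droplevels()`, which drops unused levels from every factor column much like the `lapply` idiom above. The seed and the names `sub1` and `sub2` are illustrative choices, not part of the original thread.

```r
library(MASS)  # provides glm.nb()

set.seed(42)  # arbitrary seed (assumption) so the simulated counts are reproducible
a <- rnbinom(200, size = 1, prob = 0.5)
b <- 1:200
c <- 30:229
# Build d as a factor explicitly: since R 4.0.0, data.frame() no longer
# converts character columns to factors by default
d <- factor(rep(c("q", "r", "s", "t"), each = 50))
data_frame <- data.frame(a, b, c, d)

subset_of_dataframe <- subset(data_frame, b > 80 & c < 190)
print(levels(subset_of_dataframe$d))  # "q" is still listed although it has zero rows

# sub1, sub2: illustrative names for the two cleaned copies
# Idiom from the reply: call `[` with drop = TRUE on every column;
# only factor columns actually change
sub1 <- subset_of_dataframe
sub1[] <- lapply(sub1, "[", drop = TRUE)

# droplevels() removes unused levels from all factor columns at once
sub2 <- droplevels(subset_of_dataframe)
print(levels(sub2$d))

# Refit on the cleaned subset and predict; d now only has levels seen in the fit
model_2 <- glm.nb(a ~ b + d, data = sub2)
pred_model_2 <- predict(model_2, newdata = sub2, type = "response")
print(head(pred_model_2))
```

Either cleaned copy can be passed to `glm.nb()` and `predict()`; `droplevels()` simply avoids spelling out the `lapply` call.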

I'm not sure if your problem is statistical in nature or simply a misunderstanding of the software. I'm only attempting to answer the latter. As Prof. Ripley suggests, discuss any statistical problem (i.e. predicting on missing levels) with your advisor.
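
Prof. Ripley's point that lm and glm behave the same way can be checked with a small sketch in plain R (no MASS needed). The data are simulated, and the names `fit_data`, `new_obs` and `msg` are hypothetical: the model is fitted without any rows at level "q", and predict() is then asked about "q". It stops with an error rather than inventing a coefficient for a level it never saw; the exact wording of the message may vary between R versions.

```r
set.seed(1)  # arbitrary seed (assumption)
# Simulated data (hypothetical) in which level "q" of d never occurs
fit_data <- data.frame(
  y = rnorm(30),
  x = rnorm(30),
  d = factor(rep(c("r", "s", "t"), each = 10))
)
fit <- lm(y ~ x + d, data = fit_data)
print(fit$xlevels$d)  # the factor levels stored with the fitted model

# One new observation at level "q", for which the model has no coefficient
new_obs <- data.frame(x = 0, d = factor("q"))

# Capture the error message instead of letting it stop the script
msg <- tryCatch(
  {
    predict(fit, newdata = new_obs)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Predicting for a level that was present during fitting works as usual
print(predict(fit, newdata = data.frame(x = 0, d = factor("s"))))
```

The last call shows that the check concerns levels absent from the fit, not factors in `newdata` as such.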

HTH,
--sundar

P.S. Also, update R. It's free.

Prof Brian Ripley wrote:

> This seems to be an unstated repeat of much of an earlier and
> unanswered post
>
> https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html
>
> entitled
>
> [R] error in predict glm (new levels cause problems)
>
> It is nothing to do with `nbinomial glm' (sic): all model fitting
> functions including lm and glm do this. The reason you did not get at
> least one reply from your first post is that you seemed not to have done
> your homework. (One thing the posting guide does ask is for you to try
> the current version of R, and yours is three versions old.)
>
> The code is protecting you from an attempt at statistical nonsense.
> (Indeed, the check was added to catch such misuses.) Your email address
> seems to be that of a student, so please seek the help of your advisor.
> You seem surprised that you are not allowed to make predictions about
> levels for which you have supplied no relevant data.
>
>
> On Tue, 16 Aug 2005, K. Steinmann wrote:
>

>>Dear R-helpers,

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 15:23:33 EST