Re: [Rd] delete.response leaves response in attribute dataClasses

From: William Dunlap <wdunlap_at_tibco.com>
Date: Thu, 05 Jan 2012 21:15:33 +0000

My feeling that everyone would index dataClasses by name was wrong. I looked through the packages that used dataClasses and saw code that would break if the first (response) entry were omitted. (I didn't check to see if passing the output of delete.response to these functions would be appropriate.) E.g.,
file: AICcmodavg/R/predictSE.mer.r
  ##matrix with info on factors
  fact.frame <- attr(attr(orig.frame, "terms"), "dataClasses")[-1]

  ##continue if factors
  if(any(fact.frame == "factor")) {
    id.factors <- which(fact.frame == "factor")
    fact.name <- names(fact.frame)[id.factors] #identify the rows for factors
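For comparison, here is a minimal sketch of the same lookup done by name instead of by position. The helper `factor_predictors()` and the toy data are hypothetical (not from AICcmodavg). It takes the predictor names from the "variables" attribute of `delete.response(tt)`, so it gives the same answer whether or not the response entry is still present in "dataClasses".

```r
# Hypothetical helper, not from AICcmodavg or base R: names of the factor
# predictors, found by name so a leftover response entry does not matter.
factor_predictors <- function(tt) {
  dc <- attr(tt, "dataClasses")
  # Predictor expressions from the "variables" call, e.g. list(x1, g),
  # after delete.response() has removed the response
  vars <- as.list(attr(delete.response(tt), "variables"))[-1]
  # Deparse each expression; for simple variables this matches the
  # names model.frame() gives to the dataClasses entries
  pred_names <- vapply(vars, function(v) paste(deparse(v), collapse = " "), "")
  dc <- dc[names(dc) %in% pred_names]
  names(dc)[dc == "factor"]
}

# Made-up data: one numeric and one factor predictor
d <- data.frame(y = c(1.2, 0.4, 2.2, 1.9, 3.1, 2.5),
                x1 = c(1, 2, 3, 4, 5, 6),
                g = factor(c("a", "b", "a", "b", "a", "b")))
tt <- attr(model.frame(y ~ x1 + g, data = d), "terms")
factor_predictors(tt)
# [1] "g"
```

If the response entry had already been removed from "dataClasses", a positional `[-1]` would silently drop x1 instead; the name-based lookup is unaffected.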

Some packages create a dataClasses attribute for a model.frame (not its terms attribute) that does not have any names, e.g.,
file: caper/R/macrocaic.R

   attr(mf, "dataClasses") <- rep("numeric", dim(termFactors)[2])

.checkMFClasses() does not throw an error for that, but it doesn't do any real checking either.

Most users of dataClasses pass it to .checkMFClasses() to compare it with newdata, and that function doesn't care if you have extra entries in dataClasses.
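A small illustration of both points, using made-up data and `stats::.checkMFClasses(cl, m)`, which is the check `predict.lm()` applies to newdata. It relies only on the check matching classes by the column names of the newdata model frame: an extra "y" entry is ignored, a real class change is still caught, and an unnamed `cl` (as in the caper code) matches nothing, so nothing is checked.

```r
# Made-up data: a numeric predictor x1 and a factor g
d <- data.frame(y = c(1.2, 0.4, 2.2, 1.9, 3.1, 2.5),
                x1 = c(1, 2, 3, 4, 5, 6),
                g = factor(c("a", "b", "a", "b", "a", "b")))
fit <- lm(y ~ x1 + g, data = d)

# Classes recorded at fit time, taken from the full terms object so the
# response entry "y" is certainly included
cl <- attr(terms(fit), "dataClasses")
tt <- delete.response(terms(fit))

# 1. newdata has no y column: the extra "y" entry in cl is simply ignored
nd_ok <- data.frame(x1 = c(7, 8), g = factor(c("a", "b")))
ok <- stats::.checkMFClasses(cl, model.frame(tt, nd_ok))

# 2. x1 supplied as a factor: a genuine mismatch, so an error is signalled
nd_bad <- data.frame(x1 = factor(c("7", "8")), g = factor(c("a", "b")))
bad <- tryCatch(stats::.checkMFClasses(cl, model.frame(tt, nd_bad)),
                error = conditionMessage)

# 3. Unnamed classes (like the caper example): nothing can be matched by
#    name, so even the bad newdata passes without complaint
quiet <- stats::.checkMFClasses(unname(cl), model.frame(tt, nd_bad))
```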

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-bounces@r-project.org [mailto:r-devel-bounces@r-project.org] On Behalf Of William Dunlap
> Sent: Thursday, January 05, 2012 12:57 PM
> To: Paul Johnson; R Devel List
> Subject: Re: [Rd] delete.response leaves response in attribute dataClasses
>
> I had noticed the same thing but figured that most
> people (writers of predict methods) would be looking
> up entries in dataClasses by name and not by position,
> since predict's newdata argument need not have entries
> in the same order as the data used to fit the model.
> Hence the extra entry would not be noticed (nor would it be
> missed if it were omitted).
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-devel-bounces_at_r-project.org [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Paul Johnson
> > Sent: Thursday, January 05, 2012 12:27 PM
> > To: R Devel List
> > Subject: [Rd] delete.response leaves response in attribute dataClasses
> >
> > I posted this one as an R bug
> > (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14767), but
> > Prof. Ripley says I'm premature, and I should raise the question here.
> >
> > Here's the behavior I assert is a bug:
> > The output from delete.response on a terms object alters the formula
> > by removing the dependent variable. It removes the response from the
> > "variables" attribute and it changes the response attribute from 1 to
> > 0. The response is also removed from "predvars".
> >
> > But it leaves the name of the dependent variable first in
> > "dataClasses". It caused unexpected behavior in my code, so (as
> > usual) the bug may be mine, but in my heart, I believe it belongs to
> > delete.response.
> >
> > To illustrate, here's a terms object from a regression.
> >
> > > tt
> > y ~ x1 * x2 + x3 + x4
> > attr(,"variables")
> > list(y, x1, x2, x3, x4)
> > attr(,"factors")
> >    x1 x2 x3 x4 x1:x2
> > y   0  0  0  0     0
> > x1  1  0  0  0     1
> > x2  0  1  0  0     1
> > x3  0  0  1  0     0
> > x4  0  0  0  1     0
> > attr(,"term.labels")
> > [1] "x1" "x2" "x3" "x4" "x1:x2"
> > attr(,"order")
> > [1] 1 1 1 1 2
> > attr(,"intercept")
> > [1] 1
> > attr(,"response")
> > [1] 1
> > attr(,".Environment")
> > <environment: R_GlobalEnv>
> > attr(,"predvars")
> > list(y, x1, x2, x3, x4)
> > attr(,"dataClasses")
> >         y        x1        x2        x3        x4
> > "numeric" "numeric" "numeric" "numeric" "numeric"
> >
> > Now observe that delete.response removes the response from all
> > attributes except dataClasses.
> >
> > > delete.response(tt)
> > ~x1 * x2 + x3 + x4
> > attr(,"variables")
> > list(x1, x2, x3, x4)
> > attr(,"factors")
> >    x1 x2 x3 x4 x1:x2
> > x1  1  0  0  0     1
> > x2  0  1  0  0     1
> > x3  0  0  1  0     0
> > x4  0  0  0  1     0
> > attr(,"term.labels")
> > [1] "x1" "x2" "x3" "x4" "x1:x2"
> > attr(,"order")
> > [1] 1 1 1 1 2
> > attr(,"intercept")
> > [1] 1
> > attr(,"response")
> > [1] 0
> > attr(,".Environment")
> > <environment: R_GlobalEnv>
> > attr(,"predvars")
> > list(x1, x2, x3, x4)
> > attr(,"dataClasses")
> >         y        x1        x2        x3        x4
> > "numeric" "numeric" "numeric" "numeric" "numeric"
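A minimal sketch that rebuilds a terms object like the one above from made-up data, together with a hypothetical helper `delete_response_all()` (not part of base R) that also drops the response from "dataClasses", matching it by name rather than assuming it is the first entry.

```r
# Made-up data with the same variable names as the example above
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20), x4 = rnorm(20))
d$y <- with(d, x1 * x2 + x3 + x4 + rnorm(20))
tt <- terms(lm(y ~ x1 * x2 + x3 + x4, data = d))

# Hypothetical helper (not in base R): run delete.response(), then remove
# the response's entry from "dataClasses" by name.
delete_response_all <- function(termobj) {
  resp <- attr(termobj, "response")
  dc <- attr(termobj, "dataClasses")
  out <- delete.response(termobj)
  if (!is.null(dc) && !is.null(resp) && resp > 0) {
    # The response is element 1 + resp of the "variables" call list(y, ...)
    yname <- paste(deparse(attr(termobj, "variables")[[1 + resp]]),
                   collapse = " ")
    attr(out, "dataClasses") <- dc[names(dc) != yname]
  }
  out
}

names(attr(delete_response_all(tt), "dataClasses"))
# [1] "x1" "x2" "x3" "x4"
```

Because the entry is removed by name from the original attribute, the helper gives the same result even if delete.response() itself already drops it.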
> >
> >
> > pj
> >
> > --
> > Paul E. Johnson
> > Professor, Political Science
> > 1541 Lilac Lane, Room 504
> > University of Kansas
> >
> > ______________________________________________
> > R-devel_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>



Received on Thu 05 Jan 2012 - 21:21:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 Jan 2012 - 19:50:07 GMT.
