Re: [Rd] problem using model.frame()

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Tue 16 Aug 2005 - 17:44:23 GMT

On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
> On 8/16/05, Gavin Simpson <gavin.simpson@ucl.ac.uk> wrote:
> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > It can handle data frames like this:
> > >
> > > model.frame(y1)
> > > or
> > > model.frame(~., y1)
> >
> > Thanks Gabor,
> >
> > Yes, I know that works, but I want the function coca.formula to accept a
> > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
>
> The expressions I gave work generally (i.e. lm, glm, ...), not just in
> model.matrix, so would it be ok if the user just does this?
>
> yourfunction(y2 ~., y1)

Thanks again Gabor for your comments,

I'd prefer to allow y2 ~ y1 with both objects being data frames, as this is the most natural way of doing things. I'd also like (y2 ~ ., y1) and (y2 ~ spp1 + spp2 + spp3, y1) to work, silently and without any trouble.

> If it really is important to do it the way you describe, are the data
> frames necessarily numeric? If so you could preprocess your formula
> by placing as.matrix around all the variables representing data frames
> using something like this:
>
> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Yes, they are numeric matrices (as data frames). I've looked at this, but I'd prefer to not have to do too much messing with the formula.
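For the record, the kind of preprocessing suggested above might look roughly like the following. This is only a minimal sketch, not the code from the linked post: wrapDataFrames() is a hypothetical helper name, it assumes the variables can be found from the formula's environment, and it does not handle calls with empty arguments (e.g. x[, 1]).

```r
# Hypothetical helper: wrap as.matrix() around every variable in a formula
# whose value, looked up from `env`, is a data frame.
wrapDataFrames <- function(formula, env = environment(formula)) {
  walk <- function(e) {
    if (is.name(e)) {
      nm <- as.character(e)
      if (exists(nm, envir = env) && is.data.frame(get(nm, envir = env)))
        return(call("as.matrix", e))
      return(e)
    }
    if (is.call(e)) {
      # leave the function name (element 1) alone, rewrite only the arguments
      for (i in seq_along(e)[-1]) e[[i]] <- walk(e[[i]])
    }
    e
  }
  walk(formula)
}

y1 <- as.data.frame(matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
                           nrow = 5, byrow = TRUE))
colnames(y1) <- paste("spp", 1:4, sep = "")

f <- wrapDataFrames(~ y1)   # the rhs becomes as.matrix(y1)
mf <- model.frame(f)        # no "invalid variable type" error now
```

The resulting model frame holds a single matrix-valued column, so the column names would still need tidying afterwards.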

> Of course, if they are necessarily numeric maybe they can be matrices in
> the first place?

Because read.table etc. produce data.frames and this is the natural way to work with data in R.
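One way to let users keep their data frames while still handing model.frame() a matrix is the environment trick I asked about originally. A rough sketch, assuming the formula's variables live in the formula's own environment; modelFrameDF() is a hypothetical name used only for illustration:

```r
# Hypothetical helper: evaluate model.frame() in a child environment in which
# every data-frame variable named in the formula is shadowed by a matrix copy.
modelFrameDF <- function(formula) {
  parent <- environment(formula)
  env <- new.env(parent = parent)
  for (nm in all.vars(formula)) {
    if (exists(nm, envir = parent)) {
      val <- get(nm, envir = parent)
      if (is.data.frame(val))
        assign(nm, as.matrix(val), envir = env)  # the user's object is untouched
    }
  }
  environment(formula) <- env
  model.frame(formula)   # variables are looked up in `env` first
}

y1 <- as.data.frame(matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
                           nrow = 5, byrow = TRUE))
colnames(y1) <- paste("spp", 1:4, sep = "")

mf <- modelFrameDF(~ y1)
```

Because the matrix copy lives only in the child environment, y1 in the calling environment remains a data frame. This sketch ignores the data and subset arguments.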

Following your suggestions, I altered my code to evaluate the rhs of the formula and check whether it is of class "data.frame". If it is, I stop processing and return it as a data.frame at that point. If not, it eventually gets passed on to model.frame(), which deals with it as usual.
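In outline, the check looks something like this. It is a simplified sketch rather than the actual code in coca.formula; rhsAsData() is a hypothetical name, and the response is simply dropped before calling model.frame():

```r
# Hypothetical helper: return the rhs unchanged if it evaluates to a data
# frame, otherwise pass the one-sided formula on to model.frame().
rhsAsData <- function(formula, data = environment(formula)) {
  rhs <- formula[[length(formula)]]   # last element is the rhs in both
                                      # one- and two-sided formulas
  # Evaluation fails for things like `.`, in which case we fall through.
  val <- tryCatch(eval(rhs, data, environment(formula)),
                  error = function(e) NULL)
  if (is.data.frame(val))
    return(val)
  if (length(formula) == 3)
    formula[[2]] <- NULL              # drop the response
  model.frame(formula, data = data, na.action = na.fail)
}

y1 <- as.data.frame(matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
                           nrow = 5, byrow = TRUE))
colnames(y1) <- paste("spp", 1:4, sep = "")
y2 <- y1 * 2

a <- rhsAsData(y2 ~ y1)                    # data frame on the rhs
b <- rhsAsData(y2 ~ ., data = y1)          # all columns of y1
d <- rhsAsData(y2 ~ spp1 + spp2, data = y1)
```

All three call forms go through the same function, which is what I was after.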

So far - limited testing - it seems to do what I wanted all along. I'm sure there's a gotcha in there somewhere but at least the code runs so I can check for problems against my examples.

Right, back to writing documentation...

G

> > more intuitive, to my mind at least for this particular example and
> > analysis, to specify the formula with a data frame on the rhs.
> >
> > model.frame doesn't work with the formula "~ y1" if the object y1, in
> > the environment when model.frame evaluates the formula, is a data.frame.
> > It works if y1 is a matrix, however. I'd like to work around this
> > problem, say by creating an environment in which y1 is modified to be a
> > matrix, if possible. Can this be done?
> >
> > At the moment I have something working by grabbing the bits of the
> > formula and then using get() to grab the named object. Of course, this
> > won't work if someone wants to use R's formula interface with the
> > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> > subset argument common to many formula implementations. I'd like to have
> > the function work in as general a manner as possible, so I'm fishing
> > around for potential solutions.
> >
> > All the best,
> >
> > Gav
> >
> > >
> > > On 8/16/05, Gavin Simpson <gavin.simpson@ucl.ac.uk> wrote:
> > > > Hi I'm having a problem with model.frame, encapsulated in this example:
> > > >
> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > > nrow = 5, byrow = TRUE)
> > > > y1 <- as.data.frame(y1)
> > > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > > y1
> > > >
> > > > model.frame(~ y1)
> > > > Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
> > > > invalid variable type
> > > >
> > > > temp <- as.matrix(y1)
> > > > model.frame(~ temp)
> > > >   temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > > 1         3         1         0         1
> > > > 2         0         1         1         0
> > > > 3         0         0         1         0
> > > > 4         0         0         1         1
> > > > 5         0         1         1         1
> > > >
> > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> > > > could deal with that later.
> > > >
> > > > I have tracked down the source of the error message to line 1330 in
> > > > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > > > code is looping over the variables in the formula and checking if they
> > > > are the right "type". So a matrix of variables gets through, but a
> > > > data.frame doesn't.
> > > >
> > > > It would be good if model.frame could cope with data.frames in formulae,
> > > > but seeing as I am incapable of providing a patch, is there a way around
> > > > this problem?
> > > >
> > > > Below is the head of the function I am currently using, including the
> > > > function for parsing the formula - borrowed and hacked from
> > > > ordiParseFormula() in package vegan.
> > > >
> > > > I can work out the class of the rhs of the formula. Is there a way to
> > > > create a suitable environment for the data argument of parseFormula()
> > > > such that it contains the rhs dataframe coerced to a matrix, which then
> > > > should get through model.frame.default without error? How would I go
> > > > about manipulating/creating such an environment? Any other ideas?
> > > >
> > > > Thanks in advance
> > > >
> > > > Gav
> > > >
> > > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > > >                          reg.method = c("simpls", "eigen"), weights = NULL,
> > > >                          n.axes = NULL, symmetric = FALSE, data)
> > > > {
> > > >     parseFormula <- function (formula, data)
> > > >     {
> > > >         browser()
> > > >         Terms <- terms(formula, "Condition", data = data)
> > > >         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> > > >         specdata <- formula[[2]]
> > > >         X <- eval(specdata, data, parent.frame())
> > > >         X <- as.matrix(X)
> > > >         formula[[2]] <- NULL
> > > >         if (formula[[2]] == "1" || formula[[2]] == "0")
> > > >             Y <- NULL
> > > >         else {
> > > >             mf <- model.frame(formula, data, na.action = na.fail)
> > > >             Y <- model.matrix(formula, mf)
> > > >             if (any(colnames(Y) == "(Intercept)")) {
> > > >                 xint <- which(colnames(Y) == "(Intercept)")
> > > >                 Y <- Y[, -xint, drop = FALSE]
> > > >             }
> > > >         }
> > > >         list(X = X, Y = Y)
> > > >     }
> > > >     if (missing(data))
> > > >         data <- parent.frame()
> > > >     #browser()
> > > >     dat <- parseFormula(formula, data)
> > > >
> > > > --
> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > > Gavin Simpson [T] +44 (0)20 7679 5522
> > > > ENSIS Research Fellow [F] +44 (0)20 7679 7565
> > > > ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
> > > > UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/
> > > > 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/
> > > > London. WC1H 0AP.
> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >
> > > > ______________________________________________
> > > > R-devel@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Aug 17 03:48:08 2005
