Re: [Rd] problem using model.frame()

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Thu 18 Aug 2005 - 06:53:05 GMT

On Wed, 2005-08-17 at 21:48 -0400, Gabor Grothendieck wrote:
> If it's just a matter of specifying two data frames how about just
> letting the user specify them as the first two arguments without
> injecting formulas into it so that any of these are allowed but
> data frames are still not allowed in formulas other than in the
> data argument:
>
> yourfunction(df1, df2)
> yourfunction(y ~ sp1 + sp2)
> yourfunction(y ~., df)
>
> This could easily be implemented by having yourfunction be
> generic in which case the first one would dispatch
> yourfunction.data.frame and the second and third would
> dispatch yourfunction.formula .

Hi Gabor,

yourfunction() is already generic; I have .default and .formula methods. The default implementation of the method (Co-correspondence analysis) is akin to a regression and uses a form of multivariate PLS, so one data matrix plays the role of the response and the other the predictor. That is the reason for wanting to use a formula interface.
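
To illustrate the special-casing discussed further down the thread (evaluate the rhs, use it directly if it is a data frame, otherwise hand over to model.frame()), here is a minimal sketch. The name coca_sketch and everything inside it are hypothetical stand-ins, not the actual code; it only assumes that both sides are numeric and that a data-frame rhs is given as a bare object name.

```r
# Hypothetical generic standing in for the real function.
coca_sketch <- function(x, ...) UseMethod("coca_sketch")

# Default method: response and predictor supplied directly.
coca_sketch.default <- function(x, y, ...) {
    list(X = as.matrix(x), Y = as.matrix(y))
}

# Formula method: accepts y1 ~ y2, y1 ~ ., data = y2 and y1 ~ a + b, data = y2.
coca_sketch.formula <- function(formula, data, ...) {
    env <- environment(formula)
    if (missing(data)) data <- env
    # The response is looked up in data first, then in the formula's environment.
    X <- as.matrix(eval(formula[[2]], data, env))
    rhs <- formula[[3]]
    # Special case: the rhs is a single name (not ".") bound to a data frame.
    rhs_val <- if (is.name(rhs) && !identical(rhs, quote(.)))
        tryCatch(eval(rhs, env), error = function(e) NULL)
    if (is.data.frame(rhs_val)) {
        Y <- as.matrix(rhs_val)
    } else {
        # Ordinary formula: drop the response and let model.frame() do the work.
        rhs_only <- formula
        rhs_only[[2]] <- NULL
        mf <- model.frame(rhs_only, data = data, na.action = na.fail)
        Y <- model.matrix(attr(mf, "terms"), mf)
        Y <- Y[, colnames(Y) != "(Intercept)", drop = FALSE]
    }
    list(X = X, Y = Y)
}
```

With two numeric data frames y1 and y2, coca_sketch(y1 ~ y2) and coca_sketch(y1 ~ ., data = y2) should then give the same predictor matrix, while coca_sketch(y1, y2) goes through the default method.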

Cheers,

G

> On 8/17/05, Gavin Simpson <gavin.simpson@ucl.ac.uk> wrote:
> > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > > >>>>> "GS" == Gavin Simpson <gavin.simpson@ucl.ac.uk>
> > > >>>>> on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> > >
> > > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> > > GS> wrote:
> > > >> On 8/16/05, Gavin Simpson <gavin.simpson@ucl.ac.uk> wrote:
> > > >> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > >> > > It can handle data frames like this:
> > > >> > >
> > > >> > > model.frame(y1)
> > > >> > > or
> > > >> > > model.frame(~., y1)
> > > >> >
> > > >> > Thanks Gabor,
> > > >> >
> > > >> > Yes, I know that works, but I want the function coca.formula to accept a
> > > >> > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
> > > >>
> > > >> The expressions I gave work generally (i.e. lm, glm,
> > > >> ...), not just in model.matrix, so would it be ok if the
> > > >> user just does this?
> > > >>
> > > >> yourfunction(y2 ~., y1)
> > >
> > > GS> Thanks again Gabor for your comments,
> > >
> > > GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> > > GS> most natural way of doing things. I'd like to have (y2
> > > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> > > GS> work - silently without any trouble.
> > >
> > > I'm sorry, Gavin, I tend to disagree quite a bit.
> > >
> > > The formula notation has quite a history in the S language, and
> > > AFAIK never was the idea to use data.frames as formula
> > > components, but rather as "environments" in which formula
> > > components are looked up --- exactly as Gabor has explained.
> >
> > Hi Martin, thanks for your comments,
> >
> > But then one could have a matrix of variables on the rhs of the formula
> > and it would work - whether this is a documented feature or un-intended
> > side-effect of matrices being stored as vectors with dims, I don't know.
> >
> > And whilst the formula may have a long history, a number of packages
> > have extended the interface to implement a specific feature, which don't
> > work with standard functions like lm, glm and friends. I don't see how
> > what I wanted to achieve is greatly different to that or using a matrix.
> >
> > > To break with such a deeply rooted principle,
> > > you should have very very good reasons, because you're breaking
> > > the concepts on which all other uses of formulae are based.
> > > And this would potentially lead to much confusion of your users,
> > > at least in the way they should learn to think about what
> > > formulae mean.
> >
> > In the end I managed to treat y1 ~ y2 (both data frames) as a special
> > case, which allows the existing formula notation to work as well, so I
> > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> > is what I wanted all along, to extend my interface (not do anything to
> > R's formulae), but to also work in the traditional sense.
> >
> > The model I am writing code for really is modelling the relationship
> > between two matrices of data. In one version of the method, there is
> > real equivalence between both sides of the formula so it would seem odd
> > to treat the two sides of the formula differently. At least to me ;-)
> >
> > > Martin
> > >
> > >
> > > >> If it really is important to do it the way you describe,
> > > >> are the data frames necessarily numeric? If so you could
> > > >> preprocess your formula by placing as.matrix around all
> > > >> the variables representing data frames using something
> > > >> like this:
> > > >>
> > > >> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
> > >
> > > GS> Yes, they are numeric matrices (as data frames). I've
> > > GS> looked at this, but I'd prefer to not have to do too
> > > GS> much messing with the formula.
> > >
> > > >> Of course, if they are necessarily numeric maybe they can
> > > >> be matrices in the first place?
> > >
> > > GS> Because read.table etc. produce data.frames and this is
> > > GS> the natural way to work with data in R.
> > >
> > > but it is also slightly inefficient if they are numeric.
> > > There are places for data frames and for matrices.
> >
> > I agree - and in the code I've written, y1 and y2 quickly get coerced to
> > matrices before the real number crunching begins.
> >
> > However, all the other R modelling functions I have used work with
> > data.frames. Arguably, it could cause more confusion to write a function
> > that looked, walked and quacked like an R modelling function but needed
> > the user to apply an extra step to use - a step not usually required
> > under normal R usage.
> >
> > All the best,
> >
> > Gav
> >
> > > Why should it be a problem to use
> > > M <- as.matrix(read.table(..))
> > > ?
> > >
> > > For large files, it could be quite a bit more efficient,
> > > needing a bit more of code, to
> > > use scan() to read the numeric data directly :
> > >
> > > h1 <- scan(..., n=1) ## <read variable names>
> > > nc <- length(h1)
> > > a <- matrix(scan(...., what = numeric(), ...),
> > > ncol = nc, dimnames = list(NULL, h1))
> > >
> > > maybe this would be useful to be packaged into
> > > a small utility with usage
> > >
> > > read.matrix(..., type = numeric(), ...)
> > >
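
A minimal sketch of what such a utility might look like, assuming a whitespace-separated file whose first line holds the variable names. The name read_matrix_sketch and its arguments are hypothetical; no such function is implied to exist in R. Note that scan() returns the values in file order, i.e. row by row, so the matrix is filled with byrow = TRUE.

```r
# Hypothetical helper: read a numeric table straight into a matrix via scan().
read_matrix_sketch <- function(file, type = numeric(), ...) {
    # First line: variable names.
    h1 <- scan(file, what = character(), nlines = 1, quiet = TRUE)
    nc <- length(h1)
    # Remaining lines: the data, read as one long vector in row order.
    vals <- scan(file, what = type, skip = 1, quiet = TRUE, ...)
    matrix(vals, ncol = nc, byrow = TRUE, dimnames = list(NULL, h1))
}
```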
> > >
> > > GS> Following your suggestions, I altered my code to
> > > GS> evaluate the rhs of the formula and check if it was of
> > > GS> class "data.frame". If it is then I stop processing and
> > > GS> return it as a data.frame at this point. If not, it
> > > GS> eventually gets passed on to model.frame() for it to
> > > GS> deal with it.
> > >
> > > GS> So far - limited testing - it seems to do what I wanted
> > > GS> all along. I'm sure there's a gotcha in there somewhere
> > > GS> but at least the code runs so I can check for problems
> > > GS> against my examples.
> > >
> > > GS> Right, back to writing documentation...
> > >
> > > GS> G
> > >
> > > >> > more intuitive, to my mind at least for this particular example and
> > > >> > analysis, to specify the formula with a data frame on the rhs.
> > > >> >
> > > >> > model.frame doesn't work with the formula "~ y1" if the object y1, in
> > > >> > the environment when model.frame evaluates the formula, is a data.frame.
> > > >> > It works if y1 is a matrix, however. I'd like to work around this
> > > >> > problem, say by creating an environment in which y1 is modified to be a
> > > >> > matrix, if possible. Can this be done?
> > > >> >
> > > >> > At the moment I have something working by grabbing the bits of the
> > > >> > formula and then using get() to grab the named object. Of course, this
> > > >> > won't work if someone wants to use R's formula interface with the
> > > >> > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> > > >> > subset argument common to many formula implementations. I'd like to have
> > > >> > the function work in as general a manner as possible, so I'm fishing
> > > >> > around for potential solutions.
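
One way the environment idea asked about here could be sketched, under the assumption that every data frame named in the formula is numeric: build a child environment of the formula's own environment, store matrix copies of those data frames in it, and re-point the formula at it before calling model.frame(). The function name model_frame_as_matrix is hypothetical.

```r
# Hypothetical helper: let model.frame() see data frames in a formula as matrices.
model_frame_as_matrix <- function(formula, ...) {
    orig <- environment(formula)
    # Child environment: anything not overridden is still found in 'orig'.
    env <- new.env(parent = orig)
    for (v in all.vars(formula)) {
        val <- get0(v, envir = orig)  # NULL if the name is not found
        if (is.data.frame(val))
            assign(v, as.matrix(val), envir = env)
    }
    # Only the local copy of the formula is changed; the originals are untouched.
    environment(formula) <- env
    model.frame(formula, ...)
}
```

Calling model_frame_as_matrix(~ y1) with y1 a data frame should then return a model frame holding y1 as a single matrix column, while y1 itself stays a data frame in the calling environment.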
> > > >> >
> > > >> > All the best,
> > > >> >
> > > >> > Gav
> > > >> >
> > > >> > >
> > > >> > > On 8/16/05, Gavin Simpson <gavin.simpson@ucl.ac.uk> wrote:
> > > >> > > > Hi I'm having a problem with model.frame, encapsulated in this example:
> > > >> > > >
> > > >> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > >> > > >              nrow = 5, byrow = TRUE)
> > > >> > > > y1 <- as.data.frame(y1)
> > > >> > > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > >> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > >> > > > y1
> > > >> > > >
> > > >> > > > model.frame(~ y1)
> > > >> > > > Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
> > > >> > > >         invalid variable type
> > > >> > > >
> > > >> > > > temp <- as.matrix(y1)
> > > >> > > > model.frame(~ temp)
> > > >> > > >   temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > >> > > > 1         3         1         0         1
> > > >> > > > 2         0         1         1         0
> > > >> > > > 3         0         0         1         0
> > > >> > > > 4         0         0         1         1
> > > >> > > > 5         0         1         1         1
> > > >> > > >
> > > >> > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> > > >> > > > could deal with that later.
> > > >> > > >
> > > >> > > > I have tracked down the source of the error message to line 1330 in
> > > >> > > > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > > >> > > > code is looping over the variables in the formula and checking if they
> > > >> > > > are the right "type". So a matrix of variables gets through, but a
> > > >> > > > data.frame doesn't.
> > > >> > > >
> > > >> > > > It would be good if model.frame could cope with data.frames in formulae,
> > > >> > > > but seeing as I am incapable of providing a patch, is there a way around
> > > >> > > > this problem?
> > > >> > > >
> > > >> > > > Below is the head of the function I am currently using, including the
> > > >> > > > function for parsing the formula - borrowed and hacked from
> > > >> > > > ordiParseFormula() in package vegan.
> > > >> > > >
> > > >> > > > I can work out the class of the rhs of the formula. Is there a way to
> > > >> > > > create a suitable environment for the data argument of parseFormula()
> > > >> > > > such that it contains the rhs dataframe coerced to a matrix, which then
> > > >> > > > should get through model.frame.default without error? How would I go
> > > >> > > > about manipulating/creating such an environment? Any other ideas?
> > > >> > > >
> > > >> > > > Thanks in advance
> > > >> > > >
> > > >> > > > Gav
> > > >> > > >
> > > >> > > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > > >> > > >                          reg.method = c("simpls", "eigen"), weights = NULL,
> > > >> > > >                          n.axes = NULL, symmetric = FALSE, data)
> > > >> > > > {
> > > >> > > >     parseFormula <- function (formula, data)
> > > >> > > >     {
> > > >> > > >         browser()
> > > >> > > >         Terms <- terms(formula, "Condition", data = data)
> > > >> > > >         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> > > >> > > >         specdata <- formula[[2]]
> > > >> > > >         X <- eval(specdata, data, parent.frame())
> > > >> > > >         X <- as.matrix(X)
> > > >> > > >         formula[[2]] <- NULL
> > > >> > > >         if (formula[[2]] == "1" || formula[[2]] == "0")
> > > >> > > >             Y <- NULL
> > > >> > > >         else {
> > > >> > > >             mf <- model.frame(formula, data, na.action = na.fail)
> > > >> > > >             Y <- model.matrix(formula, mf)
> > > >> > > >             if (any(colnames(Y) == "(Intercept)")) {
> > > >> > > >                 xint <- which(colnames(Y) == "(Intercept)")
> > > >> > > >                 Y <- Y[, -xint, drop = FALSE]
> > > >> > > >             }
> > > >> > > >         }
> > > >> > > >         list(X = X, Y = Y)
> > > >> > > >     }
> > > >> > > >     if (missing(data))
> > > >> > > >         data <- parent.frame()
> > > >> > > >     #browser()
> > > >> > > >     dat <- parseFormula(formula, data)
> > > >> > > >
> > > >> > > > --
> > > >> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >> > > > Gavin Simpson                     [T] +44 (0)20 7679 5522
> > > >> > > > ENSIS Research Fellow             [F] +44 (0)20 7679 7565
> > > >> > > > ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
> > > >> > > > UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
> > > >> > > > 26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
> > > >> > > > London.  WC1H 0AP.
> > > >> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >> > > >
> > > >> > > > ______________________________________________
> > > >> > > > R-devel@r-project.org mailing list
> > > >> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >> > > >

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Aug 18 16:57:20 2005