Re: [Rd] discrepancy between lm and MASS:rlm

From: Vadim Ogranovich <vogranovich_at_jumptrading.com>
Date: Mon, 14 Mar 2011 14:31:22 -0500

Indeed! I added the drop.unused.levels argument and it now works. Thank you Bill!

Here is a patch which one should use at one's own risk as it redefines rlm.formula as global object, i.e. rlm.formula is no longer in the MASS namespace and this is certainly not a good idea:

rlm.formula <- function (formula, data, weights, ..., subset, na.action, method = c("M",

    "MM", "model.frame"), wt.method = c("inv.var", "case"), model = TRUE,     x.ret = TRUE, y.ret = FALSE, contrasts = NULL) {

    mf <- match.call(expand.dots = FALSE)     mf$method <- mf$wt.method <- mf$model <- mf$x.ret <- mf$y.ret <- mf$contrasts <- mf$... <- NULL     mf$drop.unused.levels <- TRUE
    mf[[1L]] <- as.name("model.frame")
    mf <- eval.parent(mf)
    method <- match.arg(method)
    wt.method <- match.arg(wt.method)
    if (method == "model.frame")

        return(mf)
    mt <- attr(mf, "terms")
    y <- model.response(mf)
    offset <- model.offset(mf)
    if (!is.null(offset))

        y <- y - offset
    x <- model.matrix(mt, mf, contrasts)     xvars <- as.character(attr(mt, "variables"))[-1L]     if ((yvar <- attr(mt, "response")) > 0L)

        xvars <- xvars[-yvar]
    xlev <- if (length(xvars) > 0L) {

        xlev <- lapply(mf[xvars], levels)
        xlev[!sapply(xlev, is.null)]

    }
    weights <- model.weights(mf)
    if (!length(weights))

        weights <- rep(1, nrow(x))
    fit <- MASS:::rlm.default(x, y, weights, method = method, wt.method = wt.method,

        ...)
    fit$terms <- mt
    cl <- match.call()
    cl[[1L]] <- as.name("rlm")

    fit$call <- cl
    fit$contrasts <- attr(x, "contrasts")
    fit$xlevels <- .getXlevels(mt, mf)
    fit$na.action <- attr(mf, "na.action")
    if (model)
        fit$model <- mf
    if (!x.ret)
        fit$x <- NULL
    if (y.ret)
        fit$y <- y

    fit
}

-----Original Message-----
From: William Dunlap [mailto:wdunlap_at_tibco.com] Sent: Monday, March 14, 2011 1:39 PM
To: Vadim Ogranovich; r-devel_at_r-project.org Subject: RE: [Rd] discrepancy between lm and MASS:rlm

> -----Original Message-----
> From: r-devel-bounces_at_r-project.org
> [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Vadim Ogranovich
> Sent: Monday, March 14, 2011 10:37 AM
> To: 'r-devel_at_r-project.org'
> Subject: [Rd] discrepancy between lm and MASS:rlm

>

> Dear R-devel,
>

> There seems to be a discrepancy in the order in which lm and
> rlm evaluate their arguments. This causes rlm to sometimes
> produce an error where lm is just fine.

It may not be a problem with the order of evaluation. rlm() might not be calling model.frame() with drop.unused.levels=TRUE. I've made that mistake before with similar symptoms.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>

> Here is a little script that illustrate the issue:
>

> > library(MASS)
> > ## create data
> > n <- 100
> > dat <- data.frame(x=rep(c(-1,0,1), n), y=rnorm(3*n))
> >
> > ## call lm, works fine
> > summary(lm(y ~ as.factor(x), data=dat, subset=x!=0))
>

> Call:
> lm(formula = y ~ as.factor(x), data = dat, subset = x != 0)
>

> Residuals:
> Min 1Q Median 3Q Max
> -2.60619 -0.82160 0.06307 0.65501 2.56677
>

> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.061010 0.100027 0.610 0.543
> as.factor(x)1 0.001332 0.141459 0.009 0.992
>

> Residual standard error: 1 on 198 degrees of freedom
> Multiple R-squared: 4.479e-07, Adjusted R-squared: -0.00505
> F-statistic: 8.868e-05 on 1 and 198 DF, p-value: 0.9925
>

> > ## call rlm, error
> > summary(rlm(y ~ as.factor(x), data=dat, subset=x!=0))
> Error in rlm.default(x, y, weights, method = method,
> wt.method = wt.method, :
> 'x' is singular: singular fits are not implemented in rlm
> >
>
>

> My guess is that rlm first converts x to a factor, which
> becomes a three-level factor, then subsets on x!=0, which
> effectively eliminates a level, and then creates a
> "regression" matrix, which becomes singular due to the
> absence of data for a level.
>

> Is there a simple way to work around it. The simplest I could
> think of is
> with(subset(dat, x!=0), rlm(y ~ as.factor(x))
> which is ok, but most of my scripts make use of data arg to
> regressions and I'd like to stay consistent as much as practical.
>

> Thanks,
> Vadim
>
>

> Note: This email is for the confidential use of the named
> addressee(s) only and may contain proprietary, confidential
> or privileged information. If you are not the intended
> recipient, you are hereby notified that any review,
> dissemination or copying of this email is strictly
> prohibited, and to please notify the sender immediately and
> destroy this email and any attachments. Email transmission
> cannot be guaranteed to be secure or error-free. Jump
> Trading, therefore, does not make any guarantees as to the
> completeness or accuracy of this email or any attachments.
> This email is for informational purposes only and does not
> constitute a recommendation, offer, request or solicitation
> of any kind to buy, sell, subscribe, redeem or perform any
> type of transaction of a financial product.
>

> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. Jump Trading, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Mon 14 Mar 2011 - 19:36:45 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 14 Mar 2011 - 22:10:31 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive