Re: [Rd] discrepancy between lm and MASS:rlm

From: William Dunlap <wdunlap_at_tibco.com>
Date: Mon, 14 Mar 2011 11:38:34 -0700

> -----Original Message-----
> From: r-devel-bounces@r-project.org
> [mailto:r-devel-bounces_at_r-project.org] On Behalf Of Vadim Ogranovich
> Sent: Monday, March 14, 2011 10:37 AM
> To: 'r-devel_at_r-project.org'
> Subject: [Rd] discrepancy between lm and MASS:rlm
>
> Dear R-devel,

>
> There seems to be a discrepancy in the order in which lm and
> rlm evaluate their arguments. This causes rlm to sometimes
> produce an error where lm is just fine.

It may not be a problem with the order of evaluation. rlm() might not be calling model.frame() with drop.unused.levels=TRUE. I've made that mistake before with similar symptoms.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> Here is a little script that illustrate the issue:
>
> > library(MASS)
> > ## create data
> > n <- 100
> > dat <- data.frame(x=rep(c(-1,0,1), n), y=rnorm(3*n))
> >
> > ## call lm, works fine
> > summary(lm(y ~ as.factor(x), data=dat, subset=x!=0))
>
> Call:
> lm(formula = y ~ as.factor(x), data = dat, subset = x != 0)
>

> Residuals:
> Min 1Q Median 3Q Max
> -2.60619 -0.82160 0.06307 0.65501 2.56677
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.061010 0.100027 0.610 0.543
> as.factor(x)1 0.001332 0.141459 0.009 0.992
>
> Residual standard error: 1 on 198 degrees of freedom
> Multiple R-squared: 4.479e-07, Adjusted R-squared: -0.00505
> F-statistic: 8.868e-05 on 1 and 198 DF, p-value: 0.9925
>
> > ## call rlm, error
> > summary(rlm(y ~ as.factor(x), data=dat, subset=x!=0))
> Error in rlm.default(x, y, weights, method = method,
> wt.method = wt.method, :
> 'x' is singular: singular fits are not implemented in rlm
> >
>
>
> My guess is that rlm first converts x to a factor, which
> becomes a three-level factor, then subsets on x!=0, which
> effectively eliminates a level, and then creates a
> "regression" matrix, which becomes singular due to the
> absence of data for a level.

>
> Is there a simple way to work around it. The simplest I could
> think of is
> with(subset(dat, x!=0), rlm(y ~ as.factor(x))
> which is ok, but most of my scripts make use of data arg to
> regressions and I'd like to stay consistent as much as practical.

>
> Thanks,
> Vadim
>
>
> Note: This email is for the confidential use of the named
> addressee(s) only and may contain proprietary, confidential
> or privileged information. If you are not the intended
> recipient, you are hereby notified that any review,
> dissemination or copying of this email is strictly
> prohibited, and to please notify the sender immediately and
> destroy this email and any attachments. Email transmission
> cannot be guaranteed to be secure or error-free. Jump
> Trading, therefore, does not make any guarantees as to the
> completeness or accuracy of this email or any attachments.
> This email is for informational purposes only and does not
> constitute a recommendation, offer, request or solicitation
> of any kind to buy, sell, subscribe, redeem or perform any
> type of transaction of a financial product.

>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Mon 14 Mar 2011 - 18:45:14 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 14 Mar 2011 - 21:30:31 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive