Re: [R] robust mlm in R?

From: John Fox <jfox_at_mcmaster.ca>
Date: Tue, 03 Jun 2008 15:43:54 -0400

Dear Michael,

I don't think that anyone else has suggested a fix, so here's one:

# Mahalanobis Dsq for a matrix of variables dsq <- function(x, Sigma) {

   if (missing(Sigma)) Sigma <- cov(x, use="complete.obs")    dev <- scale(x, scale=FALSE)
# DSQ <- dev %*% solve(Sigma) %*% t(dev )

   DSQ <- apply(dev * (dev %*% solve(Sigma)), 1, sum)    return(DSQ)
}

# robust mlm via multivariate trimming a la Gnanadesikan, Kettering & Wilks rmlm.GKW <- function(formula, weights, data, iter=3, pvalue=.01, ...) {

   if (missing(data)) data <- model.frame(formula)    if (missing(weights)) weights <- rep(1, nrow(data))    last.weights <- weights
   for (i in 1:iter) {

     data$weights <- weights
     mod <- lm(formula=formula, data=data, weights=weights, ...)
     res <- residuals(mod)
     coef <- mod$coefficients
     print (coef)
     p <- ncol(res)
     DSQ <- dsq(res)
     prob <- pchisq(DSQ, p, lower.tail=FALSE)
     weights <- ifelse( prob<pvalue, 0, weights)
     nzero <- which(weights==0)
     print (nzero)
     if (isTRUE(all.equal(weights, last.weights))) { break }
   }
   mod
}

There were a scoping problem in the call to lm() and some other errors as well.

I think that this revised function does what you want, but I'd also probably program it differently, handling the standard model arguments in the function and calling lm.wfit() for the computations.

I hope this helps,
 John



John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
On
> Behalf Of Michael Friendly
> Sent: June-02-08 9:51 AM
> To: r-help@stat.math.ethz.ch
> Subject: Re: [R] robust mlm in R?
>
> I had no on-list replies, so I cobbled up a function for the simplest
> method I could think of -- iterative multivariate trimming, following
> Gnanadesikan, Kettering & Wilks, assigning 0 weights to observations
> based on the Mahalanobis D^2 of residuals.
>
> But I'm getting an error I don't understand, and neither traceback()
> nor browser() gives me any insight. Can anyone tell me what is wrong
> with the lm() call in rmlm.GKW below?
>
> > iris.rmod <- rmlm.GKW(cbind(Sepal.Length, Sepal.Width, Petal.Length,
> Petal.Width)~Species, data=iris)
> Error in model.frame.default(formula = formula, data = data, subset =
> subset, :
> invalid type (closure) for variable '(weights)'
> >
>
> Here are the functions:
>
> # Mahalanobis Dsq for a matrix of variables
> dsq <- function(x, Sigma) {
> if (missing(Sigma)) Sigma <- cov(x, use="complete.obs")
> dev <- scale(x, scale=FALSE)
> # DSQ <- dev %*% solve(Sigma) %*% t(dev )
> DSQ <- apply(dev * (dev %*% solve(Sigma)), 1, sum)
> return(DSQ)
> }
>
> # robust mlm via multivariate trimming a la Gnanadesikan, Kettering &
Wilks
> rmlm.GKW <- function(formula, data, subset, weights=NULL, iter=3,
> pvalue=.01) {
> if (missing(weights) | is.null(weights)) { weights <- rep(1,
> nrow(data)) }
> last.weights <- weights
> for (i in 1:iter) {
> mod <- lm(formula=formula, data=data, subset=subset, weights=weights)
> res <- residuals(mod)
> coef <- mod$coefficients
> print (coef)
> p <- ncol(res)
> DSQ <- dsq(res)
> prob <- pchisq(DSQ, p, lower.tail=FALSE)
> weights <- ifelse( prob<pvalue, 0, weights)
> nzero <- which(weights=0)
> print (nzero)
> if (all.equal(weights, last.weights)) { break }
> }
> }
>
> Michael Friendly wrote:
> > I'm looking for something in R to fit a multivariate linear model
> > robustly, using
> > an M-estimator or any of the myriad of other robust methods for linear
> > models
> > implemented in robustbase or methods based on MCD or MVE covariance
> > estimation (package rrcov).
> >
> > E.g., one can fit an mlm for the iris data as:
> > iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length,
> > Petal.Width) ~ Species, data=iris)
> >
> > What I'd like is something like rlm() in MASS, but handling an mlm,
e.g.,
> > iris.mod <- rmlm(cbind(Sepal.Length, Sepal.Width, Petal.Length,
> > Petal.Width) ~ Species, data=iris)
> > and returning a vector of observation weights in its result.
> >
> > There's a burgeoning literature on this topic, but I haven't yet found
> > computational methods.
> > Any pointers or suggestions would be appreciated.
> >
> > -Michael
> >
> >
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT M3J 1P3 CANADA
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 03 Jun 2008 - 20:57:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 03 Jun 2008 - 22:30:40 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive