# Re: [R] Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sat 20 Jan 2007 - 07:08:49 GMT

The usual advice would seem to apply here:

• profile (the real application, not a toy example) to see where the bottlenecks are.
• rewrite those in C.

It is quite possible that hubers would benefit in your problem by being rewritten in C. However, there is a reason why it was not. It is a univariate location estimator, and using multiple univariate location estimators is not a robust multivariate location estimator. So in so far as I understand your problem sketch, you would be better off with a multivariate estimator for your N outcomes.

There may be problems which need the application of very large numbers of univariate robust estimators, but they are not commonplace.

On Fri, 19 Jan 2007, Benilton Carvalho wrote:

> Hi Everyone,
>
> Given the scenario I have, I was wondering if anyone would be able to
> give me a hind on how to get the results from hubers() in a more
> efficient way.
>
> I have an outcome on an array [N x S x D].
>
> I also have a factor (levels 1,2,3) stored on a matrix N x S.
>
> My objective is to get "mu" and "sigma" for each of the N rows
> (outcome) stratified by the factor (levels 1, 2 and 3) for each of
> the D "levels", but using MASS:hubers().
>
> Ideally the final result would be an array [N x D x 3 x 2].
>
> The following toy example demonstrates what I want to do, and I'd
> like to improve the performance when working on my case, where S=400
> and N > 200000
>
> Thank you very much for any suggestion.
>
> benilton
>
> ## begin toy example
> set.seed(1)
> N <- 100
> S <- 5
> D <- 2
>
> outcome <- array(rnorm(N*S*D), dim=c(N, S, D))
> classes <- matrix(sample(c(1:3, NA), N*S, rep=T), ncol=S)
>
> results <- array(NA, dim=c(N, D, 3, 2))
>
> library(MASS)
> myHubers <- function(x)
> if (length(x)>1) as.numeric(hubers(x)) else c(NA, NA)
>
> for (n in 1:N)
> for (d in 1:D){
> tmp <- outcome[n,,d]
> grp <- classes[n,]
> results[n, d,,] <- t(sapply(split(tmp, factor(grp, levels=1:3)),
> myHubers))
> }
> ## end
>
> --
> Benilton Carvalho
> PhD Candidate
> Department of Biostatistics
> Johns Hopkins University
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.
>

```--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help