RE: [R] Fast R implementation of Gini mean difference

About this list Date view Thread view Subject view Author view Attachment view

From: Andrew C. Ward (andreww@cheque.uq.edu.au)
Date: Tue 29 Apr 2003 - 06:02:04 EST


Message-id: <01C30DC1.0A2AC450.andreww@cheque.uq.edu.au>

Thank you again to Professor Azzalini for taking the time to
expound this issue for me. It's been very instructive to look
in detail at the definition of Gini mean difference and at the
incorporation of weights. Noting suggestions for improving the
speed of my function has also been very helpful.

With my application, the weights refer to the "reliability" of
a measurement, with a weight of 1 signifying high reliability
and a weight close to zero indicating low reliability. The
mean difference is used as a robust estimate of the standard
deviation of the vector of measurements (the 0.5*sqrt(pi)
multiplier is for this purpose).

It seems to me that Professor Azzalini's result may be used for
non-integer weights if the weights are scaled to sum to the
number of measurements (ie. w <- w * length(x)/sum(w)). In this
case, I think that gmd(x=c(1,2,4), w=c(1,2,1)) gives the same
result at gmd(x=c(1,2,2,4), w=c(1,1,1,1)).

Thank you again to all who have contributed to my understanding
and R implementation of the mean difference.

Regards,

Andrew C. Ward

CAPE Centre
Department of Chemical Engineering
The University of Queensland
Brisbane Qld 4072 Australia
andreww@cheque.uq.edu.au

On Monday, April 28, 2003 6:55 PM, Adelchi Azzalini
[SMTP:azzalini@stat.unipd.it] wrote:
>
> This is to complement my previous contribution on computation
of
> Gini mean
> difference - a discussion started by Andrew Ward. The index is
> "defined" as
> gini <- 0
> for (i in 1:n)
> {
> for (j in 1:n) gini <- gini + freq[i]*freq[j]*abs(x
> [i]-x[j])
> }
> gini<- gini/((sum(freq)-1)*sum(freq))
>
> This is the so-called form "without repetition"; the variant
> "with repetition"
> does not have -1 in the final line.
>
> Since computaation via the definition is totally inefficient,
> alternative
> approaches have been put forward, following Andrew's message.
>
> My first version of a computationally convenient implementation
> was
> essentially this:
>
> gini.md0<- function(x)
> { # x=data vector
> n <-length(x)
> return(4*sum((1:length(x))*sort(x)/(n*(n-1)))
> -2*mean(x)*(n+1)/(n-1))
> }
>
>
> Since Andrew (private message) has stressed the importance in
> his problem
> of allowing for replicated data, here is a more general
version,
> obtained by
> elaborating on the previous one with a bit of algebra:
>
> gini.md <- function(x, freq=rep(1,length(x)))
> {# x=data vector, freq=vector of frequencies
> if(!is.vector(x)) stop("x must be a vector")
> if(length(x) != length(freq))
> stop("x and freq must have same length")
> if(min(freq)<0 | sum(freq)==0 | any(freq != as.integer(freq))
> )
> stop("freq must be counts")
> x <- x[freq>0]
> freq <- freq[freq>0]
> j <- order(x)
> x <- x[j]
> n <- as.integer(freq[j])
> n. <- sum(n)
> u <- (cumsum(n)-n)*n+ n*(n+1)/2
> return(4*sum(u*x)/(n.*(n.-1))
> -2*weighted.mean(x,n)*(n.+1)/(n.-1))
> }
>
> Notice that gini.md(x,freq) gives the same of mini.md0(rep
> (x,freq)), but the latter
> is obviously less efficient. Either are however far more
> efficient that straight
> implementation of the "definition".
>
> regards
>
> Adelchi Azzalini
>
> --
> Adelchi Azzalini <azzalini@stat.unipd.it>
> Dipart.Scienze Statistiche, Universita di Padova, Italia
> http://azzalini.stat.unipd.it/
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


About this list Date view Thread view Subject view Author view Attachment view

This archive was generated by hypermail 2.1.3 : Tue 01 Jul 2003 - 09:11:44 EST