**From:** Andrew C. Ward (*andreww@cheque.uq.edu.au*)

**Date:** Tue 29 Apr 2003 - 06:02:04 EST

**Next message:**vincent.stoliaroff@socgen.com: "[R] Writing a function"**Previous message:**Prof Brian Ripley: "Re: [R] sum(..., na.rm=TRUE) oddity"**Maybe in reply to:**Andrew C. Ward: "[R] Fast R implementation of Gini mean difference"

Message-id: <01C30DC1.0A2AC450.andreww@cheque.uq.edu.au>

Thank you again to Professor Azzalini for taking the time to

expound this issue for me. It's been very instructive to look

in detail at the definition of Gini mean difference and at the

incorporation of weights. Noting suggestions for improving the

speed of my function has also been very helpful.

With my application, the weights refer to the "reliability" of

a measurement, with a weight of 1 signifying high reliability

and a weight close to zero indicating low reliability. The

mean difference is used as a robust estimate of the standard

deviation of the vector of measurements (the 0.5*sqrt(pi)

multiplier is for this purpose).

It seems to me that Professor Azzalini's result may be used for

non-integer weights if the weights are scaled to sum to the

number of measurements (ie. w <- w * length(x)/sum(w)). In this

case, I think that gmd(x=c(1,2,4), w=c(1,2,1)) gives the same

result at gmd(x=c(1,2,2,4), w=c(1,1,1,1)).

Thank you again to all who have contributed to my understanding

and R implementation of the mean difference.

Regards,

Andrew C. Ward

CAPE Centre

Department of Chemical Engineering

The University of Queensland

Brisbane Qld 4072 Australia

andreww@cheque.uq.edu.au

On Monday, April 28, 2003 6:55 PM, Adelchi Azzalini

[SMTP:azzalini@stat.unipd.it] wrote:

*>
*

*> This is to complement my previous contribution on computation
*

of

*> Gini mean
*

*> difference - a discussion started by Andrew Ward. The index is
*

*> "defined" as
*

*> gini <- 0
*

*> for (i in 1:n)
*

*> {
*

*> for (j in 1:n) gini <- gini + freq[i]*freq[j]*abs(x
*

*> [i]-x[j])
*

*> }
*

*> gini<- gini/((sum(freq)-1)*sum(freq))
*

*>
*

*> This is the so-called form "without repetition"; the variant
*

*> "with repetition"
*

*> does not have -1 in the final line.
*

*>
*

*> Since computaation via the definition is totally inefficient,
*

*> alternative
*

*> approaches have been put forward, following Andrew's message.
*

*>
*

*> My first version of a computationally convenient implementation
*

*> was
*

*> essentially this:
*

*>
*

*> gini.md0<- function(x)
*

*> { # x=data vector
*

*> n <-length(x)
*

*> return(4*sum((1:length(x))*sort(x)/(n*(n-1)))
*

*> -2*mean(x)*(n+1)/(n-1))
*

*> }
*

*>
*

*>
*

*> Since Andrew (private message) has stressed the importance in
*

*> his problem
*

*> of allowing for replicated data, here is a more general
*

version,

*> obtained by
*

*> elaborating on the previous one with a bit of algebra:
*

*>
*

*> gini.md <- function(x, freq=rep(1,length(x)))
*

*> {# x=data vector, freq=vector of frequencies
*

*> if(!is.vector(x)) stop("x must be a vector")
*

*> if(length(x) != length(freq))
*

*> stop("x and freq must have same length")
*

*> if(min(freq)<0 | sum(freq)==0 | any(freq != as.integer(freq))
*

*> )
*

*> stop("freq must be counts")
*

*> x <- x[freq>0]
*

*> freq <- freq[freq>0]
*

*> j <- order(x)
*

*> x <- x[j]
*

*> n <- as.integer(freq[j])
*

*> n. <- sum(n)
*

*> u <- (cumsum(n)-n)*n+ n*(n+1)/2
*

*> return(4*sum(u*x)/(n.*(n.-1))
*

*> -2*weighted.mean(x,n)*(n.+1)/(n.-1))
*

*> }
*

*>
*

*> Notice that gini.md(x,freq) gives the same of mini.md0(rep
*

*> (x,freq)), but the latter
*

*> is obviously less efficient. Either are however far more
*

*> efficient that straight
*

*> implementation of the "definition".
*

*>
*

*> regards
*

*>
*

*> Adelchi Azzalini
*

*>
*

*> --
*

*> Adelchi Azzalini <azzalini@stat.unipd.it>
*

*> Dipart.Scienze Statistiche, Universita di Padova, Italia
*

*> http://azzalini.stat.unipd.it/
*

*>
*

*> ______________________________________________
*

*> R-help@stat.math.ethz.ch mailing list
*

*> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
*

______________________________________________

R-help@stat.math.ethz.ch mailing list

https://www.stat.math.ethz.ch/mailman/listinfo/r-help

**Next message:**vincent.stoliaroff@socgen.com: "[R] Writing a function"**Previous message:**Prof Brian Ripley: "Re: [R] sum(..., na.rm=TRUE) oddity"**Maybe in reply to:**Andrew C. Ward: "[R] Fast R implementation of Gini mean difference"

*
This archive was generated by hypermail 2.1.3
: Tue 01 Jul 2003 - 09:11:44 EST
*