**From:** Adelchi Azzalini (*azzalini@stat.unipd.it*)

**Date:** Mon 28 Apr 2003 - 18:55:08 EST

**Next message:** Prof Brian Ripley: "Re: [R] sum(..., na.rm=TRUE) oddity"
**Previous message:** Khamenia, Valery: "AW: AW: [R] numericDeriv and ecdf"
**In reply to:** Andrew C. Ward: "[R] Fast R implementation of Gini mean difference"
**Next in thread:** Andrew C. Ward: "RE: [R] Fast R implementation of Gini mean difference"

Message-id: <20030428085508.88B327CA824@tango.stat.unipd.it>

This is to complement my previous contribution on the computation of the Gini mean
difference, a discussion started by Andrew Ward. The index is "defined" as

    n <- length(x)   # x = distinct values, freq = their frequencies
    gini <- 0
    for (i in 1:n)
    {
      for (j in 1:n) gini <- gini + freq[i]*freq[j]*abs(x[i]-x[j])
    }
    gini <- gini/((sum(freq)-1)*sum(freq))

This is the so-called form "without repetition"; the variant "with repetition"

does not have -1 in the final line.
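As a reference point, the definition and its two variants can be written as one small function. This is only a sketch: the name `gini_def` and the argument `repetition` are made up here for illustration, and `outer()` replaces the explicit double loop, so memory use grows with the square of `length(x)`.

```r
# Brute-force Gini mean difference, straight from the double-sum definition.
# `gini_def` and `repetition` are illustrative names, not part of the original post.
gini_def <- function(x, freq = rep(1, length(x)), repetition = FALSE) {
  stopifnot(length(x) == length(freq))
  N <- sum(freq)
  # outer() builds the full matrices freq[i]*freq[j] and |x[i] - x[j]|
  total <- sum(outer(freq, freq) * abs(outer(x, x, "-")))
  # "without repetition" divides by N*(N-1); "with repetition" by N*N
  denom <- if (repetition) N * N else N * (N - 1)
  total / denom
}

gini_def(c(1, 2, 3))                      # form without repetition
gini_def(c(1, 2, 3), repetition = TRUE)   # form with repetition
gini_def(c(1, 3), freq = c(2, 1))         # same data as c(1, 1, 3)
```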

Since computation via the definition is very inefficient (the double loop takes
n^2 steps), alternative approaches have been put forward, following Andrew's message.

My first version of a computationally convenient implementation was

essentially this:

    gini.md0 <- function(x)
    { # x = data vector
      n <- length(x)
      # ranks 1:n times the sorted data, then the correction term in mean(x)
      return(4*sum((1:n)*sort(x))/(n*(n-1))
             - 2*mean(x)*(n+1)/(n-1))
    }
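For readers who want the step behind this function, here is a sketch of the algebra for the form "without repetition". The notation is introduced here for illustration: $x_{(1)} \le \dots \le x_{(n)}$ are the sorted data and $\bar{x}$ is their mean. In the double sum, $x_{(i)}$ enters with a plus sign $i-1$ times and with a minus sign $n-i$ times per ordering of the pair, which gives the coefficient $2i-n-1$.

```latex
\begin{align*}
G &= \frac{1}{n(n-1)} \sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|
   = \frac{2}{n(n-1)} \sum_{i=1}^{n} (2i - n - 1)\, x_{(i)} \\
  &= \frac{4}{n(n-1)} \sum_{i=1}^{n} i\, x_{(i)} \;-\; \frac{2(n+1)}{n-1}\, \bar{x}
\end{align*}
```

The last line matches the two terms of the return value: ranks times sorted data, minus a multiple of the mean. Sorting dominates the cost, so the work is of order n log n rather than n^2.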

Since Andrew (private message) has stressed the importance in his problem

of allowing for replicated data, here is a more general version, obtained by

elaborating on the previous one with a bit of algebra:

    gini.md <- function(x, freq=rep(1,length(x)))
    { # x = data vector, freq = vector of frequencies
      if(!is.vector(x)) stop("x must be a vector")
      if(length(x) != length(freq))
        stop("x and freq must have same length")
      if(min(freq)<0 || sum(freq)==0 || any(freq != as.integer(freq)))
        stop("freq must be counts")
      # drop values with zero frequency, then sort by x
      x <- x[freq>0]
      freq <- freq[freq>0]
      j <- order(x)
      x <- x[j]
      n <- as.integer(freq[j])
      n. <- sum(n)
      # u[k] = sum of the ranks taken by the k-th value in the expanded sample
      u <- (cumsum(n)-n)*n + n*(n+1)/2
      return(4*sum(u*x)/(n.*(n.-1))
             - 2*weighted.mean(x,n)*(n.+1)/(n.-1))
    }
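The "bit of algebra" can be sketched as follows. The symbols are introduced here for illustration: $x_k$ are the sorted distinct values with counts $n_k$, $N=\sum_k n_k$, $C_{k-1}=\sum_{l<k} n_l$ is the number of observations before group $k$, and $z_{(1)} \le \dots \le z_{(N)}$ is the expanded sorted sample. Group $k$ occupies ranks $C_{k-1}+1,\dots,C_{k-1}+n_k$, so the sum of rank times value collapses group by group.

```latex
\begin{align*}
u_k &= \sum_{r=1}^{n_k} (C_{k-1} + r) = n_k\, C_{k-1} + \frac{n_k (n_k + 1)}{2}, \\
\sum_{i=1}^{N} i\, z_{(i)} &= \sum_{k} u_k\, x_k, \qquad
\bar{z} = \frac{1}{N} \sum_{k} n_k\, x_k, \\
G &= \frac{4}{N(N-1)} \sum_{k} u_k\, x_k \;-\; \frac{2(N+1)}{N-1}\, \bar{z}
\end{align*}
```

In the code, $C_{k-1}$ corresponds to `cumsum(n)-n`, $u_k$ to `u`, $N$ to `n.`, and $\bar{z}$ to `weighted.mean(x,n)`.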

Notice that gini.md(x,freq) gives the same result as gini.md0(rep(x,freq)), but the latter
is obviously less efficient. Both are, however, far more efficient than a straight
implementation of the "definition".
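The equivalence is easy to check numerically. The sketch below restates the two functions in compact form (input checks omitted, so it is not a drop-in replacement) and compares them with a brute-force evaluation of the definition. The data values are made up for illustration.

```r
# Compact restatements of the two functions (input checks omitted).
gini.md0 <- function(x) {
  n <- length(x)
  4 * sum(seq_len(n) * sort(x)) / (n * (n - 1)) - 2 * mean(x) * (n + 1) / (n - 1)
}
gini.md <- function(x, freq = rep(1, length(x))) {
  j <- order(x)
  x <- x[j]
  n <- freq[j]
  N <- sum(n)
  u <- (cumsum(n) - n) * n + n * (n + 1) / 2   # sum of the ranks in each group
  4 * sum(u * x) / (N * (N - 1)) - 2 * weighted.mean(x, n) * (N + 1) / (N - 1)
}

x    <- c(4.7, 0.5, 3.0, 9.1, 1.2)   # made-up distinct values, unsorted on purpose
freq <- c(2, 3, 4, 5, 1)             # made-up counts

a <- gini.md(x, freq)                # grouped version
b <- gini.md0(rep(x, freq))          # expanded sample, rank formula
z <- rep(x, freq)
d <- sum(abs(outer(z, z, "-"))) / (length(z) * (length(z) - 1))   # definition

all.equal(a, b)   # grouped vs expanded
all.equal(a, d)   # grouped vs definition
```

Wrapping the calls in `system.time()` on larger inputs is one way to see the difference in cost; no timings are quoted here because they depend on the machine and the data.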

regards

Adelchi Azzalini

--
Adelchi Azzalini <azzalini@stat.unipd.it>
Dipart.Scienze Statistiche, Università di Padova, Italia
http://azzalini.stat.unipd.it/

