Re: [R] Fast R implementation of Gini mean difference


From: Adelchi Azzalini (azzalini@stat.unipd.it)
Date: Mon 28 Apr 2003 - 18:55:08 EST


Message-id: <20030428085508.88B327CA824@tango.stat.unipd.it>


This is to complement my previous contribution on computation of Gini mean
difference - a discussion started by Andrew Ward. The index is "defined" as
    n <- length(x)        # number of entries in x (and in freq)
    gini <- 0
    for (i in 1:n)
       {
       for (j in 1:n) gini <- gini + freq[i]*freq[j]*abs(x[i]-x[j])
       }
    gini <- gini/((sum(freq)-1)*sum(freq))

This is the so-called form "without repetition"; the variant "with repetition"
does not have -1 in the final line.
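
For reference, the double loop can also be written in vectorised form. The following is only a sketch: the name gini.def and the argument repetition are made up here for illustration and do not come from any package. It still needs of the order of n^2 time and memory, so it is useful as a check on small data rather than as a fast method.

```r
# Hypothetical helper (not from the original post): vectorised form of the
# double loop, using outer() to build all pairwise absolute differences.
gini.def <- function(x, freq = rep(1, length(x)), repetition = FALSE)
{
  d <- abs(outer(x, x, "-"))     # |x[i] - x[j]| for all i, j
  w <- outer(freq, freq)         # freq[i] * freq[j]
  N <- sum(freq)
  # "with repetition" divides by N^2, "without repetition" by N*(N-1)
  denom <- if (repetition) N^2 else N * (N - 1)
  sum(w * d) / denom
}

gini.def(c(1, 2, 4))                      # without repetition
gini.def(c(1, 2, 4), repetition = TRUE)   # with repetition
```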

Since computation via the definition takes of the order of n^2 operations
and is therefore very inefficient for large n, alternative approaches have
been put forward, following Andrew's message.

My first version of a computationally convenient implementation was
essentially this:

gini.md0 <- function(x)
{ # x = data vector
  n <- length(x)
  # sum of (rank * sorted value) replaces the double sum over all pairs
  return(4*sum(seq_len(n)*sort(x))/(n*(n-1))
         - 2*mean(x)*(n+1)/(n-1))
}
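
The reasoning behind this form can be sketched as follows, in LaTeX notation, writing x_{(1)} <= ... <= x_{(n)} for the sorted data and G for the index without repetition (both symbols are introduced here, not taken from the original message). In the sum over pairs i < j, the value x_{(i)} enters with a plus sign i-1 times and with a minus sign n-i times.

```latex
\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|
  = 2\sum_{i<j} \bigl(x_{(j)} - x_{(i)}\bigr)
  = 2\sum_{i=1}^{n} (2i - n - 1)\, x_{(i)},
\qquad
G = \frac{4\sum_{i=1}^{n} i\, x_{(i)}}{n(n-1)}
    - \frac{2(n+1)}{n-1}\,\bar{x}.
```

The second expression follows by dividing the first by n(n-1) and using \sum_i x_{(i)} = n\bar{x}.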

Since Andrew (private message) has stressed the importance in his problem
of allowing for replicated data, here is a more general version, obtained by
elaborating on the previous one with a bit of algebra:

gini.md <- function(x, freq=rep(1,length(x)))
{ # x = data vector, freq = vector of frequencies (counts)
  if(!is.vector(x)) stop("x must be a vector")
  if(length(x) != length(freq))
    stop("x and freq must have same length")
  if(min(freq) < 0 || sum(freq) == 0 || any(freq != as.integer(freq)))
    stop("freq must be counts")
  x <- x[freq > 0]              # drop values with zero frequency
  freq <- freq[freq > 0]
  j <- order(x)                 # sort values, carrying the counts along
  x <- x[j]
  n <- as.integer(freq[j])
  n. <- sum(n)                  # total sample size
  # u[k] = sum of the ranks taken by x[k] in the expanded, sorted sample
  u <- (cumsum(n)-n)*n + n*(n+1)/2
  return(4*sum(u*x)/(n.*(n.-1))
         - 2*weighted.mean(x,n)*(n.+1)/(n.-1))
}
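
The "bit of algebra" amounts to replacing each rank by the sum of the ranks that a repeated value occupies in the expanded, sorted sample. A small check of this, under the assumption that x is already sorted and holds distinct values, might look like the sketch below; the names N, ranks, u.check, xx, g.fast and g.slow are only illustrative.

```r
x    <- c(1, 2, 4)          # distinct values, already sorted
freq <- c(2, 1, 3)          # their counts
N    <- sum(freq)

# rank sums per distinct value, computed as in gini.md
u <- (cumsum(freq) - freq) * freq + freq * (freq + 1) / 2

# same quantity obtained by expanding the data and summing the ranks
ranks   <- seq_len(N)
u.check <- as.vector(tapply(ranks, rep(seq_along(x), freq), sum))
stopifnot(all.equal(u, u.check))

# closed form versus the double sum on the expanded data
g.fast <- 4 * sum(u * x) / (N * (N - 1)) -
          2 * weighted.mean(x, freq) * (N + 1) / (N - 1)
xx     <- rep(x, freq)
g.slow <- sum(abs(outer(xx, xx, "-"))) / (N * (N - 1))
stopifnot(all.equal(g.fast, g.slow))
```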
  
Notice that gini.md(x,freq) gives the same result as gini.md0(rep(x,freq)),
but the latter is obviously less efficient. Both are, however, far more
efficient than a straight implementation of the "definition".

regards

Adelchi Azzalini

-- 
Adelchi Azzalini  <azzalini@stat.unipd.it>
Dipart. Scienze Statistiche, Università di Padova, Italia
http://azzalini.stat.unipd.it/

______________________________________________ R-help@stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help

