Re: [R] skewness and kurtosis in e1071 correct?

From: Campbell <p.campbell_at_econ.bbk.ac.uk>
Date: Mon 23 May 2005 - 19:06:31 EST


This is probably an issue over definitions rather than the correct answer. To me skewness and kurtosis are functions of the distribution rather than the population, they are equivalent to expectation rather than mean. For the normal distribution it makes no sense to estimate them as the distribution is uniquely defined by its first two moments.  However there are two defnitions of kurotsis as it is often standardized such that the expectation is 0.

HTH Phineas

www.pwp.phineas.blueyonder.co.uk

>>> Dirk Enzmann <dirk.enzmann@jura.uni-hamburg.de> 05/23/05 3:17 AM >>> I wonder whether the functions for skewness and kurtosis in the e1071 package are based on correct formulas.

The functions in the package e1071 are:

# --------------------------------------------
skewness <- function (x, na.rm = FALSE)
{

     if (na.rm)
         x <- x[!is.na(x)]
     sum((x - mean(x))^3)/(length(x) * sd(x)^3)
}
# --------------------------------------------

and

# --------------------------------------------
kurtosis <- function (x, na.rm = FALSE)
{

     if (na.rm)
         x <- x[!is.na(x)]
     sum((x - mean(x))^4)/(length(x) * var(x)^2) - 3
}
# --------------------------------------------

However, sd() and var() are the estimated population parameters. To calculate the sample statistics of skewness and kurtosis, shouldn't one use the sample statistics of the standard deviation (and variance), as well? For example:

# --------------------------------------------
# Function to calculate the sample statistic of skewness:
skew_s=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 3)
  {
    cat('valid cases = ',n,'\nskewness is not defined for less than 3 valid cases!\n')
  }
  else
  {
  z = sqrt(n/(n-1))*scale(x)
  mean(z^3)
  }
}
# --------------------------------------------

and

# --------------------------------------------
# Function to calculate the sample statistic of kurtosis:
kurt_s=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 4)
  {
    cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 valid cases!\n')
  }
  else
  {
  z = sqrt(n/(n-1))*scale(x)
  mean(z^4)-3
  }
}
# --------------------------------------------

Whereas, to calculate the (unbiased) estimated population parameter of skewness and kurtosis, the correction should also include the number of cases in the following way:

# --------------------------------------------
# Function to calculate the unbiased populataion estimate of skewness:
skew=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 3)
  {
    cat('valid cases = ',n,'\nskewness is not defined for less than 3 valid cases!\n')
  }
  else
  {
  z = scale(x)
  sum(z^3)*n/((n-1)*(n-2))
  }
}
# --------------------------------------------

and

# --------------------------------------------
# Function to calculate the unbiased population estimate of kurtosis:
kurt=function(x)
{
  x = x[!is.na(x)]
  n = length(x)
  if (n < 4)
  {
    cat('valid cases = ',n,'\nkurtosis is not defined for less than 4 valid cases!\n')
  }
  else
  {
  z = scale(x)
  sum(z^4)*n*(n+1)/((n-1)*(n-2)*(n-3))-3*(n-1)^2/((n-2)*(n-3))   }
}
# --------------------------------------------

Thus, it seems that the formulas used in the e1071 package are neither formulas for the sample statistics nor for the (unbiased) estimates of the population parameters. Is there another definition of kurtosis and skewness that I am not aware of?

Dirk

-- 
*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
        +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: dirk.enzmann@jura.uni-hamburg.de
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon May 23 19:12:05 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:00 EST