Re: [R] skewness and kurtosis in e1071 correct?

From: Dirk Enzmann <dirk.enzmann_at_jura.uni-hamburg.de>
Date: Tue 24 May 2005 - 22:27:57 EST

To me your answer is opaque (but that seems to be rather a problem of language ;-) ). Perhaps my question has not been expressed clearly enough. Let me state it differently:

In the R package e1071, the formulas (implicitly) used are (3) and (4) (see below); the standard deviation used in these formulas, however, is based on (2) (see below). This seems inconsistent, and my question is whether there is a commonly used third definition of skewness and kurtosis that employs the formulas for the "biased" skewness and kurtosis _but_ with the "unbiased" standard deviation.
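To make the combination in question concrete, here is a minimal base-R sketch of it. The function names (skew_hybrid, kurt_hybrid, sd_n) are made up for illustration, and the code mirrors only the description in this message, not the e1071 source. It also shows that, under these assumptions, the hybrid values differ from the pure divisor-n versions by a fixed factor in n.

```r
# Hybrid: mean(z^3) and mean(z^4) - 3, but with z built from sd(),
# which uses the divisor n - 1
skew_hybrid <- function(x) mean(((x - mean(x)) / sd(x))^3)
kurt_hybrid <- function(x) mean(((x - mean(x)) / sd(x))^4) - 3

# Hypothetical helper: standard deviation with divisor n
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# The same formulas with the divisor-n standard deviation
skew_n <- mean(((x - mean(x)) / sd_n(x))^3)
kurt_n <- mean(((x - mean(x)) / sd_n(x))^4) - 3

# The hybrid is a rescaled version of the divisor-n statistic
all.equal(skew_hybrid(x), skew_n * ((n - 1) / n)^(3 / 2))
all.equal(kurt_hybrid(x) + 3, (kurt_n + 3) * ((n - 1) / n)^2)
```

So the choice of standard deviation changes the result only through the factors ((n-1)/n)^(3/2) and ((n-1)/n)^2, which approach 1 as n grows.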

The standard deviation can be defined as the _sample_ statistic:

sd = sqrt( 1/n * sum( (x - mean(x))^2 ) ) # (1)

and as the estimated population parameter:

sd = sqrt( 1/(n-1) * sum( (x - mean(x))^2 ) ) # (2).

In R the function sd() calculates the latter.
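A quick way to check the two definitions side by side is the following base-R sketch. The helper sd_n is a made-up name, not part of R or e1071; note that both versions take the square root of the averaged squared deviations.

```r
# Hypothetical helper: standard deviation with divisor n (sample statistic)
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

sd_n(x)                      # divisor n
sd(x)                        # divisor n - 1, what R's sd() returns
sd(x) * sqrt((n - 1) / n)    # rescaling sd() recovers the divisor-n value
```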

In the same way, expressed via z-values, skewness and kurtosis can be defined as _sample_ statistics (also called "biased estimators", see: http://www.mathdaily.com/lessons/Skewness ):

skewness = mean(z^3) # (3)

kurtosis = mean(z^4)-3 # (4)

with z = (x - mean(x))/sd(x)

     with sd = sqrt( 1/n * sum( (x - mean(x))^2 ) )
     (thus: here sd is the _sample_ statistic, see (1) above!)
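As a sketch of (3) and (4) in base R, assuming z is standardized with the divisor-n standard deviation; the function names z_n, skew_sample and kurt_sample are hypothetical:

```r
# z-values based on the divisor-n standard deviation
z_n <- function(x) (x - mean(x)) / sqrt(mean((x - mean(x))^2))

# Sample ("biased") skewness and excess kurtosis, formulas (3) and (4)
skew_sample <- function(x) mean(z_n(x)^3)
kurt_sample <- function(x) mean(z_n(x)^4) - 3

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
skew_sample(x)   # 0.65625
kurt_sample(x)   # -0.21875
```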

but they can also be defined as the estimated population parameters (also called "unbiased", see:
http://www.mathdaily.com/lessons/Kurtosis#Sample_kurtosis ):

skewness = n/((n-1)*(n-2)) * sum(z^3) # (5)

kurtosis = n*(n+1)/((n-1)*(n-2)*(n-3)) * sum(z^4) - 3*(n-1)^2/((n-2)*(n-3)) # (6)

with z = (x - mean(x))/sd(x)

     with sd = sqrt( 1/(n-1) * sum( (x - mean(x))^2 ) )
     (thus: here sd is the estimated population parameter, see (2) above! BTW: The R function scale() calculates the z-values based on this definition as well.)
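Formulas (5) and (6) can be sketched in base R as follows. The names skew_pop and kurt_pop are hypothetical, and the z-values use sd(), i.e. the divisor n - 1, which is also what scale() produces by default.

```r
# Estimated population skewness, formula (5)
skew_pop <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)   # sd() uses the divisor n - 1
  n / ((n - 1) * (n - 2)) * sum(z^3)
}

# Estimated population excess kurtosis, formula (6)
kurt_pop <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
    3 * (n - 1)^2 / ((n - 2) * (n - 3))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
skew_pop(x)
kurt_pop(x)

# scale() gives the same z-values as (x - mean(x)) / sd(x)
all.equal(as.numeric(scale(x)), (x - mean(x)) / sd(x))
```

Both functions need n > 3 (n > 2 for the skewness alone), since the denominators vanish otherwise.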

Campbell wrote:
> This is probably an issue over definitions rather than the correct
> answer. To me skewness and kurtosis are functions of the distribution
> rather than the population, they are equivalent to expectation rather
> than mean. For the normal distribution it makes no sense to estimate
> them as the distribution is uniquely defined by its first two moments.
> However there are two definitions of kurtosis as it is often
> standardized such that the expectation is 0.



Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)

        +49-040-42838.4591 (Billon)
fax: +49-040-42838.2344
email: dirk.enzmann@jura.uni-hamburg.de
www:
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue May 24 22:33:27 2005
