Re: [R] Measuring dispersion

From: Jan T. Kim <jtk_at_cmp.uea.ac.uk>
Date: Wed, 18 Jun 2008 11:08:20 +0100

On Wed, Jun 18, 2008 at 12:10:18AM +0100, S. Nunes wrote:
> Thanks for the suggestion, however I'm looking for a score since my
> goal is to rank thousands of distributions.
> For instance, given a large text, I would like to rank all terms
> according to their distribution (dispersion) within the text.
>
> Terms evenly distributed in the text should have a low score. Terms
> following an uneven distribution should rank higher.

As a perhaps rather rough-and-ready approach, you could look at the variance of the difference series, i.e. of the gaps between successive positions. Considering your examples:

> Thanks again,
> --
> Sérgio Nunes
>
> 2008/6/17 Moshe Olshansky <m_olshansky_at_yahoo.com>:
> > You could also look at the difference between your empirical distribution and the uniform distribution (something like Kolmogorov-Smirnov test).
> >
> >
> > --- On Tue, 17/6/08, S. Nunes <snunes_at_gmail.com> wrote:

[...]

> >> An example:
> >>
> >> [0; 0.2; 0.4; 0.6; 0.8; 1] - function should be ~ 0

    > var(diff(c(0, 0.2, 0.4, 0.6, 0.8, 1)))
    [1] 2.311116e-33

(That is effectively 0; the tiny nonzero value is floating-point rounding error.)

> >> [0; 0.1; 0.1; 0.15; 1] - function should be > 1

    > var(diff(c(0, 0.1, 0.1, 0.15, 1)))
    [1] 0.1616667
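
To rank many terms this way, the same calculation could be wrapped in a small function and applied to each term's positions. The following is only a minimal sketch: the name dispersion_score and the term_positions list are made up for illustration, and it assumes positions have already been scaled to the range 0..1 as in your examples.

```r
# Hypothetical helper: score one term by the variance of the gaps
# between its sorted positions (the same var(diff(...)) idea).
dispersion_score <- function(positions) {
  gaps <- diff(sort(positions))
  if (length(gaps) < 2) return(NA_real_)  # var() needs at least two gaps
  var(gaps)
}

# Made-up example data: relative positions (0..1) of each term in a text.
term_positions <- list(
  even   = c(0, 0.2, 0.4, 0.6, 0.8, 1),
  uneven = c(0, 0.1, 0.1, 0.15, 1)
)

scores  <- sapply(term_positions, dispersion_score)
ranking <- sort(scores, decreasing = TRUE)  # most uneven terms first
print(ranking)
```

Note that the raw variance depends on how many occurrences a term has (more occurrences mean smaller gaps), so comparing terms with very different frequencies may need some normalisation.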

Best regards, Jan

-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jtk_at_cmp.uea.ac.uk                               |
 |             WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 18 Jun 2008 - 11:21:32 GMT

