From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Wed, 19 Nov 2008 11:10:04 -0500

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 19 Nov 2008 - 16:12:09 GMT


Try this:

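# Win probabilities for the 14 entrants, in entrant order.
x <- c(0.049, 0.129, 0.043, 0.013, 0.015, 0.040, 0.066,
       0.038, 0.2040, 0.0221, 0.234, 0.0443, 0.0684, 0.035)

set.seed(1)  # kmeans picks random starting centres
cl <- kmeans(x, 5)
cl

# Replace each probability by the centre (mean) of its cluster.
newold <- with(cl, data.frame(old = x, new = centers[cluster]))
newold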
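
As a possible follow-up (a sketch only, not part of the suggestion above), the k-means result can be collapsed into the "bucket" view used in the question: one row per bucket with its representative probability and its members. The names p, fit and buckets, the seed, and nstart = 25 are my own choices for illustration. Since each k-means centre is the mean of its cluster, bucket size times centre, summed over buckets, should reproduce sum(p), so the bucketed probabilities still total 1 up to rounding of the inputs.

```r
# Probabilities as listed in the quoted question (14 entrants).
p <- c(0.049, 0.129, 0.043, 0.013, 0.015, 0.040, 0.066,
       0.038, 0.204, 0.022, 0.234, 0.044, 0.068, 0.035)

set.seed(1)                                  # kmeans starts from random centres
fit <- kmeans(p, centers = 5, nstart = 25)   # nstart: keep the best of 25 starts

# One row per bucket: representative probability, size, and member entrants.
buckets <- data.frame(
  prob     = as.vector(fit$centers),
  size     = fit$size,
  entrants = sapply(split(seq_along(p), fit$cluster), paste, collapse = ", ")
)
buckets <- buckets[order(buckets$prob), ]
print(buckets, row.names = FALSE)

# Each centre is the mean of its bucket, so the total is preserved.
total <- sum(buckets$size * buckets$prob)
all.equal(total, sum(p))
```

Using several random starts is optional; it only makes it less likely that kmeans settles on a poor local optimum.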

On Wed, Nov 19, 2008 at 10:43 AM, Random Walker <kinch1967_at_gmail.com> wrote:

>
> I have a list of entrants (1-14 in this example) in a competitive event and
> corresponding win probabilities for each entrant.
>
> [(1, 0.049), (2, 0.129), (3, 0.043), (4, 0.013), (5, 0.015), (6,
> 0.040), (7, 0.066), (8, 0.038), (9, 0.204), (10, 0.022), (11, 0.234),
> (12, 0.044), (13, 0.068), (14, 0.035)]
>
> So, of course Sum(ps) = 1.
>
> In order to make some subsequent computations more tractable, I wish to
> cluster entrant win probabilities like so:
>
> [(1, 0.049), (2, 0.121), (3, 0.049), (4, 0.024), (5, 0.024), (6,
> 0.049), (7, 0.072), (8, 0.049), (9, 0.185), (10, 0.024), (11, 0.185),
> (12, 0.049), (13, 0.072), (14, 0.049)]
>
> viz. in this case I have 'bucketed' the entrant numbers against 5
> representative probabilities and in subsequent computations will deem (for
> example) the win probability of 3 to be 0.049, so another way of visualising
> the result is:
>
> [((4, 5, 10), 0.024),
> ((3, 6, 8, 12, 14), 0.049),
> ((7, 13), 0.072),
> ((2), 0.121),
> ((11), 0.185)]
>
> and (3 * 0.024) + (5 * 0.049) + (2 * 0.072) + (1 x 0.121) + (1 x 0.185) ~=
> 1.
>
> My question is: What is the most 'correct' way to cluster these
> probabilities? In my case the problem is not totally unconstrained. I would
> like to specify the number of buckets (probably will always wish to use
> either 5 or 6), so I do not need an algorithm which determines the most
> appropriate number of buckets given some cost function. I just need to know
> for a given number of buckets, which entrants go in which buckets and what
> is the representative probability for each bucket.
>
> The first thing which occurs to me is to sort probabilities into ascending
> order, generate all partitions of the list into (say) 5 buckets, and pick
> the partition which minimises the sum of squared differences from the mean
> of each bucket summed over all buckets. If buckets were not associated with
> probabilities I would do this without a second thought... but I wonder if
> this is the right thing to do here? I'm too statistically naive to know one
> way or the other.
>
> I would appreciate any suggestions re correct approach and also (obviously)
> any tips on how one might go about this in R using canned functions.
>
> Many thanks!
>
> --
> View this message in context: http://www.nabble.com/Bucketing-Grouping-Probabilities-tp20582544p20582544.html
> Sent from the R help mailing list archive at Nabble.com.
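
On the idea in the quoted message of sorting the probabilities and trying every partition: for one-dimensional data a partition that minimises the within-bucket sum of squares can always be taken as contiguous runs of the sorted values, so it is enough to enumerate the cut points. The following is only a sketch under that assumption; the names s, wss, all_cuts, best and result are mine, and for 14 values in 5 buckets it checks choose(13, 4) = 715 candidates.

```r
# Probabilities as listed in the quoted question (14 entrants).
p <- c(0.049, 0.129, 0.043, 0.013, 0.015, 0.040, 0.066,
       0.038, 0.204, 0.022, 0.234, 0.044, 0.068, 0.035)
k <- 5                      # number of buckets wanted

s <- sort(p)                # ascending probabilities
n <- length(s)

# Bucket labels for a set of cut points; a cut at i means a bucket
# ends after the i-th sorted value.
label <- function(cuts) findInterval(seq_len(n), cuts + 1) + 1

# Sum over buckets of squared differences from the bucket mean.
wss <- function(cuts) {
  sum(tapply(s, label(cuts), function(v) sum((v - mean(v))^2)))
}

# Every way of placing k - 1 cuts among the n - 1 gaps, one per column.
all_cuts <- combn(n - 1, k - 1)
best     <- all_cuts[, which.min(apply(all_cuts, 2, wss))]

bucket   <- label(best)
rep_prob <- tapply(s, bucket, mean)   # representative probability per bucket

# Map back to the original entrant numbering.
result <- data.frame(entrant = order(p),
                     old     = s,
                     new     = as.vector(rep_prob[bucket]))
result <- result[order(result$entrant), ]
result
```

The exhaustive search is only practical for small lists like this one, but it gives a reference answer against which a kmeans() result can be compared.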


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 19 Nov 2008 - 17:30:28 GMT.
