From: Ted Harding <Ted.Harding_at_manchester.ac.uk>

Date: Mon, 19 May 2008 12:00:32 +0100 (BST)

>> Hi Folks,
>> I'd like to know how hist() decides how many cells to use
>> when it ignores my "suggestion" to use say 'hist(...,breaks=50)'.
>>
>> More specifically, I have the results of 10000 simulations,
>> each returning an 8-vector, therefore 8 variables each with
>> 10000 values. Some of these 8 have somewhat skew distributions.
>> Say one of these 8 variables is X.
>>
>> I ask for H <- hist(X,breaks=50), and get a histogram which
>> usually has a different number of cells than what I intended.
>>
>> For instance, for one of these simulations, the 8 different
>> values of length(H$breaks) are:
>>
>>   70, 44, 38, 68, 50, 40, 46, 45
>>
>> ?hist tells me
>>
>> A)
>>   breaks: one of:
>>      * a vector giving the breakpoints between histogram
>>        cells,
>>      * a single number giving the number of cells for the
>>        histogram,
>>      * a character string naming an algorithm to compute the
>>        number of cells (see Details),
>>      * a function to compute the number of cells.
>>
>>   In the last three cases the number is a suggestion only.
>>
>> B)
>>   The default for 'breaks' is '"Sturges"': see 'nclass.Sturges'.
>>
>> If I look at the code for nclass.Sturges() I see
>>
>>   function (x) ceiling(log2(length(x)) + 1)
>>
>> and, for length(X) = 10000, this gives 15. This is not related
>> to any of the numbers of breaks I actually got, in any way obvious
>> to me.
>>
>> So:
>> Question 1: hist() has apparently ignored my "suggestion" of
>> "break=50". Why? What is the criterion for ignoring?
>>
>> Question 2: Presumably, if it ignores the "suggestion", it
>> does something else, of its choice. I would then, perhaps,
>> expect it to fall back to its default, which is (allegedly)
>> Sturges. But the result from nclass.Sturges looks different
>> from what it actually did. So what did it actually do, and
>> how did it decide on this?


On 19-May-08 10:00:10, Peter Dalgaard wrote:
> (Ted Harding) wrote:
>> Hi Folks,
>
> No, it is not ignoring you.
>
> Try
>
> hist(rnorm(10000))
> length(hist(rnorm(10000),breaks=50)$breaks)
>
> and repeat a dozen of times or so. Chances are that you'll mostly see
> lengths around 40, but definitely more than the 17 or so that you'll
> see without the breaks=50. Next, try
>
> diff(hist(rnorm(10000),breaks=50)$breaks)
>
> and notice that this is usually 0.2, although if you repeat enough
> times, you might get a couple of cases with 0.1 and a length of
> 75(-ish).
>
> Get it? Otherwise look at help(pretty) since this is what is doing the
> work.
>
> -p

Thanks for the pointer to 'pretty', whose role is not mentioned in "?hist". I shall study this! (I still don't "get it"!)
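A minimal sketch of what help(pretty) suggests is going on. It assumes that, when 'breaks' is a single number, hist() hands it to pretty() as pretty(range(x), n = breaks, min.n = 1); that matches my reading of hist.default, but treat the exact call as an assumption rather than documented behaviour. The seed is arbitrary and only there for reproducibility.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(10000)

# Breakpoints hist() actually uses when given a single number
h <- hist(x, breaks = 50, plot = FALSE)

# Breakpoints pretty() proposes for the same range and the same "suggestion"
# (the min.n = 1 argument is the assumed detail mentioned above)
p <- pretty(range(x), n = 50, min.n = 1)

# If the assumption holds, the two sets of breakpoints coincide
same_breaks <- isTRUE(all.equal(h$breaks, p))
print(same_breaks)

# Number of breakpoints and the (constant) cell width that resulted
print(length(h$breaks))
print(unique(round(diff(h$breaks), 10)))
```

If same_breaks is TRUE, the number of cells is whatever pretty() considers a "nice" partition of range(x) close to the requested n, not n itself.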

In your example above I generally get 38-40 breaks (with 50 requested), but once (in about 30 repetitions) I got 72, as you point out.

I then tried it with 1.1*rnorm(10000), and got 42-51; then with 1.2*rnorm(10000), and got 46-51; then with 1.3*rnorm(10000), and got 47-61.
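The scaling experiment can be tabulated in one go. The sketch below is only illustrative: summarise_scale is a hypothetical helper name, the seed is arbitrary, and it calls pretty() directly on the assumption that this is what hist() does with a single numeric 'breaks'. For each scale factor it reports the width of the data range, the step pretty() chooses, and the resulting number of breakpoints.

```r
set.seed(2)                       # arbitrary seed, for reproducibility only
z <- rnorm(10000)

# Hypothetical helper: for scale factor s, summarise what pretty() does
# with the range of s * z when asked for about n intervals.
summarise_scale <- function(s, n = 50) {
  r <- range(s * z)
  b <- pretty(r, n = n, min.n = 1)
  c(scale    = s,
    width    = diff(r),           # width of the data range
    step     = b[2] - b[1],       # cell width chosen by pretty()
    n_breaks = length(b))         # number of breakpoints returned
}

res <- t(sapply(c(1, 1.1, 1.2, 1.3), summarise_scale))
print(res)
```

According to help(pretty), the step is always 1, 2 or 5 times a power of 10, so the number of breakpoints can only change in jumps as the range grows.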

It seems there is a slightly unstable relationship between the urge to honour the requested "n=50", and the desire to achieve "nice" numerical values (on the scale of 10) for the values of the breakpoints.
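If an exact number of cells matters more than "nice" breakpoints, the first option quoted from ?hist (a vector giving the breakpoints) is not described there as a suggestion. A minimal sketch, assuming equal-width cells spanning exactly the range of the data are acceptable; the variable names and the seed are mine:

```r
set.seed(3)                       # arbitrary seed, for reproducibility only
x <- rnorm(10000)

n_cells <- 50

# n_cells + 1 equally spaced breakpoints from min(x) to max(x)
brk <- seq(min(x), max(x), length.out = n_cells + 1)

# An explicit vector of breakpoints is used as given
h <- hist(x, breaks = brk, plot = FALSE)

print(length(h$breaks))           # number of breakpoints
print(length(h$counts))           # number of cells
```

The price is that the breakpoints fall at arbitrary-looking values rather than at round numbers.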

Thanks.

Ted.

E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-May-08 Time: 12:00:28
------------------------------ XFMail ------------------------------

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 19 May 2008 - 11:52:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 19 May 2008 - 12:30:37 GMT.
