From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>

Date: Mon, 19 May 2008 12:00:10 +0200

(Ted Harding) wrote:

> Hi Folks,
>
> I'd like to know how hist() decides how many cells to use
> when it ignores my "suggestion" to use say 'hist(...,breaks=50)'.
>
> More specifically, I have the results of 10000 simulations,
> each returning an 8-vector, therefore 8 variables each with
> 10000 values. Some of these 8 have somewhat skew distributions.
> Say one of these 8 variables is X.
>
> I ask for H <- hist(X,breaks=50), and get a histogram which
> usually has a different number of cells than what I intended.
>
> For instance, for one of these simulations, the 8 different
> values of length(H$breaks) are:
>
>   70, 44, 38, 68, 50, 40, 46, 45
>
> ?hist tells me
>
> A)
>   breaks: one of:
>     * a vector giving the breakpoints between histogram
>       cells,
>     * a single number giving the number of cells for the
>       histogram,
>     * a character string naming an algorithm to compute the
>       number of cells (see Details),
>     * a function to compute the number of cells.
>
>     In the last three cases the number is a suggestion only.
>
> B)
>   The default for 'breaks' is '"Sturges"': see 'nclass.Sturges'.
>
> If I look at the code for nclass.Sturges() I see
>
>   function (x) ceiling(log2(length(x)) + 1)
>
> and, for length(X) = 10000, this gives 15. This is not related
> to any of the numbers of breaks I actually got, in any way obvious
> to me.
>
> So:
> Question 1: hist() has apparently ignored my "suggestion" of
> "breaks=50". Why? What is the criterion for ignoring?
>
> Question 2: Presumably, if it ignores the "suggestion", it
> does something else, of its choice. I would then, perhaps,
> expect it to fall back to its default, which is (allegedly)
> Sturges. But the result from nclass.Sturges looks different
> from what it actually did. So what did it actually do, and
> how did it decide on this?

No, it is not ignoring you.

Try

hist(rnorm(10000))

length(hist(rnorm(10000),breaks=50)$breaks)

and repeat it a dozen times or so. Chances are that you'll mostly see lengths around 40, but definitely more than the 17 or so that you'll see without breaks=50. Next, try

diff(hist(rnorm(10000),breaks=50)$breaks)

and notice that this is usually 0.2, although if you repeat it enough times, you might get a couple of cases with 0.1 and a length of 75(-ish).
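The repetitions suggested above can be automated. This is a minimal R sketch, assuming an arbitrary seed and 200 repetitions; one.run() is just a hypothetical helper name. For each simulated sample it records the number of break points and the cell width with breaks = 50, and the number of break points with the default:

```r
## Automate the "repeat a dozen times" experiments described above.
## plot = FALSE makes hist() return its "histogram" object without drawing.
set.seed(1)   # arbitrary seed, only for reproducibility
n.rep <- 200  # arbitrary number of repetitions

one.run <- function() {
  br50  <- hist(rnorm(10000), breaks = 50, plot = FALSE)$breaks
  brdef <- hist(rnorm(10000), plot = FALSE)$breaks   # default: "Sturges"
  c(len.50 = length(br50), width.50 = diff(br50)[1], len.def = length(brdef))
}
runs <- t(replicate(n.rep, one.run()))   # one row per repetition

table(runs[, "len.50"])              # break-point counts with breaks = 50
table(round(runs[, "width.50"], 6))  # cell widths with breaks = 50
table(runs[, "len.def"])             # break-point counts with the default
```

If the explanation above is right, the breaks = 50 counts should cluster around 40 (width 0.2) with occasional runs near 70 (width 0.1), while the default counts stay near 17.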

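Get it? Otherwise, look at help(pretty), since pretty() is what does the work: hist() passes your 50 (or the Sturges count, 15 here) to pretty() as the desired number of intervals, and pretty() then picks a "nice" cell width, 1, 2 or 5 times a power of 10, so the number of cells you get is only approximately the number you asked for.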
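A small sketch of what pretty() does here, and of how hist() hands it the number. The two ranges and the rexp() sample are hypothetical stand-ins, not data from the question. The equivalence checked at the end reflects hist.default() in the R sources, which calls pretty(range(x), n = breaks, min.n = 1) when 'breaks' is a single number, so treat it as a check rather than a specification:

```r
## pretty() turns "about n intervals covering [lo, hi]" into equally spaced
## break points whose spacing is 1, 2 or 5 times a power of 10.
## The two ranges are hypothetical, but typical of range(rnorm(10000)).
br.wide   <- pretty(c(-3.8, 3.9), n = 50, min.n = 1)  # range 7.7
br.narrow <- pretty(c(-3.3, 3.4), n = 50, min.n = 1)  # range 6.7
c(width = diff(br.wide)[1],   n.breaks = length(br.wide))
c(width = diff(br.narrow)[1], n.breaks = length(br.narrow))

## hist() with a single number (or with the default "Sturges", once
## nclass.Sturges() has turned it into a number) passes that number to
## pretty() as the desired number of intervals.
set.seed(3)        # arbitrary seed
x <- rexp(10000)   # hypothetical skewed sample, standing in for one variable X
k.sturges <- nclass.Sturges(x)  # ceiling(log2(10000) + 1) = 15
all.equal(hist(x, breaks = 50, plot = FALSE)$breaks,
          pretty(range(x), n = 50, min.n = 1))          # should be TRUE
all.equal(hist(x, plot = FALSE)$breaks,
          pretty(range(x), n = k.sturges, min.n = 1))   # should be TRUE
```

For these two ranges the spacing should come out as 0.2 (40 break points) and 0.1 (68 break points): a one-unit change in the data range is enough to halve the cell width and nearly double the number of cells, much like the jumps between the 40s and roughly 70 in the question.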

-p

> With thanks,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 19-May-08                                       Time: 10:31:20
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

Received on Mon 19 May 2008 - 10:59:58 GMT

