Re: [Rd] Binning of integers with hist() function odd results (PR#14046)

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Sat, 07 Nov 2009 17:27:35 +0100

gug_at_fnal.gov wrote:
> Full_Name: Gerald Guglielmo
> Version: 2.8.1 (2008-12-22)
> OS: OSX Leopard
> Submission from: (NULL) (131.225.103.35)
>
>
> When I attempt to use the hist() function to bin integers the behavior seems
> very odd as the bin boundary seems inconsistent across the various bins. For
> some bins the upper boundary includes the next integer value, while in others it
> does not. If I add 0.1 to every value, then the hist() binning behavior is what
> I would normally expect.
>

> > h1<-hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
> > h1$mids
> [1] 1.5 2.5 3.5 4.5
> > h1$counts
> [1] 3 3 4 5
> > h2<-hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1))
> > h2$mids
> [1] 1.5 2.5 3.5 4.5 5.5
> > h2$counts
> [1] 1 2 3 4 5
>
>
> Naively I would have expected the same distribution of counts in the two cases,
> but clearly that is not happening. This is a simple example to illustrate the
> behavior, originally I noticed this while binning a large data sample where I
> had set the breaks=c(0,24,1).

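This is as documented. By default the cells are right-closed, (a, b], and include.lowest = TRUE additionally puts the lowest break into the first cell. With breaks at 1:5 the first cell is therefore [1, 2], which collects the single 1 and both 2s, while every other cell takes only its upper integer. See the include.lowest and right arguments. Annoying, but not a bug.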
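
To see the cells explicitly, here is a small sketch using the data from the report. The variable names x, h1 and h1r are only for illustration, and plot = FALSE just suppresses the plot. cut() is called with the same breaks and the same right/include.lowest conventions, so its labels show which cell each value lands in.

```r
x <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)

# Default call: right = TRUE, include.lowest = TRUE
h1 <- hist(x, plot = FALSE)
h1$breaks    # 1 2 3 4 5
h1$counts    # 3 3 4 5

# Same breaks and conventions via cut(): cells are [1,2], (2,3], (3,4], (4,5]
table(cut(x, breaks = h1$breaks, right = TRUE, include.lowest = TRUE))

# Left-closed cells instead: [1,2), [2,3), [3,4), [4,5]
# Now the last cell collects both the 4s and the 5s.
h1r <- hist(x, right = FALSE, plot = FALSE)
h1r$counts   # 1 2 3 9
```

Either way one cell ends up holding two integers, because there are five distinct values but only four cells.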

(It is arguably a design error that hist() is looking for "pretty" breakpoints rather than pretty midpoints, or maybe something more advanced to handle cases where the data are effectively tied to a lattice. It's been around "forever", though.)
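
A possible workaround for integer-valued data, given only as a sketch: supply the breaks yourself at half-integers, so that every integer is a cell midpoint and no value sits on a boundary. The names x and h3 are illustrative, and the seq() call assumes the data are whole numbers with unit spacing.

```r
x <- c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)

# Breaks at 0.5, 1.5, ..., 5.5: each integer is the midpoint of its own cell
h3 <- hist(x, breaks = seq(min(x) - 0.5, max(x) + 0.5, by = 1), plot = FALSE)
h3$mids      # 1 2 3 4 5
h3$counts    # 1 2 3 4 5

# For purely discrete data, table() sidesteps binning altogether
table(x)
```

This reproduces the counts the report expected without shifting the data by 0.1.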

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 07 Nov 2009 - 16:30:37 GMT
