Re: [R] what does cut(data, breaks=n) actually do?

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Thu, 13 Dec 2007 09:32:37 +0100

melissa cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information. One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in binDist uses. When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins. Can anyone tell me how cut() decides where to cut?
>
>
This is one case where reading the actual R code is easier than explaining what it does. From cut.default:

    if (length(breaks) == 1) {

        if (is.na(breaks) | breaks < 2)
            stop("invalid number of intervals")
        nb <- as.integer(breaks + 1)
        dx <- diff(rx <- range(x, na.rm = TRUE))
        if (dx == 0)
            dx <- rx[1]
        breaks <- seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out = nb)
    }

So basically it takes the range of the data, extends it by 0.1% of its width at each end, and splits it into <breaks> equally long segments.
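
To make that concrete, here is a minimal sketch that rebuilds the break points by hand, following the snippet quoted above, and then passes them to cut() explicitly. The data vector x and the name manual_breaks are made up for illustration, and the cut.default in your installed version of R may differ in detail from the quoted code.

```r
# Made-up data and bin count, for illustration only
x <- c(1, 2, 3, 5, 10)
n <- 3

# Same steps as the quoted snippet: take the range, pad each end by dx/1000,
# then lay down n + 1 equally spaced break points
rx <- range(x, na.rm = TRUE)
dx <- diff(rx)
if (dx == 0) dx <- rx[1]   # constant data: fall back to the value itself
manual_breaks <- seq(rx[1] - dx/1000, rx[2] + dx/1000, length.out = n + 1)

# Bin the data with the explicit break points
bins <- cut(x, breaks = manual_breaks)

print(manual_breaks)
print(table(bins))
```

The padding is what keeps the smallest value inside the first interval, since the intervals are open on the left by default.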

(For the sometimes more attractive option of splitting into groups of roughly equal size, there is cut2 in the Hmisc package, or use quantile())
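
A rough sketch of the quantile() route, with made-up data: the break points are the empirical quantiles, so each bin gets about the same number of observations even when the values are very unevenly spread. This is only one way to do it and is not what cut2 does internally.

```r
# Made-up, skewed data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
n <- 3

# Break points at the 0, 1/3, 2/3 and 1 quantiles
q <- quantile(x, probs = seq(0, 1, length.out = n + 1))

# include.lowest = TRUE so the minimum falls in the first bin
groups <- cut(x, breaks = q, include.lowest = TRUE)

print(table(groups))
```

Note that with heavily tied data some quantiles can coincide, in which case cut() will complain that the breaks are not unique.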

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 13 Dec 2007 - 08:37:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 13 Dec 2007 - 09:30:19 GMT.
