Re: [R] Incrementally building histograms

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Wed, 05 Nov 2008 21:16:23 -0500

You can eliminate the loop like this (untested):

cnt <- function(file) {
  data <- scan(file, quiet = TRUE)
  hist(data, plot = FALSE, breaks = breaks)$counts
}
# lapply, not sapply: Reduce needs a list with one count vector per file
counts <- Reduce("+", lapply(files, cnt))
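
For anyone who wants to try the idea without real data, here is a minimal self-contained sketch. The temporary files and the runif() numbers are made-up stand-ins for the real input, and passing `breaks` as an argument to cnt() is just a convenience, not something the approach requires. It also shows a rowSums() variant, since sapply() returns a bins-by-files matrix rather than a list.

```r
# Bin edges shared by every file, as in the original question.
breaks <- seq(0, 1, by = 0.01)

# Hypothetical stand-in data: three temporary files of uniform numbers.
set.seed(1)
files <- tempfile(paste0("part", 1:3, "_"))
for (f in files) write(runif(1000), f, ncolumns = 1)

# Counts for a single file; only one file's data is in memory at a time.
cnt <- function(file, breaks) {
  data <- scan(file, quiet = TRUE)
  hist(data, plot = FALSE, breaks = breaks)$counts
}

# lapply returns a list with one count vector per file,
# so Reduce adds the vectors together bin by bin.
counts <- Reduce(`+`, lapply(files, cnt, breaks = breaks))

# Alternative: sapply returns a bins-by-files matrix; sum across its rows.
counts2 <- rowSums(sapply(files, cnt, breaks = breaks))

# Remove the temporary files.
unlink(files)
```

Both forms give one count per bin. Note that Reduce() applied directly to the sapply() matrix would treat every cell as a separate element and collapse everything to a single number, so the list form or rowSums() is the safer choice.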

On Wed, Nov 5, 2008 at 8:01 PM, Andre Nathan <andre_at_digirati.com.br> wrote:
> Hello
>
> I need to build a histogram from data (numbers in the [0,1] interval)
> stored in a number of different files. The total amount of data is very
> large, so I can't load everything to memory and then simply call hist().
> Since what I actually need are the histogram counts, I'm currently doing
> it like this:
>
> breaks <- seq(0, 1, by = 0.01)
> files <- list.files(pattern = "some pattern")
> counts <- 0
> for (file in files) {
>   data <- scan(file, quiet = TRUE)
>   h <- hist(data, plot = FALSE, breaks = breaks)
>   counts <- counts + h$counts
> }
> # and then work with `counts' here
>
> Is there a more efficient and/or idiomatic way to do this?
>
> Thanks,
> Andre
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Thu 06 Nov 2008 - 02:19:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Nov 2008 - 04:30:22 GMT.
