Re: [Rd] cut takes long time

From: Deepayan Sarkar <deepayan.sarkar_at_gmail.com>
Date: Wed, 16 Jun 2010 22:50:02 -0700

On Wed, Jun 16, 2010 at 3:56 PM, Gabor Grothendieck <ggrothendieck_at_gmail.com> wrote:
> The following cut command takes nearly 10 seconds on my machine even
> though the length of input vector is only 6.  I am running on Windows
> Vista with C2D BLAS using R 2.11.1.  Using the default BLAS and either
> R 2.10.1 or "R version 2.12.0 Under development (unstable) (2010-05-31
> r52164)" also gives me results in the 9-11 second range.
> I would have expected it to take much less time.
>
>
> tt <- structure(c(631206000, 631206060, 631206180, 631206240, 631206300,
> 978224400), class = c("POSIXt", "POSIXct"), tzone = "")
>
> system.time(cut(tt, "2 hours", include = TRUE)) # 9.45  0.01  9.58

The POSIXt aspect is not relevant here; what matters is the number of breakpoints.

> system.time(cut(tt, "2 hours", include = TRUE))
   user  system elapsed
  5.884   0.108   6.033
> system.time(cut(rnorm(6), breaks = 50000))
   user  system elapsed
  5.200   0.000   5.558

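The time seems to be linear in the number of breakpoints, which is not surprising. The "Note" section in ?cut does mention more efficient alternatives: hist(x, br, plot = FALSE) instead of table(cut(x, br)), and findInterval() instead of cut(*, labels = FALSE).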
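
As a rough illustration of the findInterval() alternative mentioned in ?cut, here is a minimal sketch. The data, the break vector `br`, and the variable names are made up for the example, and the `left.open` argument is only available in more recent versions of R than the ones discussed in this thread.

```r
# Sketch: integer bin codes for a short vector against many breakpoints,
# without building a factor with one level per interval.
set.seed(1)
x  <- rnorm(6)
br <- seq(-5, 5, length.out = 50001)   # 50000 intervals (hypothetical breaks)

# cut() with labels = FALSE returns integer codes; intervals are (a, b] by default.
codes_cut <- cut(x, breaks = br, labels = FALSE)

# findInterval() with left.open = TRUE uses the same (a, b] convention
# for values that fall strictly inside the range of 'br'.
codes_fi <- findInterval(x, br, left.open = TRUE)

all(codes_cut == codes_fi)
```

Values outside the range of the breaks are treated differently by the two functions (cut() gives NA, findInterval() gives 0 or length(br)), so the two are not interchangeable in general.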

Note that

> system.time(cut(tt, "2 hours", include = TRUE, labels = FALSE))
   user  system elapsed
   0.02    0.00    0.02

so it's the conversion to factors that seems to take most of the time.
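
If a factor is still wanted, one possible workaround is to compute the integer codes first and build labels only for the intervals that actually occur. The following is just a sketch with made-up data; the names `x`, `br`, `used` and `labs` are illustrative, and unlike the default cut() result the factor here has no levels for empty intervals.

```r
# Sketch: factor with labels only for the occupied intervals.
x  <- c(0.5, 1.5, 1.7, 99999.2)
br <- 0:100000                       # 100000 intervals, almost all of them empty

# Integer codes only; no labels are generated at this point.
codes <- cut(x, breaks = br, labels = FALSE)

# Build "(a,b]" labels just for the codes that are present.
used <- sort(unique(codes))
labs <- paste0("(", br[used], ",", br[used + 1], "]")

f <- factor(codes, levels = used, labels = labs)
table(f)
```

Whether dropping the empty levels is acceptable depends on what the factor is used for afterwards.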

-Deepayan



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 17 Jun 2010 - 05:52:08 GMT

