[R] How should I improve the following R code?

From: Seung Jun <seungwjun_at_gmail.com>
Date: Mon, 7 Jan 2008 18:49:33 -0500


I'm looking for a way to improve code that's proven to be inefficient.

Suppose that a data source generates the following table every minute:

  Index  Count
  0      234
  1      120
  7      11
  30     1

I save the tables in the following CSV format:

  time,index,count
  0,0:1:7:30,234:120:11:1
  1,0:2:3:19,199:110:87:9

That is, each line represents a table, and I have N lines for N minutes of data collection.
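To make the encoding concrete, here is a minimal sketch that decodes one CSV line back into the table it came from. The variable names (`line`, `fields`, `minute`, `tbl`) are only illustrative.

```r
# One line of the CSV, copied from the example above.
line <- "0,0:1:7:30,234:120:11:1"

# Split into the three comma-separated fields: time, index, count.
fields <- strsplit(line, ",", fixed = TRUE)[[1]]
minute <- as.numeric(fields[1])

# Rebuild the per-minute table from the colon-separated fields.
tbl <- data.frame(
  Index = as.numeric(strsplit(fields[2], ":", fixed = TRUE)[[1]]),
  Count = as.numeric(strsplit(fields[3], ":", fixed = TRUE)[[1]])
)
print(tbl)
```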

Now, I wrote the following code to get quantiles for each time period:

  library(Hmisc)

  # Keep index/count as character; strsplit() does not accept factors.
  stbl  <- read.csv("data.csv", stringsAsFactors = FALSE)
  index <- lapply(strsplit(stbl$index, ":", fixed = TRUE), as.numeric)
  count <- lapply(strsplit(stbl$count, ":", fixed = TRUE), as.numeric)
  len   <- length(index)

  for (i in 1:len) {
    v <- wtd.quantile(index[[i]], count[[i]], c(0, 0.2, 0.5, 0.8, 1))
    stbl$q0[i] <- v[1]
    stbl$q2[i] <- v[2]
    stbl$q5[i] <- v[3]
    stbl$q8[i] <- v[4]
    stbl$q10[i] <- v[5]
  }

It works fine for small N, but it quickly becomes inefficient as N grows: the for-loop takes too long. How could I improve the code or the data representation so that it runs fast?
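One direction that may be worth trying, given only as a sketch under assumptions: assignments such as `stbl$q0[i] <- v[1]` modify the data frame on every iteration, and that can be costly for large N. The sketch below collects all quantiles in a matrix first and attaches them to the data frame once. The helper `table_quantiles` is hypothetical; it expands each index by its count and calls base `quantile()` as a stand-in so that the example runs without extra packages, and its results are not guaranteed to match `wtd.quantile` exactly. The two sample rows are inlined in place of `data.csv`.

```r
# Hypothetical helper: quantiles for one table.
# Stand-in estimator: expand each index by its count and use base quantile().
# To keep the original estimator, call wtd.quantile(idx, cnt, probs) here.
table_quantiles <- function(idx, cnt, probs) {
  unname(quantile(rep(idx, cnt), probs))
}

probs <- c(0, 0.2, 0.5, 0.8, 1)

# Inline copy of the two sample rows so the sketch runs on its own.
stbl <- read.csv(text = "time,index,count
0,0:1:7:30,234:120:11:1
1,0:2:3:19,199:110:87:9", stringsAsFactors = FALSE)

index <- lapply(strsplit(stbl$index, ":", fixed = TRUE), as.numeric)
count <- lapply(strsplit(stbl$count, ":", fixed = TRUE), as.numeric)

# Compute everything into a matrix: one row per probability,
# one column per line of the CSV.
q <- vapply(
  seq_along(index),
  function(i) table_quantiles(index[[i]], count[[i]], probs),
  numeric(length(probs))
)

# Attach the results to the data frame in a single step.
qdf <- as.data.frame(t(q))
names(qdf) <- c("q0", "q2", "q5", "q8", "q10")
stbl <- cbind(stbl, qdf)
print(stbl)
```

Whether this removes the bottleneck depends on where the time actually goes; timing both versions with `system.time()` on real data would show it.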

Thanks,
Seung



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Mon 07 Jan 2008 - 23:52:54 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 08 Jan 2008 - 01:30:05 GMT.
