I'm looking for a way to speed up code that has turned out to be inefficient.

Suppose that a data source generates the following table every minute:

    index count
    0     234
    1     120
    7     11
    30    1

I save the tables in the following CSV format:

    time,index,count
    0,0:1:7:30,234:120:11:1
    1,0:2:3:19,199:110:87:9

That is, each line represents a table, and I have N lines for N minutes of data collection.
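A small sketch of how one minute's table maps to one CSV data line, using the minute-0 table above. `encode_row` is a hypothetical helper for illustration (not part of the original code), and it assumes `index` and `count` are plain numeric vectors of equal length.

```r
# Minute-0 table from the example above
tbl <- data.frame(index = c(0, 1, 7, 30), count = c(234, 120, 11, 1))

# Hypothetical helper: collapse one minute's table into one CSV data line,
# joining the index values and the count values with ":"
encode_row <- function(time, tbl) {
  paste(time,
        paste(tbl$index, collapse = ":"),
        paste(tbl$count, collapse = ":"),
        sep = ",")
}

line0 <- encode_row(0, tbl)
cat("time,index,count", line0, sep = "\n")
```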

I wrote the following code to compute, for each minute, the quantiles of index weighted by count:

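    library(Hmisc)

    # stringsAsFactors = FALSE keeps index/count as character for strsplit()
    # (the read.csv default only became FALSE in R 4.0.0)
    stbl <- read.csv("data.csv", stringsAsFactors = FALSE)
    # Split e.g. "0:1:7:30" into c(0, 1, 7, 30), one vector per minute
    index <- lapply(strsplit(stbl$index, ":", fixed = TRUE), as.numeric)
    count <- lapply(strsplit(stbl$count, ":", fixed = TRUE), as.numeric)
    len <- length(index)

    for (i in seq_len(len)) {
      # Quantiles of index, weighted by count, for minute i
      v <- wtd.quantile(index[[i]], count[[i]], c(0, 0.2, 0.5, 0.8, 1))
      stbl$q0[i]  <- v[1]
      stbl$q2[i]  <- v[2]
      stbl$q5[i]  <- v[3]
      stbl$q8[i]  <- v[4]
      stbl$q10[i] <- v[5]
    }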
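One likely cost in a loop like this is that every `stbl$q0[i] <- ...` goes through the data-frame assignment method, which typically copies at least the modified column, so the loop can slow down roughly quadratically as N grows; the per-call overhead of `wtd.quantile` may also add up. The sketch below is an assumption-laden alternative, not a verified drop-in replacement. It uses a hypothetical helper `wq` that computes type-7 quantiles of `index` weighted by integer counts directly from cumulative counts (it should agree with `quantile(rep(x, w), probs, type = 7)` without building `rep(x, w)`; it is intended to match `wtd.quantile`'s default for integer counts, which is worth checking with `all.equal()` on real data). All results are collected in a matrix and attached to `stbl` in one step.

```r
# Example data in the question's CSV format (two minutes)
csv_text <- "time,index,count
0,0:1:7:30,234:120:11:1
1,0:2:3:19,199:110:87:9"
stbl <- read.csv(text = csv_text, stringsAsFactors = FALSE)

index <- lapply(strsplit(stbl$index, ":", fixed = TRUE), as.numeric)
count <- lapply(strsplit(stbl$count, ":", fixed = TRUE), as.numeric)
probs <- c(0, 0.2, 0.5, 0.8, 1)

# Hypothetical helper: type-7 quantiles of x with integer frequency weights w,
# computed from cumulative weights instead of expanding rep(x, w)
wq <- function(x, w, probs) {
  o <- order(x)
  x <- x[o]
  cw <- cumsum(w[o])
  n <- cw[length(cw)]
  pos <- 1 + (n - 1) * probs   # 1-based positions in the expanded, sorted data
  lo <- floor(pos)
  hi <- pmin(lo + 1, n)
  frac <- pos - lo
  # k-th element of the expanded data: first j with cw[j] >= k
  at <- function(k) x[findInterval(k - 1, cw) + 1]
  (1 - frac) * at(lo) + frac * at(hi)
}

# One call per minute; vapply returns a 5 x N matrix, transposed to N x 5
qmat <- t(vapply(seq_along(index),
                 function(i) wq(index[[i]], count[[i]], probs),
                 numeric(length(probs))))
colnames(qmat) <- c("q0", "q2", "q5", "q8", "q10")

# Attach all five quantile columns once instead of 5 * N element assignments
stbl <- cbind(stbl, qmat)
print(stbl)
```

Swapping `wq` back for `wtd.quantile` inside the `vapply` call keeps the single-assignment structure if exact Hmisc behaviour is required.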

It works fine for small N, but it quickly becomes slow as N grows; the for loop takes too long. How could I improve the code or the data representation so that it runs fast?
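On the data-representation side, one option (a suggestion, not something the original setup uses) is a long layout with one row per (time, index, count) triple, which `read.csv` reads straight into numeric columns with no colon-separated strings to split. The sketch below converts the existing CSV into that layout; the names `long` and `by_time` are illustrative. Whether this alone speeds things up depends on the downstream processing.

```r
# Example data in the question's CSV format (two minutes)
csv_text <- "time,index,count
0,0:1:7:30,234:120:11:1
1,0:2:3:19,199:110:87:9"
stbl <- read.csv(text = csv_text, stringsAsFactors = FALSE)

idx <- strsplit(stbl$index, ":", fixed = TRUE)
cnt <- strsplit(stbl$count, ":", fixed = TRUE)

# Long layout: repeat each minute's time once per (index, count) pair
long <- data.frame(
  time  = rep(stbl$time, lengths(idx)),
  index = as.numeric(unlist(idx)),
  count = as.numeric(unlist(cnt))
)

# Recover one small table per minute when needed
by_time <- split(long[c("index", "count")], long$time)
print(long)
```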

