[Rd] textConnection performance quadratic (PR#14053)

From: <bill.hopkins_at_level3.com>
Date: Tue, 10 Nov 2009 00:40:11 +0100 (CET)


Full_Name: William E. Hopkins
Version: 2.9.0
OS: Windows XP
Submission from: (NULL) (209.244.4.106)

textConnection() has quadratic performance.

A function I wrote was taking an outrageous amount of time to execute on a large character vector (a small test set had been used for functional development). I created a test harness to execute the function and gather timings (system.time()) for various dataset sizes (datasets generated by sample() from a very large set). When I used textConnection() to provide input to read.csv(), the run time grew quadratically with dataset size. However, when I had the function write the character vector to a temp file and then read the data back in via read.csv(), the run time grew linearly.
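A harness of the kind described above might look roughly like the following sketch. It is not the original test code: the helper names (make_lines, via_connection, via_tempfile), the synthetic data, and the chosen sizes are all assumptions for illustration, and the timings will depend on the R version and platform.

```r
# Hypothetical harness: time two ways of feeding a character vector to read.csv().

# Build n synthetic comma-separated lines (a stand-in for the real data).
make_lines <- function(n) {
  paste(sample(1e6, n, replace = TRUE),
        sample(letters, n, replace = TRUE),
        sep = ",")
}

# Route 1: read directly from the character vector through a text connection.
via_connection <- function(lines) {
  con <- textConnection(lines)
  on.exit(close(con))
  read.csv(con, header = FALSE, stringsAsFactors = FALSE)
}

# Route 2: write the vector to a temporary file, then read the file back.
via_tempfile <- function(lines) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  writeLines(lines, path)
  read.csv(path, header = FALSE, stringsAsFactors = FALSE)
}

# Double the input size each step: roughly 2x elapsed time per step suggests
# linear growth, roughly 4x per step suggests quadratic growth.
sizes <- c(1000, 2000, 4000, 8000)
timings <- t(sapply(sizes, function(n) {
  lines <- make_lines(n)
  c(n = n,
    connection = system.time(via_connection(lines))[["elapsed"]],
    tempfile   = system.time(via_tempfile(lines))[["elapsed"]])
}))
print(timings)
```

Larger sizes make the difference in growth rate easier to see, at the cost of a longer run.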

The reason for using textConnection() was that the character vector was within a data frame read in via read.csv(). The character vector (URLs) needed to be parsed into separate vectors, but no mechanism exists to do that directly (that I know of). So, I used sub() to extract the proper pieces and put commas in between, so that I could use read.csv() to read the comma-separated strings directly into vectors.
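The sub() plus read.csv() approach described above could be sketched as follows. The URLs, the regular expression, and the column names are assumptions chosen for illustration, not the ones from the original function, and the sketch assumes the URLs contain no commas or quote characters.

```r
# Hypothetical input: a character vector of URLs (for example, one column of a data frame).
urls <- c("http://www.example.com/path/a.html",
          "https://stat.ethz.ch/mailman/listinfo/r-devel")

# Capture scheme, host, and path, then rewrite each URL as one comma-separated line.
pattern <- "^([a-z]+)://([^/]+)(/.*)?$"
csv_lines <- sub(pattern, "\\1,\\2,\\3", urls)

# Parse the rewritten lines into separate columns through a text connection.
con <- textConnection(csv_lines)
parts <- read.csv(con, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("scheme", "host", "path"))
close(con)
print(parts)

# One possible alternative that needs no connection: one sub() call per piece.
host <- sub(pattern, "\\2", urls)
```

Calling sub() once per piece scans the vector several times, but it avoids both the text connection and the temporary file.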



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Mon 09 Nov 2009 - 23:56:33 GMT