Re: [R] for loop performance

From: Philipp Pagel <p.pagel_at_wzw.tum.de>
Date: Thu, 14 Apr 2011 15:14:16 +0200

On Thu, Apr 14, 2011 at 06:50:56AM -0500, Barth B. Riley wrote:
>
> Thank you Phillip for your post. I am reading in:
>
> 1. a 3 x 100 item parameter file (floating point and integer data)
> 2. a 100 x 1000 item response file (integer data)
> 3. a 6 x 1000 person parameter file (contains simulation condition
> information, person measures)
>
> 4. I am then computing several statistics used in subsequent ROC
> analyses, the AUCs being stored in a 6000 x 15 matrix of floating
> point numbers
>
> I am using read.table for #1-#3 and write.table for #4. The process
> of reading files (#1-#3) and writing to file is done over 6,000
> iterations.

A few ideas:

  1. Try the colClasses argument to read.table. That way R does not have to guess the data type of each column.
  2. When you say 6000 iterations - do you mean you are reading/writing the SAME files over and over again? Or do you have 6000 sets of files? In the former case the obvious advice would be to only read them once.
  3. If the input files were generated in R, another option would be to save()/load() them rather than using write.table()/read.table().
  4. If they came from some other application, storing everything in a database may speed things up.
  5. Is your data on a file server? If yes: try moving it to the local disc temporarily to see if network I/O is limiting your speed.
  6. Whatever you try to improve performance - measure the effects rather than rely on your impression (system.time, Rprof, ...) in order to find out what part of the program is actually eating up the most time.
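
A rough sketch of points 1, 2, 3 and 6, in case it helps. The data, the file names, the helper read_resp() and the iteration count are all made up for illustration and are not taken from your setup; the timings will differ on your machine, so treat this as a pattern to adapt rather than a result.

```r
# Illustrative data only: a 100 x 1000 matrix of 0/1 responses.
set.seed(1)
resp <- matrix(sample(0:1, 100 * 1000, replace = TRUE), nrow = 100)

# Hypothetical file names (temporary files so the sketch cleans up after itself).
txt_file <- tempfile(fileext = ".txt")
rda_file <- tempfile(fileext = ".RData")
write.table(resp, txt_file, row.names = FALSE, col.names = FALSE)
save(resp, file = rda_file)

# Point 1: declare the column types instead of letting read.table guess.
# A single colClasses value is recycled over all columns.
read_resp <- function(path) {
  as.matrix(read.table(path, colClasses = "integer"))
}

n_iter <- 20  # stand-in for the 6000 iterations

# Point 2: re-reading the same file in every iteration ...
t_reread <- system.time(
  for (i in seq_len(n_iter)) {
    x <- read_resp(txt_file)
    total <- sum(x)
  }
)

# ... versus reading it once and reusing the object.
t_once <- system.time({
  x <- read_resp(txt_file)
  for (i in seq_len(n_iter)) {
    total <- sum(x)
  }
})

# Point 3: load() recreates the saved object under its original name
# and returns the names of the objects it restored.
t_load <- system.time(restored <- load(rda_file))

# Point 6: look at measured numbers rather than impressions.
cat("re-read each iteration:", t_reread[["elapsed"]], "s\n")
cat("read once, then loop:  ", t_once[["elapsed"]], "s\n")
cat("single load():         ", t_load[["elapsed"]], "s\n")
```

For a finer breakdown of where the time goes inside the loop body, Rprof() followed by summaryRprof() is the next step after system.time().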

cu

        Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 14 Apr 2011 - 13:17:26 GMT
