Re: [R] naive question

From: <james.holtman_at_convergys.com>
Date: Thu 01 Jul 2004 - 06:38:24 EST

It is amazing how much time has been spent on this issue. In most cases, if you do some timing studies using 'scan', you will find that you can read some quite large data structures in a reasonable time. If your initial concern was having to wait 10 minutes for your data to be read in, you could have read in quite a few data sets by now.
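
For example, a rough timing run with 'scan' might look like the following (the file name and column layout here are invented just for illustration):

    ## time reading a large whitespace-delimited file of three numeric columns
    system.time(
        x <- scan("bigdata.txt",
                  what = list(numeric(0), numeric(0), numeric(0)),
                  quiet = TRUE)
    )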

When comparing the speeds/feeds of processors, you also have to consider what is being done on them. Back in the "dark ages" we had a 1 MIP computer with 4MB of memory handling input from 200 users on a transaction system. Today I need a 1 GHz computer with 512MB just to handle me. Now, true, I am doing a lot of different processing on it.

With respect to I/O, you have to consider what is being read in and how it is converted. Each system/program has different requirements. I have some applications (running on a laptop) that can read in approximately 100K rows of data per second (of course they are already binary). On the other hand, I can easily slow that down to 1K rows per second if I do not specify the correct parameters to 'read.table'.
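
To illustrate the difference (the file name and column types are made up), telling 'read.table' what to expect can speed it up considerably:

    ## default call: read.table has to guess the type of every column
    system.time(d1 <- read.table("bigdata.txt"))

    ## specifying the column classes, an upper bound on the number of rows,
    ## and no comment character usually makes the read much faster
    system.time(d2 <- read.table("bigdata.txt",
                                 colClasses = c("numeric", "numeric", "factor"),
                                 nrows = 200000,
                                 comment.char = ""))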

So go back and take a look at what you are doing, and instrument your code to see where the time is being spent. The nice thing about R is that there are a number of ways of approaching a solution, and if you don't like the timing of one way, try another. That is half the fun of using R.
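
One simple way to do that (the file name here is again just a placeholder) is to wrap the suspect step in 'system.time' or run it under the profiler:

    Rprof("io-profile.out")              ## start the profiler
    d <- read.table("bigdata.txt")       ## the step you suspect is slow
    Rprof(NULL)                          ## stop profiling
    summaryRprof("io-profile.out")       ## time broken down by function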



James Holtman
"What is the problem you are trying to solve?"
Executive Technical Consultant -- Office of Technology, Convergys
james.holtman@convergys.com
+1 (513) 723-2929

On 06/30/2004 16:25, <rivin@euclid.math.temple.edu> wrote to <p.dalgaard@biostat.ku.dk> (cc: r-help@stat.math.ethz.ch, tplate@blackmesacapital.com, rivin@euclid.math.temple.edu; sent via r-help-bounces@stat.math.ethz.ch), Subject: Re: [R] naive question:

> <rivin@euclid.math.temple.edu> writes:
>
>> I did not use R ten years ago, but "reasonable" RAM amounts have
>> multiplied by roughly a factor of 10 (from 128Mb to 1Gb), CPU speeds
>> have gone up by a factor of 30 (from 90Mhz to 3Ghz), and disk space
>> availabilty has gone up probably by a factor of 10. So, unless the I/O
>> performance scales nonlinearly with size (a bit strange but not
>> inconsistent with my R experiments), I would think that things should
>> have gotten faster (by the wall clock, not slower). Of course, it is
>> possible that the other components of the R system have been worked on
>> more -- I am not equipped to comment...
>
> I think your RAM calculation is a bit off. In late 1993, 4MB systems
> were the standard PC, with 16 or 32 MB on high-end workstations.

I beg to differ. In 1989, the Mac II came standard with 8MB and the NeXT with 16MB. By 1994, 16MB was pretty much standard on good-quality (= Pentium, of which the 90MHz was the first example) PCs, with 32MB pretty common (though I suspect that most R/S-Plus users were on Suns, which were somewhat more plushly equipped).

> Comparable figures today are probably 256MB for the entry-level PC and
> a couple GB in the high end. So that's more like a factor of 64. On the
> other hand, CPUs have changed by more than the clock speed; in
> particular, the number of clock cycles per FP calculation has
> decreased considerably and is currently less than one in some
> circumstances.
>
I think that FP performance has increased more than integer performance, which has pretty much kept pace with the clock speed. The compilers have also improved a bit...

  Igor



R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Thu Jul 01 06:42:43 2004

This archive was generated by hypermail 2.1.8 : Wed 03 Nov 2004 - 22:54:38 EST