Re: R-beta: directory of functions

Z. Todd Taylor (zt_taylor@pnl.gov)
Fri, 20 Jun 1997 09:59:02 -0700


To: r-help@stat.math.ethz.ch

Thomas Lumley <thomas@biostat.washington.edu> wrote:

> > Accommodating the new scoping rules has required R to completely
> > take over the administration of "databases."  It is no longer
> > easy to maintain a "directory" of similar objects.  And if I do
> > have such a collection, R must load *all* of them into memory in
> > order to use just one of them.  Also, I can no longer have
> > transparent access to foreign data via S's user-defined database
> > mechanism.
> 
> This is not a necessary consequence of the scoping rules -- they require
> only that R has access to the objects, not that they reside in memory. The
> S scoping rules would similarly require that all of an attached directory
> is *available* but not that it actually resides in memory.  

That's good to know.

> As I think has been pointed out in the past when this issue was raised, it
> is theoretically possible to implement a very similar database mechanism
> where the scoping availability is handled by storing "promises to load"
> rather than by loading the data directly. This could be implemented in a
> variety of ways, including the user-defined methods that S-PLUS provides. 

I very much hope that happens (hence I bring this topic
up from time to time... :-)
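For what it's worth, the "promises to load" idea can be sketched in
R itself using delayedAssign() (a facility from later versions of R;
the attach_lazy() function and the one-object-per-.rda-file layout
below are purely hypothetical, not an existing mechanism):

```r
# Hypothetical sketch: register a promise for each saved object in a
# directory, so a file is read from disk only when its object is first
# used, rather than loading the whole directory into memory up front.
attach_lazy <- function(dir, envir = new.env()) {
  for (f in list.files(dir, pattern = "\\.rda$", full.names = TRUE)) {
    local({
      file <- f                           # freeze the loop variable per promise
      name <- sub("\\.rda$", "", basename(file))
      delayedAssign(name, {
        e <- new.env()
        load(file, envir = e)             # disk access deferred to first use
        get(name, envir = e)
      }, eval.env = environment(), assign.env = envir)
    })
  }
  envir
}

# Usage: save an object, detach it from the workspace, then fetch it lazily.
d <- tempfile(); dir.create(d)
x <- 1:10
save(x, file = file.path(d, "x.rda"))
rm(x)
db <- attach_lazy(d)
db$x   # the .rda file is read only at this point
```

The local() wrapper matters: each promise must capture its own copy of
the loop variable, otherwise every promise would see the last file.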

> > The change was made in the interest of speed, but that will only
> > happen for "small" datasets.  I'm not sure the benefits of cleaner
> > random number generator functions are worth what we lost.
> 
> This is only true when "small" is interpreted in a fairly liberal fashion.
> For data sets I consider at least moderate in size (a few thousands of
> records) R is still substantially faster.  In fact the speedup is greatest
> when the data set occupies a non-negligible fraction of available memory,
> which on today's computers can easily be 16 or more megabytes even in a
> multi-user system. For really big datasets you may be right.

Yes, my definition of small is liberal.  My "large" datasets are
sometimes measured in Gigabytes rather than Megabytes.

--Todd
-- 
Z. Todd Taylor
Pacific Northwest National Laboratory
zt_taylor@pnl.gov
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=