Re: R-beta: directory of functions

Thomas Lumley (thomas@biostat.washington.edu)
Fri, 20 Jun 1997 09:31:01 -0700 (PDT)


To: r-help@stat.math.ethz.ch
Subject: Re: R-beta: directory of functions
In-Reply-To: <199706201526.IAA04798@aggie.pnl.gov>



> Richard Lammers <lammers@edac.sr.unh.edu> wrote:
> 
> > Is there any way to create a directory of functions in R?  

One other way is to make a library.  Currently I think libraries have to
be in $RHOME/library but this may well change soon. The command
library(foo) loads $RHOME/library/foo if it has not already been loaded. 

Ross and Robert are thinking about better ways to implement libraries to
reduce memory use.

On Fri, 20 Jun 1997, Z. Todd Taylor wrote:
> 
> I think you're touching on the reason I fear R may never be
> useful to me in many of my projects.
> 
> Accommodating the new scoping rules has required R to completely
> take over the administration of "databases."  It is no longer
> easy to maintain a "directory" of similar objects.  And if I do
> have such a collection, R must load *all* of them into memory in
> order to use just one of them.  Also, I can no longer have
> transparent access to foreign data via S's user-defined database
> mechanism.

This is not a necessary consequence of the scoping rules: they require
only that R have access to the objects, not that the objects reside in
memory. Similarly, under the S scoping rules everything in an attached
directory must be *available*, but it need not actually reside in memory.

As I think has been pointed out before when this issue came up, it is
theoretically possible to implement a very similar database mechanism in
which objects are made available for scoping by storing "promises to load"
them rather than by loading the data directly. This could be done in a
variety of ways, including the user-defined database methods that S-PLUS
provides.
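To make the "promises to load" idea concrete, here is a minimal sketch written in Python purely for illustration; it is not how R or S-PLUS actually implement databases. The names `Promise`, `LazyDirectory`, `save_object` and `lookup` are hypothetical, and storing one pickled file per object is an assumption standing in for an S-style directory of objects. Attaching a directory only records a promise for each object name; resolving a name along the search path forces (loads) just the object that is actually found.

```python
import os
import pickle
import tempfile

_UNSET = object()


class Promise:
    """A deferred load: the thunk runs at most once, on first use."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = _UNSET

    @property
    def forced(self):
        return self._value is not _UNSET

    def force(self):
        if not self.forced:
            self._value = self._thunk()
            self._thunk = None  # the loader is no longer needed
        return self._value


def save_object(directory, name, value):
    """Hypothetical storage scheme (an assumption): one pickled file per object."""
    with open(os.path.join(directory, name + ".pkl"), "wb") as f:
        pickle.dump(value, f)


def _load_file(filename):
    with open(filename, "rb") as f:
        return pickle.load(f)


class LazyDirectory:
    """An attached directory whose objects are available but not yet in memory."""

    def __init__(self, directory):
        self._entries = {}
        for filename in os.listdir(directory):
            name, ext = os.path.splitext(filename)
            if ext == ".pkl":
                full = os.path.join(directory, filename)
                # Attaching records a promise to load; nothing is read yet.
                self._entries[name] = Promise(lambda full=full: _load_file(full))

    def __contains__(self, name):
        # Availability is answered from the name list alone, without loading.
        return name in self._entries

    def get(self, name):
        return self._entries[name].force()

    def loaded_names(self):
        return sorted(n for n, p in self._entries.items() if p.forced)


def lookup(name, search_path):
    """Resolve a name along a search path of attached directories.

    Scoping only needs to know which directory holds the name; the value
    is read from disk only when it is actually found and used.
    """
    for db in search_path:
        if name in db:
            return db.get(name)
    raise NameError(name)


# Usage: attach a directory, then use one object from it.
with tempfile.TemporaryDirectory() as d:
    save_object(d, "x", [1, 2, 3])
    save_object(d, "big", list(range(100000)))
    db = LazyDirectory(d)
    print(db.loaded_names())  # []
    print(lookup("x", [db]))  # [1, 2, 3]
    print(db.loaded_names())  # ['x']  ("big" is still only a promise)
```

The design choice worth noting is that `name in db` never touches the stored data, so the scoping machinery can find which attached directory holds a name while everything else in that directory stays on disk.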

> The change was made in the interest of speed, but that will only
> happen for "small" datasets.  I'm not sure the benefits of cleaner
> random number generator functions is worth what we lost.

This is true only if "small" is interpreted fairly liberally. For data
sets I consider at least moderate in size (a few thousand records), R is
still substantially faster. In fact, the speedup is greatest when the data
set occupies a non-negligible fraction of available memory, which on
today's computers can easily be 16 megabytes or more, even on a multi-user
system. For really big data sets you may be right.

As I understand it, the scoping rules are not a deliberately designed
feature but a consequence of implementing R as a Scheme interpreter, an
implementation that is directly responsible for much of its speed. That
the rules also have programming advantages is a bonus: I love the fact
that the apply command does what I expect in R.


Thomas Lumley
-----------------------------------------------------+------
Biostatistics		: "Never attribute to malice what  :
Uni of Washington	:  can be adequately explained by  :
Box 357232		:  incompetence" - Hanlon's Razor  :
Seattle WA 98195-7232	:				   :
------------------------------------------------------------
