Re: [R] Dynamic Dictionary Data Type?

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 03 Jun 2005 - 04:40:35 EST

On Thu, 2 Jun 2005, Duncan Murdoch wrote:

> Gabor Grothendieck wrote:
>> On 6/2/05, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:
>>
>>> On Thu, 2 Jun 2005, hadley wickham wrote:
>>>
>>>
>>>>> An environment is a hash table, and copying occurs only on modification,
>>>>> when any language would have to copy in this context.
>>>
>>> Caveat: the default is new.env(hash=FALSE), so an environment is a hash
>>> table in one sense but not necessarily so in the strict sense.
>>
>>
>> Can you expand on this? When would one use hash = TRUE vs.
>> the default?
>
> It's an optimization question. hash = TRUE uses more memory and takes longer
> to set up, but will give faster searches in big environments. The current
> default assumes that environments are small so the effort of building the
> hash table is not worthwhile, and it does linear searches.

It's not really size: building small hash tables is quick. The issue is more to do with whether there are many lookups done compared to entries.

We met the same issues for a named vector a while back. The relevant NEWS item was

     o	Indexing a vector by a character vector was slow if both the
 	vector and index were long (say 10,000).  Now hashing is used
 	and the time should be linear in the longer of the lengths
 	(but more memory is used).


> I suspect that we might have the default wrong (or perhaps should make the
> transition from linear to hash search automatically at a certain threshold
> size), but I haven't done the testing necessary to show this.

Here's an example

tr <- as.character(trunc(runif(1e5, max=100))) system.time({

   env <- new.env(hash=F)
   for(i in 1:1e5) assign(tr[i], i, envir=env) }, gcFirst=TRUE)

which takes about 5% less with hashing. Now change to max=1e4: the hashed version takes about 50% longer, the unhashed one 120x.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Jun 03 04:44:40 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:22 EST