Re: [Rd] dict package: dictionary data structure for R

From: Seth Falcon
Date: Sat, 21 Jul 2007 19:40:44 -0700

"Gabor Grothendieck" writes:

> Although the proto package is not particularly aimed at hashing note
> that it covers some of the same ground and also is based on a well
> thought out object model (known as object-based programming
> or prototype programming).

Interesting. The dict package differs from proto in that it _is_ aimed at hashing and:

In Bioconductor, we have many hashtables where the key is an Affymetrix probeset ID. These look sort of like "1000_at". It turns out that the algorithm used by R's environments is not very good at hashing these values. The dict package lets you investigate this:

   keys2 = paste(seq(1000, length=13000), "at", sep="_")

   # here, hash.alg=0L corresponds to the hashing function used by R's
   # environments. I know, a name would be better.
   > summary(as.integer(table(hashCodes(keys=keys2, hash.alg=0L, size=2^14))))
      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       800    1100    1500    1625    2025    2700

   # hash.alg=1L is djb2 from here:
   > summary(as.integer(table(hashCodes(keys=keys2, hash.alg=1L, size=2^14))))
      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     1.000   1.000   2.000   1.648   2.000   4.000

  # and this is what we see with an environment:

    > e = new.env(hash=T, size=2^14)
    > for (k in keys2) e[[k]] = k
    > summary(env.profile(e)$counts)
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
       0.0000    0.0000    0.0000    0.7935    0.0000 2700.0000 
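
For anyone without the dict package at hand, the djb2 side of the comparison can be approximated in plain R. This is only a minimal sketch and not dict's implementation: the function name djb2 and the modulo-based mapping from hash code to bucket are assumptions made for illustration, so the distribution it reports need not match the hashCodes() output above exactly.

```r
# djb2 string hash: h = h * 33 + byte, starting from 5381.
# Doubles with an explicit modulo emulate unsigned 32-bit arithmetic,
# because R integers are signed and overflow to NA.
djb2 <- function(key) {
  h <- 5381
  for (b in as.integer(charToRaw(key))) {
    h <- (h * 33 + b) %% 2^32
  }
  h
}

# Same probeset-like keys as in the session above.
keys2 <- paste(seq(1000, length.out = 13000), "at", sep = "_")

# Assumed bucket mapping: hash code modulo the table size.
size <- 2^14
buckets <- vapply(keys2, djb2, numeric(1), USE.NAMES = FALSE) %% size

# Number of keys per occupied bucket; compare with the summaries above.
counts <- as.integer(table(buckets))
summary(counts)
```

Keeping the running hash in a double is a deliberate choice here: the largest intermediate value is below 2^38, so the arithmetic stays exact without needing a bit-manipulation package.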

Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center

Received on Sun 22 Jul 2007 - 02:42:21 GMT