Re: [Rd] shash in unique.c

From: Seth Falcon <>
Date: Fri, 05 Mar 2010 10:19:36 -0800

On 3/5/10 4:40 AM, Matthew Dowle wrote:
> Thanks a lot. Quick and brief responses below...
> "Duncan Murdoch"<> wrote in message
>> Matthew Dowle wrote:
>>> I was hoping for a 'yes', 'no', 'maybe' or 'bad idea because ...'. No
>>> response resulted in a retry() after a Sys.sleep(10 days).
>>> If its a "yes" or "maybe" then I could proceed to try it, test it, and
>>> present the test results and timings to you along with the patch. It
>>> would be on 32bit Ubuntu first, and I would need to either buy, rent time
>>> on, or borrow a 64bit machine to be able to then test there, owing to the
>>> nature of the suggestion.
>>> If its "no", "bad idea because..." or "we were already working on it, or
>>> better", then I won't spend any more time on it.
>>> Matthew
>>> "Matthew Dowle"<> wrote in message
>>> news:hlu4qh$l7s$
>>>> Looking at shash in unique.c, from R-2.10.1 I'm wondering if it makes
>>>> sense to hash the pointer itself rather than the string it points to?
>>>> In other words could the SEXP pointer be cast to unsigned int and the
>>>> usual scatter be called on that as if it were integer?
>> Two negative but probably not fatal issues:
>> Pointers and ints are not always the same size. In Win64, ints are 32
>> bits, pointers are 64 bits. (Can we be sure there is some integer type
>> the same size as a pointer? I don't know, ask a C expert.)
> No we can't be sure. But we could test at runtime, and if the assumption
> wasn't true, then revert to the existing method.

I think the idea is, on the whole, a reasonable one and would be inclined to apply a patch if it demonstrated some measurable performance improvement.

For the 32bit v 64bit issue, I think we could detect and in the 64bit case take something like:

    ((int)p) ^ ((int)(p >> 32))

>> We might want to save the hash to disk. On restore, the pointer based
>> hash would be all wrong. (I don't know if we actually do ever save a hash
>> to disk. )
> The hash table in unique.c appears to be a temporary private hash, different
> to the global R_StringHash. Its private hash appears to be used only while
> the call to unique runs, then free'd. Thats my understanding anyway. The
> suggestion is not to alter the global R_StringHash in any way at all, which
> is the one that might be saved to disk now or in the future.

I agree with your reading: this is a temporary hash table and there would be little reason to want to save it (it is not saved now).

+ seth

Seth Falcon | @sfalcon |

______________________________________________ mailing list
Received on Fri 05 Mar 2010 - 18:22:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 05 Mar 2010 - 20:00:58 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive