Re: [Rd] Fwd: Digest package - make digest generic?

From: Dirk Eddelbuettel <edd_at_debian.org>
Date: Tue, 16 Oct 2007 13:29:33 -0500

Hi Roger,

On 16 October 2007 at 08:25, Roger Peng wrote:
| Sorry, I forgot the 'reply-all'.
|
| -roger
|
| ---------- Forwarded message ----------
| From: Roger Peng <rdpeng_at_gmail.com>
| Date: Oct 16, 2007 8:24 AM
| Subject: Re: [Rd] Digest package - make digest generic?
| To: Henrik Bengtsson <hb_at_stat.berkeley.edu>
|
|
| Would it be possible to instead create a function with a name like
| 'digest0' which is the current function, and then create a generic
| function with the name 'digest'? In this case 'digest0' always
| returns the digest of the "raw" object.
|
| My one concern is that my current expectation is that 'digest' takes
| an object and hashes the entire object, regardless of class. So if
| two objects are different (even in their internal representation),
| they should return different digests. I would be a little worried if
| 'digest' had a different (and perhaps unpredictable) behavior
| depending on the class of the object where two objects that were in
| fact different could lead to the same digest.

But haven't the cryptographers taken care of that argument?

To my layman's understanding, the consensus is that hash collisions are possible but very, very unlikely. And we already have that problem with digest as it stands: if collisions are possible, identical hashes could result from two different inputs whether digest is generic or not.

Or am I missing what you were trying to get at?  
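
A minimal sketch of the two cases being discussed, to separate an accidental hash collision from equality that a class-specific method produces on purpose. Everything here is an assumption for illustration: raw_digest() is a base-R stand-in for the current non-generic digest() (MD5 of the serialized object), and the 'plotlike' class and its fields are hypothetical.

```r
# Stand-in for the current digest(): MD5 of the serialized object (base R only).
raw_digest <- function(x) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(serialize(x, NULL), path)
  unname(tools::md5sum(path))
}

# The proposed S3 generic with the current behaviour as the default method.
digest <- function(x, ...) UseMethod("digest")
digest.default <- function(x, ...) raw_digest(x)

# Hypothetical class whose author chooses to hash only the 'spec' field.
digest.plotlike <- function(x, ...) raw_digest(x$spec)

a <- structure(list(spec = "scatter", cache = 1), class = "plotlike")
b <- structure(list(spec = "scatter", cache = 2), class = "plotlike")

identical(a, b)                  # FALSE: the objects differ in 'cache'
digest(a) == digest(b)           # TRUE: equal by design of the method
raw_digest(a) == raw_digest(b)   # FALSE: the full objects hash differently
```

In this sketch the equal digests for 'a' and 'b' have nothing to do with the hash function; they follow from the method ignoring a field, which is the class-dependent behaviour Roger describes.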

| I can see why one might want class-specific behavior, but what a class
| author wants from 'digest' may not be different from what other users
| of 'digest' on that object want.
|
| A simple approach might be
|
| digest0 <- function(x, ...) digest(unclass(x), ...)

Or, just for argument's sake, we go full circle: digest stays as it is and Hadley implements his own generic, say 'Digest()', around digest? Naa....

I think I like the idea of making it generic, but I really would like to know more about possible downsides.
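
For comparison, a rough sketch of the two 'digest0' variants mentioned in the quoted text, next to a generic 'digest'. The names raw_digest(), digest0_unclass() and the 'record' class are hypothetical; raw_digest() is a base-R stand-in for today's non-generic digest(), so nothing here should be read as the package's actual implementation.

```r
# Stand-in for the current digest(): MD5 of the serialized object (base R only).
raw_digest <- function(x) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(serialize(x, NULL), path)
  unname(tools::md5sum(path))
}

# Generic 'digest' plus a hypothetical method that hashes only the 'id' field.
digest <- function(x, ...) UseMethod("digest")
digest.default <- function(x, ...) raw_digest(x)
digest.record <- function(x, ...) raw_digest(x$id)

# Variant 1: 'digest0' is simply the old function and never dispatches.
digest0 <- function(x, ...) raw_digest(x)

# Variant 2: the one-liner from the quoted text; strip the class, then dispatch.
digest0_unclass <- function(x, ...) digest(unclass(x), ...)

r <- structure(list(id = 7, note = "draft"), class = "record")
plain <- list(id = 7, note = "draft")

digest(r) == raw_digest(7)             # TRUE: the method looks only at 'id'
digest0(r) == digest0(plain)           # FALSE: the class attribute is hashed too
digest0_unclass(r) == digest0(plain)   # TRUE: the class is dropped before hashing
```

The two variants are therefore not interchangeable in this sketch: the first hashes the object including its class attribute, while the unclass() version gives objects of different S3 classes the same digest whenever their underlying data match.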

Dirk  

| although this doesn't work for S4 objects I don't think.
|
| -roger
|
| On 10/15/07, Henrik Bengtsson <hb_at_stat.berkeley.edu> wrote:
| > On 10/15/07, hadley wickham <h.wickham_at_gmail.com> wrote:
| > > On 10/15/07, Henrik Bengtsson <hb_at_maths.lth.se> wrote:
| > > > [As agreed, CC:ing r-devel since others might be interested in this as well.]
| > > >
| > > > Hi.
| > > >
| > > > On 10/15/07, Dirk Eddelbuettel <edd_at_debian.org> wrote:
| > > > >
| > > > > Hi Hadley,
| > > > >
| > > > > On 15 October 2007 at 09:51, hadley wickham wrote:
| > > > > | Would you consider making digest a generic function? That way I could
| > > > > | (e.g.) make a generic method for ggplot objects which didn't depend
| > > > > | (so much) on their internal representation.
| > > > >
| > > > > Well, generally speaking, I always take patches :)
| > > >
| > > > I see no problems in doing this. The patch would be:
| > > >
| > > > digest <- function(...) UseMethod("digest");
| > > > digest.default <- <current digest function>.
| > > >
| > > > I think that should do, and I don't think it has any surprising side
| > > > effects so it could be added in the next release. Dirk, can you do
| > > > that?
| > > >
| > > > >
| > > > > I have to admit that I am fairly weak on these aspects of the S language.
| > > > > One question is: how do the current users of digest (i.e. Henrik's and
| > > > > Seth's caching mechanism, for example) use it on arbitrary objects _without_
| > > > > it being generic?
| > > >
| > > > I basically put everything I want into a list() and pass that to
| > > > digest::digest().
| > >
| > > Yes, that's what I'm doing too.
| > >
| > > > >
| > > > > | The reason I ask is that I'm using digest as a way of coming up with a
| > > > > | unique file name for each example graphic. I want to be able to
| > > > > | easily compare the appearance of examples between versions, but
| > > > > | currently the digest depends on internal details, so it's hard to
| > > > > | match up graphics between versions.
| > > >
| > > > See loadCache(key) and saveCache(object, key) in R.cache, which
| > > > basically loads and saves results from and to a file cache based on a
| > > > key object - no need to specify paths or filenames. You can specify
| > > > paths etc if you want to, but by default it is just transparent.
| > >
| > > The problem is I need to refer to the image from the documentation, so
| > > I do need to know its path. I also want to be able to look at the
| > > image, so if the digests are different I can see what the difference
| > > is (I'm planning to automate this with the imagemagick compare command
| > > line tool).
| >
| > See ?findCache. That will give you the pathname given a key. It is
| > on purpose that I do not list this function in the HTML help index - I
| > want to keep the "public" API to a minimum.
| >
| > /Henrik
| >
| > >
| > > > However, I think Hadley is referring to a different problem.
| > > > Basically, he got an object containing a lot of fields, but for his
| > > > purposes it is only a subset of the fields that he wants to use to
| > > > generate a consistent hashcode. If he passes any other field, that
| > >
| > > Yes, exactly.
| > >
| > > > will break the consistency. In that case, the designer of the class
| > > > has to identify the fields that uniquely identify the state of
| > > > the object. I do that for many of my objects and pass them down in a
| > > > list() structure to digest(). I agree, by making digest() generic,
| > > > one can make the code nicer. [If there is a need to dispatch on
| > > > multiple arguments, we have to go for S4, but otherwise S3 gives the
| > > > minimal modification].
| > > >
| > > > Side comment: This basically comes down to how for instance Java deals
| > > > with hashCode() and equals() etc. By default the object as-is is used to
| > > > generate the hashcode (and can be used by equals() to compare objects).
| > >
| > > Yes, that's the model I was thinking of too.
| > >
| > > Hadley
| > >
| > > --
| > > http://had.co.nz/
| > >
| > > ______________________________________________
| > > R-devel_at_r-project.org mailing list
| > > https://stat.ethz.ch/mailman/listinfo/r-devel
| > >
| >
| > ______________________________________________
| > R-devel_at_r-project.org mailing list
| > https://stat.ethz.ch/mailman/listinfo/r-devel
| >
|
|
| --
| Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
|
| ______________________________________________
| R-devel_at_r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Three out of two people have difficulties with fractions.

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 16 Oct 2007 - 18:46:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 25 Oct 2007 - 11:37:11 GMT.
