[Rd] Fwd: Digest package - make digest generic?

From: Roger Peng <rdpeng_at_gmail.com>
Date: Tue, 16 Oct 2007 08:25:24 -0400

Sorry, I forgot the 'reply-all'.

-roger

Would it be possible to instead create a function with a name like 'digest0' which is the current function, and then create a generic function with the name 'digest'? In this case 'digest0' always returns the digest of the "raw" object.

My one concern is that my current expectation is that 'digest' takes an object and hashes the entire object, regardless of class. So if two objects are different (even in their internal representation), they should return different digests. I would be a little worried if 'digest' had a different (and perhaps unpredictable) behavior depending on the class of the object where two objects that were in fact different could lead to the same digest.

I can see why one might want class-specific behavior, but what a class author wants from 'digest' may not be the same as what other users of 'digest' on that object want.

A simple approach might be

digest0 <- function(x, ...) digest(unclass(x), ...)

although I don't think this works for S4 objects.
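
To make the concern concrete, here is a minimal sketch of how the generic and 'digest0' could sit side by side. It is only an illustration under stated assumptions: 'hash_raw' is a hypothetical stand-in for the current digest function (an MD5 of the serialized object using base R only, not the digest package's implementation), and the 'plot_spec' class with its 'spec' and 'cache' fields is made up for the example.

```r
# Hypothetical stand-in for the current digest(): MD5 of the serialized
# object, computed with base R only (not the digest package's code).
hash_raw <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(x, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Generic plus default method, following the patch quoted below.
digest <- function(object, ...) UseMethod("digest")
digest.default <- function(object, ...) hash_raw(object)

# Made-up class whose author chooses to hash only the 'spec' field.
digest.plot_spec <- function(object, ...) digest(object$spec, ...)

# The simple approach from above: strip the class, then dispatch again.
digest0 <- function(x, ...) digest(unclass(x), ...)

a <- structure(list(spec = "scatter", cache = 1), class = "plot_spec")
b <- structure(list(spec = "scatter", cache = 2), class = "plot_spec")

digest(a) == digest(b)    # TRUE: the class method ignores 'cache'
digest0(a) == digest0(b)  # FALSE: every field of the unclassed object is hashed
```

Note that 'digest0' as written hashes the object without its class attribute, so its result is not the same as calling the default method on the classed object directly.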

-roger

On 10/15/07, Henrik Bengtsson <hb_at_stat.berkeley.edu> wrote:
> On 10/15/07, hadley wickham <h.wickham@gmail.com> wrote:
> > On 10/15/07, Henrik Bengtsson <hb_at_maths.lth.se> wrote:
> > > [As agreed, CC:ing r-devel since others might be interested in this as well.]
> > >
> > > Hi.
> > >
> > > On 10/15/07, Dirk Eddelbuettel <edd_at_debian.org> wrote:
> > > >
> > > > Hi Hadley,
> > > >
> > > > On 15 October 2007 at 09:51, hadley wickham wrote:
> > > > | Would you consider making digest a generic function? That way I could
> > > > | (e.g.) make a generic method for ggplot objects which didn't depend
> > > > | (so much) on their internal representation.
> > > >
> > > > Well, generally speaking, I always take patches :)
> > >
> > > I see no problems in doing this. The patch would be:
> > >
> > > digest <- function(...) UseMethod("digest");
> > > digest.default <- <current digest function>.
> > >
> > > I think that should do, and I don't think it has any surprising side
> > > effects so it could be added in the next release. Dirk, can you do
> > > that?
> > >
> > > >
> > > > I have to admit that I am fairly weak on these aspects of the S language.
> > > > One question is: how do the current users of digest (i.e. Henrik's and
> > > > Seth's caching mechanism, for example) use it on arbitrary objects _without_
> > > > it being generic?
> > >
> > > I basically put everything I want into a list() and pass that to
> > > digest::digest().
> >
> > Yes, that's what I'm doing too.
> >
> > > >
> > > > | The reason I ask is that I'm using digest as a way of coming up with a
> > > > | unique file name for each example graphic. I want to be able to
> > > > | easily compare the appearance of examples between versions, but
> > > > | currently the digest depends on internal details, so it's hard to
> > > > | match up graphics between versions.
> > >
> > > See loadCache(key) and saveCache(object, key) in R.cache, which
> > > basically loads and saves results from and to a file cache based on a
> > > key object - no need to specify paths or filenames. You can specify
> > > paths etc if you want to, but by default it is just transparent.
> >
> > The problem is I need to refer to the image from the documentation, so
> > I do need to know its path. I also want to be able to look at the
> > image, so if the digests are different I can see what the difference
> > is (I'm planning to automate this with the imagemagick compare command
> > line tool).
>
> See ?findCache. That will give you the pathname given a key. It is
> on purpose that I do not list this function in the HTML help index - I
> want to keep the "public" API to a minimum.
>
> /Henrik
>
> >
> > > However, I think Hadley is referring to a different problem.
> > > Basically, he got an object containing a lot of fields, but for his
> > > purposes it is only a subset of the fields that he wants to use to
> > > generate a consistent hashcode. If he passes any other field, that
> >
> > Yes, exactly.
> >
> > > will break the consistency. In that case, the designer of the class
> > > has to identify the fields that uniquely identify the state of
> > > the object. I do that for many of my objects and pass them down in a
> > > list() structure to digest(). I agree, by making digest() generic,
> > > one can make the code nicer. [If there is a need to dispatch on
> > > multiple arguments, we have to go for S4, but otherwise S3 gives the
> > > minimal modification].
> > >
> > > Side comment: This basically comes down to how for instance Java deals
> > > with hashCode() and equals() etc. By default the object itself is used to
> > > generate the hashcode (and can be used by equals() to compare objects).
> >
> > Yes, that's the model I was thinking of too.
> >
> > Hadley
> >
> > --
> > http://had.co.nz/
> >
> > ______________________________________________
> > R-devel_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
--
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 16 Oct 2007 - 12:28:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 25 Oct 2007 - 11:37:11 GMT.
