Re: [Rd] Consistency of serialize(): please enlighten me

From: Hin-Tak Leung <hin-tak.leung_at_cimr.cam.ac.uk>
Date: Tue, 04 Sep 2007 00:18:04 +0100

I have a couple of ideas: serialize() can store references (some simple assignments are stored just as references until one tries to modify part of the copy, i.e. in a copy-on-write manner); occasionally, it will also store the name of the package in which a class was defined, as an attribute of the class name. Maybe neither of these is the case, but what does a hexdump tell you? (Just print the result of rawToChar() to the console.)
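As a rough sketch of the hexdump idea, one could compare two serialized raw vectors byte by byte and print only the positions where they differ. The helper name diff_bytes() and the test strings below are made up for illustration and are not part of the original example. Note that in current versions of R rawToChar() stops with an error on raw vectors containing nul bytes, which binary serializations normally do, so the sketch uses ascii=TRUE for the printable dump.

```r
# Hypothetical helper (not from the original post): report the positions
# at which two raw vectors differ, together with the differing bytes.
diff_bytes <- function(a, b) {
  stopifnot(is.raw(a), is.raw(b))
  n <- min(length(a), length(b))   # compare only the common prefix
  idx <- which(a[seq_len(n)] != b[seq_len(n)])
  list(
    same_length = length(a) == length(b),
    positions   = idx,
    a           = a[idx],
    b           = b[idx]
  )
}

# Two serializations that are known to differ, for demonstration only
x <- serialize("abc", connection = NULL, ascii = FALSE)
y <- serialize("abd", connection = NULL, ascii = FALSE)

res <- diff_bytes(x, y)
print(res)

# A printable dump: the ASCII serialization contains no nul bytes
cat(rawToChar(serialize("abc", connection = NULL, ascii = TRUE)))
```

Applied to the example quoted below, something like diff_bytes(d[[1]]$ser, d[[3]]$ser) should show which bytes of the serialization change; interpreting them still requires knowledge of the serialization format.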

Henrik Bengtsson wrote:

> Forgot...
> 
> On 8/31/07, Henrik Bengtsson <hb_at_stat.berkeley.edu> wrote:

>> Hi,
>>
>> I am puzzled by serialize(). It comes down to generating identical
>> hash codes for (apparently) identical objects using digest::digest(),
>> which in turn relies on serialize(). Here is an example illustrating
>> the issue:
>>
>> ser <- function(object, ...) {
>>   list(
>>     names = names(object),
>>     namesRaw = charToRaw(names(object)),
>>     ser = serialize(names(object), connection=NULL, ascii=FALSE)
>>   )
>> } # ser()
>>
>> # Object to be serialized
>> key <- key0 <- list(abc="Hello");
>>
>> # Store results
>> d <- list();
>>
>> # 1. As is
>> d[[1]] <- ser(key);
>>
>> # 2. Set names and redo (hardwired: identical to what's already there)
>> names(key) <- "abc";
>> d[[2]] <- ser(key);
>>
>> # 3. Set names and redo (generic: char->raw->char)
>> key <- key0;
>> names(key) <- sapply(names(key), FUN=function(name) rawToChar(charToRaw(name)));
>> d[[3]] <- ser(key);
>>
>> # All names are identical
>> for (kk in 2:length(d))
>>   stopifnot(identical(d[[1]]$names, d[[kk]]$names));
>>
>> # All raw names are identical
>> for (kk in 2:length(d))
>>   stopifnot(identical(d[[1]]$namesRaw, d[[kk]]$namesRaw));
>>
>> # But, the serialized names differ.
>> print(identical(d[[1]]$ser, d[[2]]$ser));
>> print(identical(d[[1]]$ser, d[[3]]$ser));
>> print(identical(d[[2]]$ser, d[[3]]$ser));
> 
> With R version 2.6.0 Under development (unstable) (2007-08-23 r42614) I get:
> [1] TRUE
> [1] FALSE
> [1] FALSE
> 
> and with R version 2.5.1 Patched (2007-07-19 r42284):
> [1] FALSE
> [1] FALSE
> [1] TRUE
> 

>> So, it seems like there is some extra information in the names
>> attribute that is part of the serialization. Is it possible to show
>> they differ at the R level? What is that extra information?
>> Promises...?
>>
>> Please enlighten me.
>>
>> Henrik
>>
> 
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Mon 03 Sep 2007 - 23:26:44 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 04 Sep 2007 - 12:40:10 GMT.
