[Rd] serialization regression in 2.15.0 beta

From: Ben Goodrich <bg2382_at_columbia.edu>
Date: Fri, 23 Mar 2012 19:13:46 -0400


Hi,

I am experiencing a problem related to serialization behavior in 2.15.0 beta (binary installed from Debian unstable) and 2.16.0 (from svn) that is not present in 2.14.2 (binary from Debian testing).

I don't fully understand the problem, and I have not yet been able to create a small, self-contained example that reproduces it. However, I do have a large, not self-contained example, which requires an alpha version (not yet on CRAN) of the mi package; the mi package currently on CRAN would not exhibit this issue. Anyone interested in reproducing the problem can follow the readme.txt file in this directory:

http://www.columbia.edu/~bg2382/mi/serialization/

I track r-devel with git-svn and was able to git bisect to svn commit r58219

commit 799102bd9d0266fe89c3120981decf0b1f17ef11
Author: ripley <ripley_at_00db46b3-68df-0310-9c12-caf00c1e9a41>
Date:   Sat Jan 28 15:02:34 2012 +0000

     make use of non-xdr serialization;.

although this commit could merely expose the problem rather than cause it.

The problem occurs when the FUN called by mclapply() in the parallel package returns an S4 object that contains a slot (called X) holding a large matrix, specifically a "model matrix" similar to that produced by glm(). Some columns of this matrix get corrupted with wrong values (usually zero, but sometimes NaN or 10^300ish). This can be seen by examining X right before FUN returns (to mclapply()'s environment) and comparing it to the "same" X after mclapply() returns to the calling environment.
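
A minimal sketch of the kind of before/after comparison described above. It is not a reproduction of the bug (no small reproducing example is known); the class name "Holder", the helper make_holder(), and the simulated data are hypothetical stand-ins for the mi objects. The sketch builds the same S4 object serially with lapply() and in forked children with mclapply(), then checks whether the X slots agree.

```r
library(parallel)
library(methods)

## Hypothetical S4 class with a matrix slot called X, standing in for
## the much larger objects returned by the alpha version of mi.
setClass("Holder", representation(X = "matrix"))

## Hypothetical FUN: builds a model matrix deterministically from a seed,
## so the serial and forked runs should produce the same values.
make_holder <- function(i) {
  set.seed(i)
  df <- data.frame(y = rnorm(1000), a = rnorm(1000), f = gl(4, 250))
  new("Holder", X = model.matrix(y ~ a + f, data = df))
}

## Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial <- lapply(1:4, make_holder)                       # no serialization
forked <- mclapply(1:4, make_holder, mc.cores = cores)   # goes through sendMaster()

## Compare each X slot after it has come back from the child process.
ok <- mapply(function(s, f) identical(s@X, f@X), serial, forked)
print(ok)
```

Any FALSE in ok would indicate that X changed on its way back from the child, which is the symptom reported here for the real objects.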

Part of svn commit r58219 is this hunk

diff --git a/src/library/parallel/R/unix/mcfork.R b/src/library/parallel/R/unix/mcfork.R
index 8e27534..4f92193 100644
--- a/src/library/parallel/R/unix/mcfork.R
+++ b/src/library/parallel/R/unix/mcfork.R
@@ -82,7 +82,8 @@ mckill <- function(process, signal = 2L)
 ## used by mcparallel, mclapply
 sendMaster <- function(what)
 {
-    if (!is.raw(what)) what <- serialize(what, NULL, FALSE)
+    # This is talking to the same machine, so no point in using xdr.
+    if (!is.raw(what)) what <- serialize(what, NULL, xdr = FALSE)
 
     .Call(C_mc_send_master, what, PACKAGE = "parallel")
 }

Contrary to the comment, I have found that if I specify xdr = TRUE, I get the expected behavior (a non-corrupted X slot) in 2.16.0, even though it is forking locally on my 64-bit Debian laptop with a little-endian i7 processor, whose specs are

goodrich_at_CYBERPOWERPC:/tmp/serialization$ cat /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping        : 7
microcode       : 0x17
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3990.83
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

...

processor : 7
[same as processor 0]

So, to summarize, I get the good behavior on R 2.14.2 when using mclapply(), on 2.15.0 beta when using lapply(), and on 2.16.0 when using mclapply() iff I patch in xdr = TRUE in sendMaster(). I get the bad behavior on 2.15.0 beta and on unpatched 2.16.0 when using mclapply().
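
As a sanity check on the premise that xdr = TRUE and xdr = FALSE should be interchangeable within one machine, the following sketch round-trips a matrix through serialize() and unserialize() in a single process under both settings. The matrix x is made-up test data, and this in-process check says nothing about the pipe between child and master that mclapply() uses, so passing it does not rule out the problem described above.

```r
## Byte order of this machine; "little" on x86_64.
endian <- .Platform$endian
print(endian)

## Made-up test data: a moderately large double matrix.
set.seed(1)
x <- matrix(rnorm(2e5), ncol = 20)

## Serialize to a raw vector in big-endian XDR format and in native format.
raw_xdr    <- serialize(x, NULL, xdr = TRUE)
raw_native <- serialize(x, NULL, xdr = FALSE)

## Both encodings should decode back to exactly the original matrix.
same_xdr    <- identical(unserialize(raw_xdr), x)
same_native <- identical(unserialize(raw_native), x)
print(c(xdr = same_xdr, native = same_native))
```

If both comparisons hold in-process while the mclapply() result is still corrupted, that would point toward the transfer of the raw vector rather than the encoding itself, though that is only a guess.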

My session info:

> sessionInfo()
R version 2.15.0 beta (2012-03-16 r58769)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] mi_0.9-83        bigmemory_4.2.11 arm_1.5-03       foreign_0.8-49
 [5] abind_1.4-0      R2WinBUGS_2.1-18 coda_0.14-5      lme4_0.999375-42
 [9] Matrix_1.0-4     lattice_0.20-0   MASS_7.3-17

loaded via a namespace (and not attached):
[1] grid_2.15.0  nlme_3.1-103

Thanks,
Ben



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Sun 25 Mar 2012 - 19:46:20 GMT
