Re: [Rd] serialize/unserialize vector improvement

From: Prof Brian Ripley <>
Date: Fri, 27 Jan 2012 09:55:53 +0000

On 22/01/2012 13:56, Prof Brian Ripley wrote:
> This has languished for a long time, and we should make a decision
> before FF for 2.15.0.
> It seems to me that in so far as there is a problem, it is that we
> serialize via XDR, and that since that was invented little-endian CPUs
> have taken over the world. So for the only cases I can imagine this is
> really a problem (passing objects in 'parallel'/snow ... contexts) a
> better answer might be to pass without byte-reordering: go back to the
> RDB format which was exposed for save() but AFAIK never for serialize.
> I would say SPARC is the only big-endian platform left (some PPC Mac
> users may disagree), so little-endian really does rule.

This does all seem to depend on the quality of the platform's XDR implementation: for example, an example similar to Spiegel's microbenchmark runs twice as fast under x86_64 R on Mac OS X as under i386 R on the same machine.

On all the (little-endian) platforms I tried, not using XDR (serialize(xdr = FALSE)) gave a speed-up of around 3x. On some, a version of Spiegel's patch helped about as much; on others the improvement was much smaller. In the best case (i386 OS X) there was a 10x improvement, but that is only going to be noticeable in rare applications.
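
To make the byte-order point concrete, here is a minimal C sketch (not R's actual serialization code) of why skipping XDR helps on little-endian hardware. XDR stores each 32-bit integer big-endian, so every element must be reordered byte by byte; a native stream, in the spirit of xdr = FALSE, can copy the in-memory bytes unchanged. The function names xdr_put_int, xdr_put_ints and native_put_ints are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write one 32-bit integer in XDR (big-endian) order,
   whatever the host byte order. On a little-endian CPU this is a byte swap. */
static void xdr_put_int(int32_t v, unsigned char out[4])
{
    uint32_t u = (uint32_t) v;
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* XDR-style encoding of a vector: every element is reordered individually. */
static void xdr_put_ints(const int32_t *x, size_t n, unsigned char *buf)
{
    for (size_t i = 0; i < n; i++)
        xdr_put_int(x[i], buf + 4 * i);
}

/* Native encoding: no reordering, so the whole vector is a single memcpy.
   The reader must then share the writer's byte order. */
static void native_put_ints(const int32_t *x, size_t n, unsigned char *buf)
{
    memcpy(buf, x, n * sizeof *x);
}
```

For example, encoding {1, 258} with xdr_put_ints yields the bytes 00 00 00 01 00 00 01 02 on any host, while native_put_ints on an x86 machine yields 01 00 00 00 02 01 00 00. The native path does less work, but the stream is only portable between machines of the same endianness.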

A version of Spiegel's idea (with changes confined to just one file) will appear in R-devel shortly.
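
As Spiegel describes it, the patch avoids one encoder call and one stream-type switch per vector element. The following is a minimal C sketch of that difference under stated assumptions: the stream_type enum, the out_stream struct and all function names are hypothetical and are not R's internal API. The real patch adds R_XDREncodeVector/R_XDRDecodeVector, whose signatures are not reproduced here.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream formats: XDR (big-endian) or native binary. */
typedef enum { STREAM_XDR, STREAM_NATIVE } stream_type;

typedef struct {
    stream_type type;
    unsigned char *buf;   /* caller-allocated output buffer */
    size_t pos;           /* bytes written so far */
} out_stream;

/* Big-endian (XDR) encoding of one 32-bit integer. */
static void put_be32(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t) v;
    p[0] = (unsigned char) (u >> 24);
    p[1] = (unsigned char) (u >> 16);
    p[2] = (unsigned char) (u >> 8);
    p[3] = (unsigned char) u;
}

/* Per-element style: one call and one switch for every element. */
static void out_int(out_stream *s, int32_t v)
{
    switch (s->type) {
    case STREAM_XDR:
        put_be32(s->buf + s->pos, v);
        break;
    case STREAM_NATIVE:
        memcpy(s->buf + s->pos, &v, sizeof v);
        break;
    }
    s->pos += 4;
}

static void out_int_vec_per_element(out_stream *s, const int32_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out_int(s, x[i]);
}

/* Batched style: the switch runs once, then a tight loop (XDR) or a
   single memcpy (native) handles all n elements. */
static void out_int_vec_batched(out_stream *s, const int32_t *x, size_t n)
{
    switch (s->type) {
    case STREAM_XDR:
        for (size_t i = 0; i < n; i++)
            put_be32(s->buf + s->pos + 4 * i, x[i]);
        break;
    case STREAM_NATIVE:
        memcpy(s->buf + s->pos, x, n * sizeof *x);
        break;
    }
    s->pos += 4 * n;
}
```

Both functions produce identical bytes; the batched one simply hoists the dispatch out of the loop, which gives the compiler a simple loop to optimise. How much that gains in practice depends on the platform, as the timings above suggest.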

> Brian
> On 03/10/2011 14:28, wrote:
>> It's on my list to look at but I may not get to it for a couple of
>> weeks. Someone else may get there earlier.
>> Best,
>> luke
>> On Mon, 3 Oct 2011, Michael Spiegel wrote:
>>> Any thoughts? I haven't heard any feedback on this patch.
>>> Thanks!
>>> --Michael
>>> On Wed, Sep 28, 2011 at 3:10 PM, Michael Spiegel
>>> <> wrote:
>>>> Hi folks,
>>>> I've attached a patch to the svn trunk that improves the performance
>>>> of the serialize/unserialize interface for vector types. The current
>>>> implementation: a) invokes the R_XDREncode operation for each element
>>>> of the vector type, and b) uses a switch statement to determine the
>>>> stream type for each element of the vector type. I've added
>>>> R_XDREncodeVector/R_XDRDecodeVector functions that accept N elements
>>>> at a time, and I've reorganized the implementation so that the stream
>>>> type is not queried once per element.
>>>> In the microbenchmark below, I've observed performance
>>>> improvements of about 2.4x. In a real benchmark that uses the
>>>> serialization interface to make MPI calls, I see about a 10%
>>>> improvement in performance.
>>>> Cheers,
>>>> --Michael
>>>> microbenchmark:
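>>>> # 10000 x 10000 integer matrix: 1e8 elements, about 400 MB
>>>> input <- matrix(1:100000000, 10000, 10000)
>>>> output <- serialize(input, NULL)  # serialize to a raw vector
>>>> # time 10 repetitions of each direction
>>>> for(i in 1:10) { print(system.time(serialize(input, NULL))) }
>>>> for(i in 1:10) { print(system.time(unserialize(output))) }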

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________ mailing list
Received on Fri 27 Jan 2012 - 10:01:08 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 27 Jan 2012 - 12:50:12 GMT.
