Re: [Rd] serialize() takes too long when serializing to a raw vector

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Thu 25 Jan 2007 - 14:21:41 GMT

On 1/25/2007 6:32 AM, Ashish Kulkarni wrote:
> Hello,
>
> R version 2.4.1 (2006-12-18)
> i386-pc-mingw32
>
> Calling serialize() with a NULL connection serializes the object to a raw vector. However, when the object to be serialized is large, it takes a very long time:
>
> > system.time( serialize(matrix(0, 1000, 1000), NULL) )
> [1] 38.25 40.73 81.54 NA NA
>
> > system.time( serialize(matrix(0, 2000, 2000), NULL) )
> [1] 609.72 664.75 1318.57 NA NA
>
> I was using this in Rmpi, where a clustered call returned a large matrix. However, serializing the very same matrix to a file or a socket is very fast, so I wrote this function, which runs much faster:
>
> .mpi.quick.serialize <- function (object)
> {
>     fname <- tempfile("Rmpi")
>     stream <- file(fname, "wb")
>     on.exit({
>         close(stream)
>         file.remove(fname)
>     })
>     serialize(object, stream)
>     close(stream)
>     size <- file.info(fname)$size
>     stream <- file(fname, "rb")
>     return(readBin(stream, "raw", n = size))
> }
>
> > system.time( .mpi.quick.serialize(matrix(0, 1000, 1000) ) )
> [1] 0.2500000000000000 0.0499999999999545 0.3000000000001819
> [4] NA NA
>
> > system.time( .mpi.quick.serialize(matrix(0, 2000, 2000) ) )
> [1] 1.059999999999945 0.220000000000027 1.289999999999964
> [4] NA NA
>
> Does anyone have an idea why the performance difference is so
> large? Also, I was wondering if there is a better way -- the
> above solution feels like a quick fix rather than a correct
> approach.

It looks like a bug in the serialize code: it's reallocating the output buffer far too often, and that's slowing things down. I'll confirm that's what's going on and fix it.
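
To illustrate why reallocating too often could account for timings like these, here is a rough sketch. It is not the actual serialize code: the 8192-byte starting size and both growth policies are assumptions chosen only for illustration, and bytes_copied() is a made-up helper. It counts how many bytes would be copied while a buffer grows to hold about 8 MB (roughly a 1000 x 1000 double matrix), assuming each reallocation copies the existing contents.

```r
# Count the bytes copied while growing a buffer until it can hold `total`
# bytes. `grow` maps the current capacity to the next one. The starting
# capacity and the growth policies are illustrative assumptions only.
bytes_copied <- function(total, grow, start = 8192) {
  capacity <- start
  copied <- 0
  while (capacity < total) {
    copied <- copied + capacity   # a reallocation copies the old contents
    capacity <- grow(capacity)
  }
  copied
}

total <- 8e6  # about the size of matrix(0, 1000, 1000) in bytes

# Policy 1: add a fixed-size chunk each time the buffer fills up.
fixed <- bytes_copied(total, function(cap) cap + 8192)

# Policy 2: double the capacity each time the buffer fills up.
doubled <- bytes_copied(total, function(cap) cap * 2)

cat("fixed increment:", fixed, "bytes copied\n")
cat("doubling:       ", doubled, "bytes copied\n")
```

With fixed increments the copying grows roughly with the square of the final size, which would fit a matrix four times as large taking far more than four times as long. With doubling it stays proportional to the final size.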

Duncan Murdoch



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Jan 26 01:44:29 2007

