Re: [Rd] Risk of readRDS() not detecting race conditions with parallel saveRDS()?

From: William Dunlap <wdunlap_at_tibco.com>
Date: Sat, 15 Sep 2012 19:44:13 +0000

Why not write the RDS file more atomically - write it to a temporary file and rename that file to its final name when it is completely written? E.g.,

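saveRDS.atomically <- function (object, file, ...)
{
    # Write to a temporary file in the same directory as the target,
    # so that the rename below stays on one file system.
    tfile <- tempfile(basename(file), dirname(file))
    # Remove the temporary file if it is still around when we exit.
    on.exit(if (file.exists(tfile)) unlink(tfile))
    retval <- saveRDS(object, tfile, ...)
    # perhaps want an if(file.exists(file))unlink(file) first
    if (!file.rename(tfile, file)) {
        stop("Cannot rename temporary file ", tfile, " to ", 
            file)
    }
    invisible(retval)
}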

(The file.rename may be tripped up by an overeager virus checker looking at the newly created tfile. I don't know the best way to deal with that.)
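One possible way to soften a transient rename failure is to retry a few times before giving up. The following is only a sketch under assumptions: the helper names rename_with_retry and save_rds_atomically_retry, and the tries and wait parameters, are hypothetical and not part of the suggestion above, and retrying does not guarantee that a file held open by another program is ever released.

```r
# Hypothetical helper (an assumption, not from the original post):
# try file.rename() up to `tries` times, sleeping `wait` seconds between attempts.
rename_with_retry <- function(from, to, tries = 5L, wait = 0.1) {
  for (i in seq_len(tries)) {
    # file.rename() returns FALSE (with a warning) on failure.
    if (suppressWarnings(file.rename(from, to))) return(TRUE)
    Sys.sleep(wait)
  }
  FALSE
}

# Same write-then-rename idea as saveRDS.atomically, with the retrying rename.
save_rds_atomically_retry <- function(object, file, ..., tries = 5L, wait = 0.1) {
  tfile <- tempfile(basename(file), dirname(file))
  # Clean up the temporary file if the rename never succeeded.
  on.exit(if (file.exists(tfile)) unlink(tfile))
  retval <- saveRDS(object, tfile, ...)
  if (!rename_with_retry(tfile, file, tries = tries, wait = wait)) {
    stop("Cannot rename temporary file ", tfile, " to ", file)
  }
  invisible(retval)
}

# Usage: write a small object and read it back.
target <- file.path(tempdir(), "foo.rds")
save_rds_atomically_retry(list(a = 1), target, compress = FALSE)
readRDS(target)
```

The retry count and delay are arbitrary here; they would need tuning for the environment in question.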

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-bounces_at_r-project.org [mailto:r-devel-bounces_at_r-project.org] On Behalf
> Of Henrik Bengtsson
> Sent: Saturday, September 15, 2012 10:22 AM
> To: R-devel
> Subject: [Rd] Risk of readRDS() not detecting race conditions with parallel saveRDS()?
>
> I hardly know anything about the format used in (non-compressed)
> serialization/RDS, but hoping someone with more knowledge could give
> me some feedback;
>
> Consider two R processes running in parallel on the same unknown file
> system. Both of them write and read to the same RDS file foo.rds
> (without compression) at random times using saveRDS(object,
> file="foo.rds", compress=FALSE) and object2 <-
> readRDS(file="foo.rds"). This happens frequently enough such that
> there is a risk for the two processes to write to the same "foo.rds"
> file at the same time (here one needs to acknowledge that file updates
> are not atomic nor instant).
>
> To simulate the event that two processes write to the same file at
> the same time (and non-atomically), resulting in an interweaved/appended
> "foo.rds" file, I manually corrupted "foo.rds" by
> inserting/dropping/replacing a single random byte. It appears that
> readRDS() will detect this simple event, by throwing an error on
> "unknown input format", which is what I want. My question is now, is
> it reasonable to assume that if two or more processes happen to write
> to the same RDS file at the same time, it is extremely unlikely (*)
> that they would generate a file that would pass as valid by readRDS()?
> (*) extremely unlikely = if all of us would run this toy example we
> would not end up with a non-detect but still corrupt "foo.rds" file
> in, say, 10000 years.
>
> Background: The R.cache package allows memoization (caching of
> results) to file such that the cache is persistent across R sessions.
> The persistent part is achieved by writing cache files to the same
> file directory. This is safe when you run a single process, and even
> if readRDS() would fail to read a cache file it is no big deal; the
> memoization will just fail and the results will be recalculated and be
> resaved. The question is what happens if you run this in parallel
> and push it to the extreme; is there a risk that the memoization will
> properly return but with invalid results? I prefer not having to
> synchronize this with a mutex/semaphore/common server, but instead
> rely on this try-and-see approach (cf. the Ethernet protocol on shared
> medium). My guess (and hope) is that the risk is extremely unlikely
> (*), but I'd like to hear if someone else thinks otherwise.
>
> Thanks,
>
> Henrik
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
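The single-byte corruption experiment described in the quoted message can be sketched roughly as below. This is an illustration under assumptions, not the code that was actually run: the test object, the helper try_read, and the choice of which bytes to damage are all hypothetical. It compares damage to the first byte of the file (the format marker) with damage to the last byte (which, for a plain numeric vector written without compression, should be part of the data itself).

```r
# Illustrative experiment; object, file names and byte choices are assumptions.
x <- c(1, 2, 3)
good <- tempfile(fileext = ".rds")
saveRDS(x, good, compress = FALSE)
bytes <- readBin(good, what = "raw", n = file.size(good))

# Hypothetical helper: write bytes to a fresh file and try to read it back,
# returning the error condition instead of stopping if readRDS() fails.
try_read <- function(b) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))
  writeBin(b, f)
  tryCatch(readRDS(f), error = function(e) e)
}

# Case 1: replace the first byte, which holds the serialization format marker.
bad_header <- bytes
bad_header[1] <- as.raw(0x5A)
res_header <- try_read(bad_header)

# Case 2: flip the lowest bit of the last byte, which belongs to the payload.
bad_payload <- bytes
n <- length(bytes)
bad_payload[n] <- as.raw(bitwXor(as.integer(bytes[n]), 1L))
res_payload <- try_read(bad_payload)

inherits(res_header, "error")   # was the header damage detected?
inherits(res_payload, "error")  # was the payload damage detected?
identical(res_payload, x)       # if it was read, is it still the same object?
```

The uncompressed serialization format carries no checksum, so whether a damaged byte is noticed depends on where it falls: damage to structural bytes tends to produce an error, while damage inside the data can yield a readable but different object.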



Received on Sat 15 Sep 2012 - 19:48:27 GMT
