Re: [R] tempfile problem

From: Ben Madin <lists_at_remoteinformation.com.au>
Date: Mon, 21 Jun 2010 06:34:10 +0800

Thanks all for the advice,

I'm now hashing the current time, including fractional seconds (to 6 decimal places), and I ran a loop to test it (my attempt at repeatable code!):

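> library(digest)
> op <- options(digits.secs=6)
> a <- character(50000)   # preallocate rather than growing from NA
> for (i in seq_along(a)) { a[i] <- digest(Sys.time(), algo='crc32') }
> anyDuplicated(a)        # 0 means no hash occurred more than once
> options(op)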

and received no hash more than once, so I'll go with that for now.
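
As a rough cross-check on that result, the chance of at least one repeat among n values can be estimated with the birthday approximation. This is only a sketch: it assumes the hashes behave like independent, uniformly distributed 32-bit values, which may well not hold for CRC-32 applied to closely spaced timestamps, and collision_probability is a name made up for illustration.

```r
# Hypothetical helper: approximate probability of at least one repeated value
# among n draws, ASSUMING independent uniform draws from 2^bits possible values.
collision_probability <- function(n, bits = 32) {
  1 - exp(-n * (n - 1) / (2 * 2^bits))
}

p <- collision_probability(50000)  # roughly 0.25 under the uniform assumption
```

Under that assumption a single clean run of 50000 hashes is not strong evidence that repeats cannot happen, so an explicit file.exists() check before writing each file is probably still worthwhile.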

I had rather assumed that there would be a suffix option, but it appears that a loop to check for existing names is the responsibility of the user. However, I think it is odd that I can open an R session on three independent machines and receive the same filename suggestions for the first invocation of the tempfile function each time. Because I am using PL/R to access R from PostgreSQL, this means I will always get the same names in the same order, which has left me very confused about how the random seeding process works. I was under the impression that the random seed was set from the system time when the session started, so there should be a very low probability of collisions, especially with a mix of machines and operating systems.
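
For what it's worth, the "loop to check" could look something like the sketch below. The function name, its arguments and the retry limit are all made up for illustration; only tempfile(), file.exists() and paste() are real R functions.

```r
# Hypothetical helper: keep asking tempfile() for a name until the name WITH
# the suffix is not already present in tmpdir.
unique_name_with_suffix <- function(pattern = "file", tmpdir = tempdir(),
                                    suffix = ".png", max_tries = 1000L) {
  for (i in seq_len(max_tries)) {
    candidate <- paste(tempfile(pattern, tmpdir), suffix, sep = "")
    if (!file.exists(candidate)) {
      return(candidate)  # full path, suffix included
    }
  }
  stop("no unused file name found after ", max_tries, " attempts")
}

this_filename <- unique_name_with_suffix("nahis", tempdir())
```

This still leaves a small window between the check and the actual write, and if every session starts from the same sequence of names the loop gets slower as files accumulate. Later R releases added a fileext argument to tempfile(), which, as far as I understand it, covers the same need more simply where it is available.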

cheers

Ben

On 18/06/2010, at 1:17 , Romain Francois wrote:

> 
> On 17/06/10 18:59, Duncan Murdoch wrote:
>> 
>> On 17/06/2010 12:43 PM, Ben Madin wrote:

>>> G'day all,
>>>
>>> The documentation for tempfile states :
>>>
>>> "The names are very likely to be unique among calls to tempfile in an
>>> R session and across simultaneous R sessions. The filenames are
>>> guaranteed not to be currently in use."
>>>
>>> My problem, I think, relates to the second part of that statement,
>>> the guarantee. It is being met, but I need to save the files as .png
>>> files in the same directory, so I am adding the suffix, and I suppose
>>> the next offering can therefore repeat an earlier one (as the file on
>>> disk carries the suffix, the bare name is never in use).
>>>
>>> I am using a command like :
>>>
>>> > fname <- basename(tempfile("nahis",
>>> "/Library/WebServer/Documents/nahis/tmp"))
>>>
>>> on a mac, or
>>> > fname <- basename(tempfile("nahis", "/htdocs/nahis/tmp"))
>>>
>>> on a FreeBSD system, as I need to be able to find the file from the
>>> web browser up to 24 hours later.
>>>
>>> and then
>>> > this_filename <- paste(fname, ".png", sep = "")
>>>
>>> and saving the file as this_filename, hence the next call doesn't find
>>> its own suggestion, and starts again.
>> 
>> It sounds as though you are doing something strange with the random
>> number seed, because those names are chosen at random, and then checked
>> for uniqueness. If the seed is being reset you could get the same name
>> twice in a row, but otherwise it's very unlikely. (And it's the C
>> library function rand(), not R's RNG, that is used.)

>>> Is there any alternative file-naming approach I can use to get around
>>> this? Do I need to manually scan and reject the name if it matches the
>>> names I already have? Should I just digest the current time ? (It's
>>> working so far!)
>> 
>> If you use the current time, watch out for timer accuracy and fast
>> computers. You may be able to get more than one file created before the
>> next timer tick.
>> 
>> I'd suggest that you should generate more than enough filenames once at
>> the start, confirm they're all unique, and then just take them one by
>> one as needed. Alternatively, create the tempfile() as well as the
>> tempfile().png, but this is likely to be really slow if the seed is the
>> same each time, because checking for the existence of the first n tries
>> is going to be slow.
>> 
>> Duncan Murdoch
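
The first suggestion above (generate a surplus of names once, confirm they are unique, then hand them out one at a time) might be sketched roughly as follows. The helper name and its defaults are hypothetical, and the pool size of 100 is an arbitrary choice for the example.

```r
# Hypothetical helper: build a pool of n distinct file names (suffix included)
# that do not clash with files already present in tmpdir.
make_name_pool <- function(n, pattern = "nahis", tmpdir = tempdir(),
                           suffix = ".png") {
  pool <- character(0)
  while (length(pool) < n) {
    cand <- paste(basename(tempfile(pattern, tmpdir)), suffix, sep = "")
    if (!file.exists(file.path(tmpdir, cand))) {
      pool <- unique(c(pool, cand))  # drop any repeated suggestion
    }
  }
  pool
}

pool <- make_name_pool(100)

# Take names from the front of the pool as they are needed.
next_name <- pool[1]
pool <- pool[-1]
```

The pool only protects against clashes within one session and against files that already exist when it is built, so it would need rebuilding (or re-checking) if several sessions write to the same directory.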
> 
> Would it not make sense to change the signature of tempfile to this:
> 
> function (pattern = "file", tmpdir = tempdir(), suffix = "" )
> 
> and include the suffix in the "does the file exist" test ?
> 
> Romain
> 
> -- 
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/98Uf7u : Rcpp 0.8.1
> |- http://bit.ly/c6YnCi : graph gallery collage
> `- http://bit.ly/bZ7ltC : inline 0.3.5
> 
> 

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 20 Jun 2010 - 22:36:45 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 20 Jun 2010 - 22:50:33 GMT.
