Re: [Rd] Issue with seek() on gzipped connections in R-devel

From: Jeffrey Ryan <jeffrey.ryan_at_lemnica.com>
Date: Fri, 23 Sep 2011 11:40:27 -0500

Yes, inelegant would be a good description. Sadly, it's a fact we have to put up with, I guess.

FWIW, I too don't like dependencies. mmap has none, and is cross-platform, but I get the idea.
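
For reference, the read-and-discard approach asked about in the quoted message can be sketched in base R alone, with no extra dependencies. This is only a minimal sketch: skip_bytes() is a hypothetical helper name, not something R provides, and the demo file is a throwaway created just for illustration (it is not the NIfTI file from the report). It reads forward in chunks until the requested uncompressed offset is reached, so it never calls seek() on the gzfile connection.

```r
# Hypothetical helper (not part of base R): advance a read-mode connection
# by n uncompressed bytes by reading and discarding them in chunks.
skip_bytes <- function(con, n, chunk = 65536L) {
  remaining <- n
  while (remaining > 0) {
    got <- readBin(con, "raw", n = min(remaining, chunk))
    if (length(got) == 0L) {
      stop("reached end of stream before offset ", n)
    }
    remaining <- remaining - length(got)
  }
  invisible(n)
}

# Throwaway gzipped file: 1024 bytes, where the byte at 0-based offset k
# has the value k %% 256.
path <- tempfile(fileext = ".gz")
out <- gzfile(path, "wb")
writeBin(as.raw(rep(0:255, 4)), out)
close(out)

# Read four bytes starting at uncompressed offset 352 without seek().
con <- gzfile(path, "rb")
skip_bytes(con, 352)
value <- readBin(con, "raw", n = 4)
close(con)

print(value)  # bytes 0x60 0x61 0x62 0x63, since 352 %% 256 == 96
```

The cost is proportional to the offset, since everything before it still has to be decompressed, which is the inelegance mentioned above. It only moves forward, too: going back to an earlier offset means closing and reopening the connection.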

Good luck,
Jeff

On Fri, Sep 23, 2011 at 11:13 AM, Jon Clayden <jon.clayden_at_gmail.com> wrote:
> Thanks for the replies. I take the point, although it does seem like a
> substantial regression (on non-Windows platforms).
>
> I like to keep the external dependencies of my packages minimal, but I
> will look into the mmap package - thanks, Jeff, for the tip.
>
> Aside from that, though, what is the alternative to using seek? If I
> want to read something at (original, uncompressed) byte offset 352, as
> here, do I have to read and discard everything that comes before it
> first? That seems inelegant at best...
>
> Regards,
> Jon
>
>
> On 23 September 2011 16:54, Jeffrey Ryan <jeffrey.ryan_at_lemnica.com> wrote:
>> seek() in general is a bad idea IMO if you are writing cross-platform code.
>>
>> ?seek
>>
>> Warning:
>>
>>     Use of ‘seek’ on Windows is discouraged.  We have found so many
>>     errors in the Windows implementation of file positioning that
>>     users are advised to use it only at their own risk, and asked not
>>     to waste the R developers' time with bug reports on Windows'
>>     deficiencies.
>>
>> Aside from making me laugh, the above highlights the core reason not to use it, IMO.
>>
>> For files that are not gzipped, you can try the mmap package.  ?mmap and
>> ?types are good starting points.  It allows access to binary data on disk
>> with very simple R-like semantics, and is very fast.  Not as fast as a
>> sequential read... but fast.  At present it is 'little endian' only,
>> though that describes most of the world today.
>>
>> Best,
>> Jeff
>>
>> On Fri, Sep 23, 2011 at 8:58 AM, Jon Clayden <jon.clayden_at_gmail.com> wrote:
>>> Dear all,
>>>
>>> In R-devel (2011-09-23 r57050), I'm running into a serious problem
>>> with seek()ing on connections opened with gzfile(). A warning is
>>> generated and the file position does not seek to the requested
>>> location. It doesn't seem to occur all the time - I tried to create a
>>> small example file to illustrate it, but the problem didn't occur.
>>> However, it can be seen with a file I use for testing my packages,
>>> which is available through the URL
>>> <https://github.com/jonclayden/tractor/blob/master/tests/data/nifti/maskedb0_lia.nii.gz?raw=true>:
>>>
>>>> con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
>>>> seek(con, 352)
>>> [1] 0
>>> Warning message:
>>> In seek.connection(con, 352) :
>>>  seek on a gzfile connection returned an internal error
>>>> seek(con, NA)
>>> [1] 190
>>>
>>> The same commands with the same file work as expected in R 2.13.1, and
>>> have worked over many previous versions of R.
>>>
>>>> con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
>>>> seek(con, 352)
>>> [1] 0
>>>> seek(con, NA)
>>> [1] 352
>>>
>>> My sessionInfo() output is:
>>>
>>> R Under development (unstable) (2011-09-23 r57050)
>>> Platform: x86_64-apple-darwin11.1.0 (64-bit)
>>>
>>> locale:
>>> [1] en_GB.UTF-8/en_US.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] tractor.nt_2.0.1      tractor.session_2.0.3 tractor.utils_2.0.0
>>> [4] tractor.base_2.0.3    reportr_0.2.0
>>>
>>> This seems to occur whether or not R is compiled with
>>> "--with-system-zlib". I see some zlib-related changes mentioned in the
>>> NEWS, but I don't see any indication that this is expected. Could
>>> anyone shed any light on it, please?
>>>
>>> Thanks and all the best,
>>> Jon
>>>
>>> ______________________________________________
>>> R-devel_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>>
>>
>> --
>> Jeffrey Ryan
>> jeffrey.ryan_at_lemnica.com
>>
>> www.lemnica.com
>> www.esotericR.com
>>
>
>

-- 
Jeffrey Ryan
jeffrey.ryan_at_lemnica.com

www.lemnica.com
www.esotericR.com

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 23 Sep 2011 - 16:44:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 23 Sep 2011 - 16:50:36 GMT.