Re: [Rd] Arrays Partial unserialization

From: Jeff Ryan <jeff.a.ryan_at_gmail.com>
Date: Fri, 31 Aug 2012 10:01:57 -0500

There is no such tool to my knowledge, though the mmap package can do something very similar. In fact, it will be able to do exactly this once I apply a contributed patch to handle endianness.

The issue is that rds files are compressed by default, so reading one directly requires decompressing it, which makes it impossible to select a subset directly, at least to the best of my knowledge of the compression algorithms in use. (BDR's reply after this one clarifies.)

What you can do, though, is write the data out with writeBin by column and read it back in incrementally. Take a look at the mmap package, specifically:

example(mmap)
example(struct)
example(types)

The struct one is quite useful for data.frame-like structures on disk, including the ability to modify struct padding etc. This one is row oriented, so it lets you store various types, row by row, in one file.
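
As a rough sketch of the writeBin-by-column idea mentioned above, here is a version that uses only base R (writeBin, seek and readBin), not the mmap package. The matrix x, the helper read_chunk and the chunk size of 100 are made up for illustration, and it assumes a plain numeric array stored as uncompressed 8-byte doubles with a fixed byte order.

```r
# Hypothetical example data: a 1000 x 4 numeric matrix standing in for a large array.
x <- matrix(as.numeric(1:4000), nrow = 1000, ncol = 4)
path <- tempfile(fileext = ".bin")

# Write the data column by column as raw 8-byte doubles, with an explicit byte order.
con <- file(path, open = "wb")
for (j in seq_len(ncol(x))) {
  writeBin(x[, j], con, size = 8L, endian = "little")
}
close(con)

# Hypothetical helper: read `n` values starting at 1-based element `start`
# (elements are in column-major order, as written above).
read_chunk <- function(path, start, n, size = 8L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (start - 1) * size, origin = "start")
  readBin(con, what = "double", n = n, size = size, endian = "little")
}

# Step through the file 100 cells at a time, keeping only one chunk in memory.
total <- length(x)
chunk <- 100L
sums <- numeric(0)
for (start in seq(1L, total, by = chunk)) {
  vals <- read_chunk(path, start, min(chunk, total - start + 1L))
  sums <- c(sums, sum(vals))  # stand-in for the real per-chunk function
}
```

Because the file is just a flat run of doubles, the byte offset of any cell is (index - 1) * 8, which is what makes the partial reads possible. Dimensions and types are not stored in the file, so they have to be tracked separately.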

?mmap.csv is an example function that will also let you read csv files directly into an 'mmap' form - and shows the 'struct' functionality.

At some point I will write an article on all of this, but the vignette for mmap is illustrative of most of the value.

The indexing package on R-forge (as well as the talks I gave about it at useR 2010 and R/Finance 2012) may also be of use, though that is more of a 'database' approach than a simple sequential stepping through data on disk.

HTH
Jeff

On Fri, Aug 31, 2012 at 9:41 AM, Duncan Murdoch <murdoch.duncan_at_gmail.com> wrote:
> On 31/08/2012 9:47 AM, Damien Georges wrote:
>>
>> Hi all,
>>
>> I'm working with some huge arrays in R and I need to load several of them to
>> apply some functions that require all of my arrays' values for each
>> cell...
>>
>> To make this possible, I would like to load only a part (for example 100
>> cells) of each array, apply my function, delete all the loaded cells,
>> load the following cells, and so on.
>>
>> Is it possible to unserialize (or load) only a defined part of an R array?
>> Do you know some tools that might help me?
>
>
> I don't know of any tools to do that, but there are tools to maintain large
> objects in files, and load only parts of them at a time, e.g. the ff
> package. Or you could simply use readBin and writeBin to do the same
> yourself.
>
>>
>> Finally, I did a lot of research to find out how arrays (and all other R
>> objects) are serialized into binary objects, but I found nothing
>> that really explains the algorithms involved. If someone has some information
>> on this topic, I'm interested.
>
>
> You can read the source for this; it is in src/main/serialize.c.
>
> Duncan Murdoch
>
>>
>> Hoping my request is understandable,
>>
>> All the best,
>>
>> Damien.G
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Jeffrey Ryan
jeffrey.ryan_at_lemnica.com

www.lemnica.com

Received on Fri 31 Aug 2012 - 15:04:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 01 Sep 2012 - 01:30:42 GMT.
