Re: [Rd] idea for "virtual matrix/array" class

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Tue 24 Aug 2004 - 07:13:40 EST

On Mon, 23 Aug 2004, Tony Plate wrote:

> One idea I was thinking about was to have a new class of object that
> referred to data in a file on disk, and which had all the standard methods
> of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc. The
> object in memory would only store the array attributes, while the actual
> array data (the elements) would reside in a file. When some extraction
> method was called, it would access data in the file and return the
> appropriate data. With sensible use of seek operations, the data access
> could probably be quite fast. The file format of the object on disk could
> possibly be the standard serialized binary format as used in .RData
> files. Of course, if the object was larger than would fit in memory, then
> trying to extract too large a subarray would exhaust memory, but it should
> be possible to efficiently extract reasonably sized subarrays. To be more
> useful, one would want apply() to work with such arrays. That would
> be doable, either by creating a new method for apply, or possibly just for
> aperm.

This is what RPgSQL does with proxy dataframes and what I did (read-only) for netCDF access. It's a good idea if you have a data format for which random access is fairly fast. I'm not sure that the standard serialized binary format satisfies this. Fixed-format text files would work, but free-format ones wouldn't -- seek() only helps when you can work out where to seek without reading all the data.
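
To illustrate the fixed-format case, here is a minimal sketch in R. It assumes the matrix is stored as raw 8-byte doubles in column-major order, so the byte offset of any column can be computed without reading the rest of the file. The names write_disk_matrix and disk_matrix are hypothetical and come from no existing package, and the "[" method handles only positive integer indices.

```r
# Hypothetical sketch: a read-only, file-backed matrix with fixed-width records.
# Only the path and the dimensions are kept in memory.

# Write a numeric matrix to 'path' as raw 8-byte doubles (column-major order)
# and return a small in-memory handle of class "disk_matrix".
write_disk_matrix <- function(x, path) {
  con <- file(path, "wb")
  on.exit(close(con))
  writeBin(as.double(x), con, size = 8L)
  structure(list(path = path, dim = dim(x)), class = "disk_matrix")
}

# dim() reports the stored dimensions without touching the file.
dim.disk_matrix <- function(x) x$dim

# x[i, j]: seek to each requested column and read only that column.
# Assumes i and j are positive integer indices; the result is always a matrix.
"[.disk_matrix" <- function(x, i, j, ...) {
  nr <- x$dim[1L]
  if (missing(i)) i <- seq_len(nr)
  if (missing(j)) j <- seq_len(x$dim[2L])
  con <- file(x$path, "rb")
  on.exit(close(con))
  out <- matrix(NA_real_, length(i), length(j))
  for (k in seq_along(j)) {
    # Every element is 8 bytes wide, so column j[k] starts at a known offset.
    seek(con, where = (j[k] - 1) * nr * 8, origin = "start")
    column <- readBin(con, what = "double", n = nr, size = 8L)
    out[, k] <- column[i]
  }
  out
}

# Usage: round-trip a small matrix through a temporary file.
m <- matrix(as.double(1:12), nrow = 3)
dm <- write_disk_matrix(m, tempfile(fileext = ".bin"))
sub <- dm[2:3, c(1, 4)]
```

Reading a whole column per seek keeps the sketch short; a more careful version would read only the requested rows. The same offset arithmetic is not available for free-format text, where the position of an element depends on the lengths of everything before it.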

        -thomas



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Aug 24 07:17:27 2004
