Re: [R] compress data on read, decompress on write

From: Ramon Diaz-Uriarte <rdiaz02_at_gmail.com>
Date: Fri, 29 Feb 2008 19:04:01 +0100

Thanks, Greg. Yes, I'd store the compressed stuff as a raw data type.

Best,

R.

On Thu, Feb 28, 2008 at 11:54 PM, Gregory Warnes <gregory.warnes_at_mac.com> wrote:
>
> You might look at storing the data using R's "raw" data type...
>
> -G
>
> On Feb 28, 2008, at 5:38 PM, Ramon Diaz-Uriarte wrote:
>
> > Dear Christos,
> >
> > Thanks for your reply. Actually, I should have been more careful with
> > my language: it's not really a sparse matrix, but rather a ragged array
> > that results from a more compact representation we thought of for the
> > hidden states in a Hidden Markov Model in many runs of MCMC. However,
> > it might make sense for us to check sparseMatrix and see how it's done
> > there.
> >
> > Thanks,
> >
> > R
> >
> > On Thu, Feb 28, 2008 at 7:49 PM, Christos Hatzis <christos.hatzis_at_nuverabio.com> wrote:
> >> Ramon,
> >>
> >> If you are looking for a solution to your specific application (as
> >> opposed to a general compression/decompression mechanism), it might be
> >> worth checking out the Matrix package, which has facilities for storing
> >> and manipulating sparse matrices. Its sparse matrix classes (e.g.,
> >> TsparseMatrix) store matrices in the triplet representation (i.e., only
> >> the indices and values of the non-zero elements), and this affords great
> >> compression ratios, depending on the size and degree of sparseness of
> >> the matrix.
> >>
> >> -Christos
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: r-help-bounces_at_r-project.org
> >>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Ramon Diaz-Uriarte
> >>> Sent: Thursday, February 28, 2008 1:18 PM
> >>> To: r-help_at_stat.math.ethz.ch
> >>> Subject: [R] compress data on read, decompress on write
> >>>
> >>> Dear All,
> >>>
> >>> I'd like to be able to have R store (in a list component) a
> >>> compressed data set, and then write it out uncompressed.
> >>> gzcon and gzfile work in exactly the opposite direction. What
> >>> would be a good way to handle this?
> >>>
> >>> Details:
> >>> ----------
> >>>
> >>> We have a package that uses C; part of the C output is a
> >>> large sparse matrix. This is never manipulated directly by R,
> >>> but always by the C code. However, we need to store that data
> >>> somewhere (inside an R object) for further calls to the functions
> >>> in our package. We'd like to store that matrix as part of the R
> >>> object (say, as an element of a list). Ideally, it would be stored
> >>> in as compressed a way as possible. Then, when we need to use that
> >>> information, it would be decompressed and passed to the C function.
> >>>
> >>> I guess one way to do it is to have C deal with the
> >>> compression and decompression (e.g., using the zlib or bzip2
> >>> libraries) and then use readBin, etc., from R. But, if I can,
> >>> I'd like to avoid our C code having to call zlib, etc., so as
> >>> to make our package easily portable.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> R.
> >>>
> >>> --
> >>> Ramon Diaz-Uriarte
> >>> Statistical Computing Team
> >>> Structural Biology and Biocomputing Programme
> >>> Spanish National Cancer Centre (CNIO)
> >>> http://ligarto.org/rdiaz
> >>>
> >>> ______________________________________________
> >>> R-help_at_r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>

Received on Fri 29 Feb 2008 - 18:06:39 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 29 Feb 2008 - 18:30:17 GMT.
