Re: [R] Q: Suggestions for long-term data/program storage policy?

From: Duncan Murdoch <>
Date: Tue 11 Oct 2005 - 20:54:32 EST

Alexander Ploner wrote:
> Dear list,
> we are a statistical/epidemiological department that, after a few
> years of rapid growth, is finally getting around to formulating a
> general data storage and retention policy, mainly to ensure that we
> can more easily reproduce results from published papers/theses in the
> future, but also in the hope of getting more synergy between related
> projects.
> We have formulated what we feel is a reasonable draft, requiring
> basically that the raw data, all programs to create derived data
> sets, and the analysis programs are stored and documented in a
> uniform manner, regardless of the analysis software used. The minimum
> data retention we are aiming for is 10 years, and the format for the
> raw data is quite sane (either flat ASCII or real
> Given the rapid development cycle of R, this suggests that at the very
> least all non-base packages used in the analysis are stored together
> with each project. I have basically two questions:
> 1) Are old R versions (binaries/sources) going to be available on
> CRAN indefinitely?

I think sources will be, binaries much less reliably. (I just discovered that one or two of the old Windows binaries are corrupted; I'm not sure I'll be able to find good copies.)

> 2) Is .RData a reasonable file format for long term storage?

I think the intention is that it will be supported in future versions of R, but storing data in a binary format is risky. What if you don't use R in 5 years? You would find it a lot easier to decode text format files in another package than .RData format.
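To illustrate why a text format is easier to decode outside R, here is a minimal sketch in Python, chosen only as an example of "another package". The column names and values are hypothetical. It serialises a small table as plain comma-separated text and reads it back using nothing but the Python standard library; in practice the text would be exported from R (for instance with write.csv or write.table) and stored in a file rather than a string.

```python
import csv
import io

# Hypothetical example data; values are kept as strings, as CSV stores them.
rows = [
    {"id": "1", "age": "34", "exposed": "yes"},
    {"id": "2", "age": "51", "exposed": "no"},
]

def write_table(rows, fieldnames):
    """Serialise rows to CSV text: human-readable and tool-neutral."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def read_table(text):
    """Parse CSV text back into a list of dicts using only the stdlib."""
    return list(csv.DictReader(io.StringIO(text)))

text = write_table(rows, ["id", "age", "exposed"])
print(text, end="")
```

Nothing beyond a CSV parser, which nearly every statistics package and programming language provides, is needed to recover the data.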

The other advantage of text format is that it works very well with version control systems like Subversion or CVS. You can see several versions of the file, see comments on why changes were made, etc.
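Version control systems compare text files line by line, which is why text formats suit them. As a rough illustration (this is Python's standard `difflib`, not how Subversion or CVS are implemented internally), the sketch below produces a unified diff between two hypothetical versions of the same CSV file. A one-value correction shows up as a single removed and a single added line, whereas a binary .RData file would typically just be reported as "changed".

```python
import difflib

# Two hypothetical versions of the same raw-data file.
old = "id,age,exposed\n1,34,yes\n2,51,no\n"
new = "id,age,exposed\n1,34,yes\n2,52,no\n"  # one corrected value

diff = list(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="data_v1.csv",
    tofile="data_v2.csv",
))
print("".join(diff), end="")

# Keep only the changed lines: those starting with '-' (removed) or
# '+' (added), excluding the '---'/'+++' file header lines.
changed = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("---", "+++"))]
```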

Duncan Murdoch
> I would also be very grateful for any other suggestions, comments or
> links for setting up and implementing such a storage policy
> (R-specific or otherwise).
> Thank you for your time,
> alexander
> Medical Epidemiology & Biostatistics
> Karolinska Institutet, Stockholm
> Tel: ++46-8-524-82329
> Fax: ++46-8-31 49 75
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!
Received on Tue Oct 11 20:57:27 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:40:41 EST