Re: [R] including data frames in R packages

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Mon, 25 Feb 2008 09:17:31 +0000

On Sun, 2008-02-24 at 18:05 -0800, dxc13 wrote:
> useR's,
>
> Does anyone know if there is a size limitation on the data frames that can
> be included in R packages? I have a data set in a text file that I would
> like to include in a package I am building and it is 8.5 MB in size. Will
> this be problematic? Is the process for including data sets in packages
> documented in WRE?
>

> Thanks,
> dxc

Is the 8.5MB the size of the text file or the size of the saved object? Saved objects can be compressed using the 'compress' argument to save(), which could save some space.
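As a rough sketch of how you might compare the two sizes (the data frame `dat` below is a made-up stand-in, not your data, and the temporary file names are only for illustration):

```r
# Hypothetical stand-in for the real data set
set.seed(1)
n <- 10000
dat <- data.frame(id = seq_len(n),
                  x  = rnorm(n),
                  g  = sample(letters, n, replace = TRUE))

txt     <- tempfile(fileext = ".txt")
rda_raw <- tempfile(fileext = ".rda")
rda_gz  <- tempfile(fileext = ".rda")

# The same data written as plain text, as an uncompressed saved
# object, and as a compressed saved object
write.table(dat, file = txt, row.names = FALSE)
save(dat, file = rda_raw, compress = FALSE)
save(dat, file = rda_gz, compress = TRUE)

# File sizes in bytes
sizes <- file.info(c(txt, rda_raw, rda_gz))$size
names(sizes) <- c("text", "rda_uncompressed", "rda_compressed")
print(sizes)
```

How much you gain depends on the data; columns with many repeated values tend to compress far better than columns of random numbers.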

How much memory does the object occupy once loaded, and how much memory is required to use it in the examples? Not everyone has masses of RAM yet. I was stung by that with an early version of a package I wrote a while back: I hadn't considered the memory usage of the examples, and my poor laptop with 512MB of RAM took 10+ hours to run R CMD check on it because it quickly got into swap hell. The same check completed in a few minutes on my main development machine.
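A quick way to get a feel for the in-memory size is object.size(). A minimal sketch, again with a made-up data frame standing in for yours:

```r
# Hypothetical stand-in: two numeric columns of 100,000 rows each
set.seed(1)
dat <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

# Approximate memory used by the object itself, in bytes
size_bytes <- object.size(dat)
print(format(size_bytes, units = "Mb"))
```

Note that this only covers the object itself. Copies and intermediate results made while the examples run can push the peak memory use well above this figure.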

8.5MB isn't particularly large these days for most people, but consider that not all users are on fast ADSL connections, and if your package is likely to be popular and is submitted to CRAN, then there is the load and bandwidth on those servers to consider.

Also, 8.5MB of text file suggests quite a lot of data to be using in examples. If you do use it there, consider the execution time of your code: CRAN runs checks on a host of architectures for all the packages stored there, so if your package takes ages to check because you are using a large data set, that is extra work for every one of those machines.
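You can time an example before putting it in the package with system.time(). A small sketch, where the data and the lm() call are only placeholders for whatever your examples actually do:

```r
# Hypothetical stand-in data and a placeholder example
set.seed(1)
dat <- data.frame(x = rnorm(1e5))
dat$y <- 2 * dat$x + rnorm(1e5)

# Time how long the example takes to run
timing <- system.time(fit <- lm(y ~ x, data = dat))
print(timing[["elapsed"]])  # elapsed wall-clock time in seconds
```

Timings on your own machine are only a guide; the check machines may be slower or busier.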

Then there is the issue of seeing the woods for the trees. If the data set is intended to illustrate the package functions via examples, having a simple example that is easily comprehended is far better than a more complicated example. Having said that, 8.5MB may be typical for your subject area, in which case this may be of less significance.

My two penneth,

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 25 Feb 2008 - 09:22:39 GMT
