[Rd] A Call for a Smaller R Core Package

From: Zepu Zhang <zpzhang_at_stanfordalumni.org>
Date: Thu 21 Sep 2006 - 04:50:49 GMT


(Below is my idea on an issue that has troubled me for a fairly long time. I hope it is not viewed as troublemaking.)

A Call for a Smaller R Core Package

This document suggests downsizing the 'core' package of R by moving some specialized functionality into separate packages. I'll use string-related functions as examples, because I happened to be troubled by them today.

1. The core is too big

R is a function rich environment.
However, non-central functions are better organized in specialized packages.
From time to time I have felt the need to go through the core package for a
complete picture of what is there at my disposal, yet so far I haven't done that.
In the 'R Reference Manual' the core package runs for over 400 pages with about 400 entries, and mysteriously some functions don't show up in the TOC, e.g. 'sub'.
In the two-volume reference set printed by Network-Theory, the core is the entire first book.
In contrast, the 'Intrinsic Functions' chapter of the classic Fortran reference "Fortran 95/2003 Explained" runs for maybe 30(?) pages. I have flipped through it many times and can say with confidence, "OK, these are ALL the Fortran intrinsics and I know what they do." For R, I find it an intimidating task to flip through the 400+ page core and still have a clear head at the end.

Below is a random sample of string-related entries in the core package:

agrep
basename
charmatch
chartr
gregexpr
grep
gsub
regex
regexpr
strsplit
strtrim
strwrap
sub

In my opinion, anything that uses regular expressions belongs somewhere else. Even 'utils' seems to be a better place for random items than the 'core'.
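
As a rough way to check where these functions live and how large the core namespace is, here is a minimal sketch in R. It assumes that the 'core' package discussed here corresponds to the attached 'base' package, and the variable names are only illustrative; the count will differ between R versions. Note that 'regex' in the list above is a help topic rather than a function, so it is left out.

```r
# Assumption: "the core" means the 'base' package, which is always attached.
base_objects <- ls("package:base")

# Number of objects in base (names starting with a dot are not counted;
# the value varies by R version).
length(base_objects)

# The string-related functions sampled above
# ('regex' omitted: it is a help topic, not a function).
string_funs <- c("agrep", "basename", "charmatch", "chartr", "gregexpr",
                 "grep", "gsub", "regexpr", "strsplit", "strtrim",
                 "strwrap", "sub")

# Which of them are found in base?
in_base <- string_funs %in% base_objects
names(in_base) <- string_funs
print(in_base)
```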

2. Benefits of a smaller core

a) A smaller core will be more carefully studied and better appreciated.

If the R core functions were documented in 100 pages, I would be a much better R programmer than I am today because I would have singled out and studied the more fundamental routines about function calls, etc.

The criteria for a function to be in the core seem to be: 1) fundamental; or 2) very often used.

A smaller core is more stable.

b) A specialized 'string' package makes string-related functions much easier to find.

It could be that I still need all the functions. But since they are grouped together, learning them is much easier. I would very rarely reinvent the wheel, because I could quickly get a sweeping view of the dedicated package.

c) It will be easier to enrich string-related functionality without complicating the core.

3. Costs of such re-arrangements

a) To the R development team

(I don't really know.)

Utility functions that are frequently used by basic functions may well stay in the core.
Those that are not may not be too difficult to move around. The spin-off package could always be loaded automatically as a basic package, but as discussed above, a clean grouping greatly helps learning and finding things.

b) To R users

The system (both the core and the specialized package) will be easier to learn and use.


R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Sep 21 14:53:26 2006
