R-alpha: ANNOUNCE: R-FAQ v0.0

Kurt Hornik (Kurt.Hornik@ci.tuwien.ac.at)
Mon, 3 Mar 1997 12:38:21 +0100


Date: Mon, 3 Mar 1997 12:38:21 +0100
Message-Id: <199703031138.MAA01161@aragorn.ci.tuwien.ac.at>
From: Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at>
To: r-testers@stat.math.ethz.ch
Subject: R-alpha: ANNOUNCE:  R-FAQ v0.0

As promised, a first version of an R FAQ is now available at the URL

	http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

From there, you can also get versions of the FAQ in plain ASCII text,
GNU info, DVI, and PostScript, as well as the SGML source used for
creating all these formats.

Feedback is most welcome.

The ASCII version is appended below.

Have fun,
-k

************************************************************************
  R FAQ
  Kurt Hornik
  v0.0-0, 1997/03/01

  This document contains answers to some of the most frequently asked
  questions about R.  Feedback is welcome.
  ______________________________________________________________________

  Table of Contents:

  1.	Introduction

  1.1.	Legalese

  1.2.	Obtaining this Document

  1.3.	Notation

  1.4.	Feedback

  1.5.	Acknowledgments

  2.	R Basics

  2.1.	What Is R?

  2.2.	What Machines Does R Run on?

  2.3.	What Is the Current Version of R?

  2.4.	How Can R Be Obtained?

  2.5.	How Can R Be Installed?

  2.6.	Are there Binary Distributions for R?

  2.7.	Which Documentation Exists for R?

  2.8.	How Can I Get Help on R?

  3.	R and S

  3.1.	What Is S?

  3.2.	What Is S-PLUS?

  3.3.	What Are the Differences between R and S?

  4.	R Add-On Packages

  4.1.	Which Add-on Packages Exist for R?

  4.2.	How Can Add-on Packages Be Installed?

  4.3.	How Can Add-on Packages Be Used?

  4.4.	How Can I Contribute to R?

  5.	R and Emacs

  5.1.	Is there Emacs Support for R?
  ______________________________________________________________________

  1.  Introduction

  This document contains answers to some of the most frequently asked
  questions about R.

  1.1.	Legalese

  This document is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2, or (at your option)
  any later version.

  This program is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  General Public License for more details.

  If you do not have a copy of the GNU General Public License, write to
  the Free Software Foundation, 59 Temple Place - Suite 330, Boston, MA
  02111-1307, USA.

  1.2.	Obtaining this Document

  The latest version of this document is always available from
  http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html.

  From there, you can also obtain versions converted to plain ASCII
  text, GNU info, DVI, and PostScript, as well as the SGML source used
  for creating all these formats using the SGML-Tools (formerly
  Linuxdoc-SGML) system.

  1.3.	Notation

  The notation is standard: `R>' is used for the R prompt, and `$' for
  the shell prompt (where applicable).

  1.4.	Feedback

  Feedback is of course most welcome.

  In particular, note that I do not have access to Windows or Mac
  systems.  If you have information on these systems that you think
  should be added to this document, please let me know.

  1.5.	Acknowledgments

  To come soon.

  2.  R Basics

  2.1.	What Is R?

  R is a system for statistical computation and graphics.  It consists
  of a language plus a run-time environment with graphics, a debugger,
  access to certain system functions, and the ability to run programs
  stored in script files.

  The design of R has been heavily influenced by two existing languages:
  Becker, Chambers & Wilks' S (see question ``What Is S?'') and
  Sussman's Scheme.  Whereas the resulting language is very similar in
  appearance to S, the underlying implementation and semantics are
  derived from Scheme.  See question ``What Are the Differences between
  R and S?'' for a discussion of the differences between R and S.

  R is being developed by Ross Ihaka and Robert Gentleman, two Senior
  Lecturers at the Department of Statistics of the University of
  Auckland in Auckland, New Zealand.

  R is free software distributed under a GNU-style copyleft.

  2.2.	What Machines Does R Run on?

  R is being developed for the Unix, Windows and Mac platforms.

  R will configure and build under a number of common Unix platforms
  including dec-alpha-osf, freebsd, hpux, linux-elf, sgi-irix, solaris,
  and sunos.  If you know about other platforms, please drop me a note.

  2.3.	What Is the Current Version of R?

  The current Unix version is 0.16.1 alpha.  The versions for Windows
  and Mac are pre-alpha.

  The next Unix version (0.17, or perhaps 0.50) will add group methods
  and complex numbers and hence provide a full implementation of S as
  described in ``The New S Language''.  With some good luck, the Windows
  version will also be 0.50.

  2.4.	How Can R Be Obtained?

  The primary R distribution site (ftp://stat.auckland.ac.nz/pub/R/) is
  mirrored daily on the Statlib server at Carnegie Mellon University.
  Statlib is a system for distributing statistical software by
  electronic mail, ftp, and the World Wide Web.

  In the interests of preserving international bandwidth (and of keeping
  the developers' internet bills under control) it is strongly
  recommended that you get R from the CMU Statlib server at

       http://lib.stat.cmu.edu/R/

  or from a Statlib mirror even closer to you.

  2.5.	How Can R Be Installed?

  The file INSTALL that comes with the R distribution contains
  installation instructions.

  Under a number of common Unix platforms (see question ``What
  Machines Does R Run on?''), R can be installed very easily.

  Choose a place to install the R tree (R is not just a binary, but has
  additional data sets, help files, font metrics, etc.).  Let's call this
  place RHOME (given appropriate permissions, a natural choice would be
  /usr/local/lib/R).  Untar the source code, and issue the following
  commands (at the shell prompt):

       $ ./configure
       $ make
       $ make install-help

  You can also build a LaTeX version of the manual entries with

       $ make install-latex

  and an HTML version of the manual with

       $ make install-html

  If these commands execute successfully, the R binary will be copied to
  the $RHOME/bin directory.  In addition, a shell script front-end called
  `R' will be created and copied to the same directory.  You can copy
  the script to a place where users can invoke it, for example to
  /usr/local/bin/R.

  Other platforms?

  2.6.	Are there Binary Distributions for R?

  Not yet.

  If you are interested in obtaining precompiled `.deb' packages for
  installation under Debian GNU/Linux, drop me a note.

  Robert Gentleman has recently made a pre-alpha Windows executable
  file available for ftp at
  ftp://stat.auckland.ac.nz/pub/research/rgentlem/rbeta.zip.  This
  binary should be more compatible with Windows 95 than the previous one
  (he does not know how it behaves under Windows 3.1).  You still need
  all the other extra files from the previous Windows distribution, as
  this archive contains only the executable.

  2.7.	Which Documentation Exists for R?

  Currently, there is no R manual.  Online documentation for most of the
  functions and variables in R exists, and can be printed on-screen by
  typing help(name) (or ?name) at the R prompt, where name is the name
  of the R object for which help is sought.  (In the case of unary and
  binary operators and control-flow special forms, the name may need to
  be quoted.)

  This documentation can also be made available as HTML, and as hardcopy
  via LaTeX; see question ``How Can R Be Installed?''.  An up-to-date
  HTML version is always available for web browsing at

       http://www.stat.math.ethz.ch/R-manual

  In the absence of a systematic introduction to R, one can mostly get
  along with introductions to S or S-PLUS, such as

  o  ``Introductory Guide to S-PLUS'' by Brian Ripley
     <ripley@stats.ox.ac.uk>, a beginners' guide to doing statistics in
     S-PLUS.  Available at the Statlib S repository as PostScript in
     full size (sguide.ps1) and reduced 2-on-1 (sguide.ps2)
     respectively.

  o  ``Notes on S-PLUS: A Programming Environment for Data Analysis and
     Graphics'' by Bill Venables <venables@stats.adelaide.edu.au> and
     David Smith <D.M.Smith@lancaster.ac.uk>.  This document talks
     mostly about plain S features, and does not concentrate on features
     specific to S-PLUS.  It is available from the Statlib S repository
     as a shar archive with LaTeX source (splusnotes.shar) and as
     PostScript (splusnotes.ps).

     Bill Venables has recently pointed out that a slightly newer
     version of these notes can be found in
     ftp://attunga.stats.adelaide.edu.au/pub/courses, and that it could
     serve as the basis for an R manual.  Robert Gentleman has just
     announced that this project is already under way.

  Last, but not least, Ross' and Robert's experience in designing and
  implementing R is described in:

       @Article{,
	 author =	{Ross Ihaka and Robert Gentleman},
	 title =	{R: {A} Language for Data Analysis and Graphics},
	 journal =	{Journal of Computational and Graphical Statistics},
	 year =		1996,
	 volume =	5,
	 number =	3,
	 pages =	{299--314}
       }

  This is also the reference for R to use in publications.

  2.8.	How Can I Get Help on R?

  The developers of R can be reached for comments and reports at

       R@stat.auckland.ac.nz

  Thanks to Martin Maechler <maechler@stat.math.ethz.ch> there is also
  the R testers mailing list, to which you can subscribe to receive
  announcements of new versions, bug fixes, and so on.  To subscribe to
  (or unsubscribe from) the mailing list, send `subscribe' (or
  `unsubscribe') in the BODY of the message (not in the subject!) to
  r-testers-request@stat.math.ethz.ch.  To send a message to everyone
  on the list, send email to

       r-testers@stat.math.ethz.ch

  Information about the mailing list can be obtained by typing

       $ echo info | mail r-testers-request@stat.math.ethz.ch

  at the shell prompt.

  The URL http://www.maths.uq.oz.au/~gks/webguide/maillist/rtest.html
  provides a WWW information page about this mailing list.

  I recommend sending mail to the mailing list rather than only to the
  developers (who are also subscribed to the list, of course).  This
  may save them precious time they can use for constantly improving R,
  and will typically also result in much quicker feedback for yourself.

  Of course, for bug reports it is very helpful to include code that
  reliably reproduces the problem.

  3.  R and S

  3.1.	What Is S?

  S is a very high level language and an environment for data analysis
  and graphics.  S was written by Richard A. Becker, John M. Chambers,
  and Allan R. Wilks of AT&T Bell Laboratories Statistics Research
  Department.

  The primary references for S are two books by the creators of S.

  o  Richard A. Becker, John M. Chambers and Allan R. Wilks (1988),
     ``The New S Language,'' Chapman & Hall, London.

     This book is often called the ``Blue Book''.

  o  John M. Chambers and Trevor J. Hastie (1992), ``Statistical Models
     in S,'' Chapman & Hall, London.

     This is also called the ``White Book''.

  There is a huge amount of user-contributed code for S, available at
  the S Repository at CMU.

  See the ``Frequently Asked Questions about S''
  (http://lib.stat.cmu.edu/S/faq) for further information about S.

  3.2.	What Is S-PLUS?

  S-PLUS is a value-added version of S sold by Statistical Sciences,
  Inc. (now a division of MathSoft, Inc.).  S is a subset of S-PLUS, and
  hence anything which may be done in S may be done in S-PLUS.  In
  addition, S-PLUS has extended functionality in a wide variety of areas,
  including robust regression, modern nonparametric regression, time
  series, survival analysis, multivariate analysis, classical
  statistical tests, quality control, and graphics drivers.  Add-on
  modules add additional capabilities for wavelet analysis, spatial
  statistics, and design of experiments.

  See the MathSoft S-PLUS page for further information.

  3.3.	What Are the Differences between R and S?

  Although the developers of R have tried to stick to the S language as
  defined in ``The New S Language'' (the Blue Book; see question ``What
  Is S?''), they have adopted the evaluation model of Scheme.

  This difference becomes manifest when free variables occur in a
  function.  Free variables are those which are neither formal
  parameters (occurring in the argument list of the function) nor local
  variables (created by assigning to them in the body of the function).
  Whereas S (like C) by default uses static scoping, R (like Scheme) has
  adopted lexical scoping.  This means the values of free variables are
  determined by a set of global variables in S, but in R by the bindings
  that were in effect at the time the function was created.

  Consider the following function:

       cube <- function(n) {
	 sq <- function() n * n
	 n * sq()
       }

  Under S, sq() does not ``know'' about the variable n unless it is
  defined globally:

       S> cube(2)
       Error in sq():  Object "n" not found
       Dumped
       S> n <- 3
       S> cube(2)
       [1] 18

  In R, the ``environment'' created when cube() was invoked is also
  searched:

       R> cube(2)
       [1] 8

  As one consequence, S must violate the principle of lazy evaluation
  for assignments in function argument lists, i.e., for named arguments
  (it might otherwise be impossible to evaluate the assignments inside
  the function).  R always uses lazy evaluation.  (Folks, correct me if
  I am wrong here, but I think this was what caused Martin's function
  plot.step() to behave differently under R and S.)
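  Here is a minimal sketch of lazy evaluation, checked only against
  current R releases (the functions f and g are illustrative names, and
  this is not a reconstruction of plot.step()): an argument, including
  a default expression given in the argument list, is evaluated only
  when the function body first needs it, and then in the function's own
  environment.

```r
# Default expressions are evaluated lazily, inside the function's
# own environment, at the moment they are first needed.
f <- function(x, y = x * 2) {
  x <- 10   # rebinds x before y has been looked at
  y         # forces the default now, so it sees x = 10
}
f(1)        # 20, not 2

# An argument that is never used is never evaluated at all.
g <- function(a, b) a
g(1, stop("this is never evaluated"))   # 1, no error
```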

  Lexical scoping allows using function closures and maintaining local
  state.  A simple example (taken from Abelson and Sussman) can be found
  in the `demos/language' subdirectory of the R distribution.  Further
  information is provided in the standard R reference ``R: A Language
  for Data Analysis and Graphics'' (see question ``Which Documentation
  Exists for R?'') and a paper on ``Lexical Scope and Statistical
  Computing'' by Robert Gentleman and Ross Ihaka
  (ftp://stat.auckland.ac.nz/pub/research/rgentlem/lexical.tex).

  (R & R, what about adding this paper to the R distribution?)
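  As a rough illustration of closures with local state (written in the
  spirit of the Abelson and Sussman example, but not the code shipped in
  `demos/language'), the hypothetical function open.account() below
  returns a list of functions that share a private balance.  The
  superassignment operator <<- updates the binding of total in the
  environment in which these functions were created.

```r
# Each call to open.account() creates a fresh environment holding `total`;
# the three functions returned share that environment (a closure).
open.account <- function(total) {
  list(
    deposit = function(amount) {
      total <<- total + amount    # modify the enclosing binding, not a local copy
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total)
        stop("insufficient funds")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross <- open.account(100)
robert <- open.account(200)
ross$withdraw(30)
robert$deposit(50)
ross$balance()     # 70
robert$balance()   # 250
```

  Because ross and robert each captured a different environment,
  operations on one account do not affect the other.  Under the S rules
  described above, total would instead have to be a global variable
  shared by all accounts.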

  Lexical scoping also implies a further major difference.  Whereas S
  stores all objects as separate files in a directory somewhere (usually
  `.Data' under the current directory), R does not.  All objects in R
  are stored internally.  When R is started up it grabs a very large
  piece of memory and uses it to store the objects.  R performs its own
  memory management of this piece of memory.  Having everything in
  memory is necessary because it is not really possible to externally
  maintain all relevant ``environments'' of symbol/value pairs.  This
  difference also seems to make R much faster than S.

  The downside is that if R crashes you will lose all the work for the
  current session.  Saving and restoring the memory ``images'' (the
  functions and data stored in R's internal memory at any time) can be a
  bit slow, especially if they are big.  S does not have this problem,
  because everything is kept in disk files, which are unlikely to be
  affected by a crash.  R is still in an alpha stage, and does crash
  from time to time.  Hence, for important work you should consider
  saving often (other possibilities are logging your sessions, or
  keeping your R commands in text files which can be read in using
  source()).
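  A minimal sketch of the last two suggestions, assuming the source(),
  save(), and load() functions as they behave in current R releases.
  The file names are temporary files chosen only for illustration; in
  practice you would use something like `analysis.R' in your working
  directory.

```r
# Keep the commands of an analysis in a text file ...
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(2.1, 3.4, 1.9)",
             "m <- mean(x)"), script)

# ... so that after a crash they can simply be re-run.
source(script)
m

# Objects can also be written to disk explicitly and restored later.
img <- tempfile(fileext = ".RData")
save(x, m, file = img)
rm(x, m)
load(img)     # recreates x and m in the workspace
exists("m")   # TRUE
```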

  Apart from lexical scoping and its implications, R follows the S
  language definition in the Blue Book as much as possible, and hence
  really is an ``implementation'' of S.  There are some intentional
  differences where the behavior of S is considered ``not clean''.  In
  general, the rationale is that R should help you detect programming
  errors, while at the same time being as compatible as possible with S.

  Some known differences are the following.

  o  In R, if x is a list, then x[sub] <- NULL and x[[sub]] <- NULL
     remove the specified elements from x.  The first of these is
     incompatible with S, where it is a no-op.

       I seem to remember that on r-testers we also talked about
       having the second only set the component to NULL rather than
       remove it.  How can this be done now?

  o  In S, the functions named .First and .Last in the `.Data' directory
     can be used for customization, as they are executed at the very
     beginning and end of a session, respectively.  R looks for files
     called `.Rprofile' in the user's home directory and the current
     directory, and sources these.  (It also loads a saved image from
     `.RData' in case there is one.)  If a .First function then exists,
     it is executed.  The .Last mechanism is not supported yet.

  o  R does not try as hard as S to preserve dimnames attributes
     (examples are apply, rbind, and cbind, but also arithmetic ops).

  o  The S-PLUS version of nchar will return a matrix when given a
     matrix, whereas the R version returns a vector.

  o  R presently does not support IEEE Inf and NaN.

  o  In R, attach currently only works for lists and data frames (not
     for directories).

  o  Categories do not exist in R, and never will, as they are now
     deprecated in S.  Use factors instead.

     More to come here.

       I have not gone through the mailing list archive in full.
       Please let me know what is missing.
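  A short sketch of two of the points above, checked only against the
  behavior of current R releases (not against S or R 0.16): assigning
  NULL with either [ or [[ removes list components, and factors take the
  place of categories.  The idiom y["a"] <- list(NULL), which keeps a
  component but sets its value to NULL, is offered as one possible
  answer to the question raised above, again assuming current R
  semantics.

```r
x <- list(a = 1, b = "two", c = 3)
x["b"] <- NULL          # single brackets: component b is removed
x[["c"]] <- NULL        # double brackets: component c is removed too
names(x)                # "a"

# Keeping a component but setting its value to NULL:
y <- list(a = 1, b = 2)
y["a"] <- list(NULL)
length(y)               # 2
is.null(y$a)            # TRUE

# Factors instead of categories:
sizes <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(sizes)           # "low" "high"
as.integer(sizes)       # 1 2 1
```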

  There are also differences which are not intentional, and result from
  missing or incorrect code in R.  The developers would appreciate
  hearing about any deficiencies you may find (in a written report fully
  documenting the difference as you see it).  Of course, it would be
  useful if you were to implement the change yourself and make sure it
  works.

  4.  R Add-On Packages

  4.1.	Which Add-on Packages Exist for R?

  The R distribution comes with the following extra libraries:

     eda
	Exploratory Data Analysis.  Currently only contains functions
	for robust line fitting, median polish, and smoothing.

     mva
	Multivariate Analysis.  Currently contains code for principal
	components (prcomp), canonical correlations (cancor),
	hierarchical clustering (hclust), and metric multidimensional
	scaling (cmdscale).  More functions for clustering and scaling,
	biplots, profile and star plots, and code for ``real''
	discriminant analysis will be added soon.

  The following S packages were ported to R by Thomas Lumley
  <thomas@biostat.washington.edu>:

     avas
	ace() and avas() for selecting regression transformations.

     date
	Functions for dealing with dates.

     gee
	An implementation of the Liang/Zeger Generalised Estimating
	Equation approach to GLMs with dependent data.

     splines
	Regression spline functions.

     survival4
	Functions for survival analysis.  This ``port'' is not complete
	yet, because it requires a few changes to the R system itself
	rather than the add-on package, but most of the functionality is
	already available.

  These packages can be obtained via
  http://www.biostat.washington.edu/~thomas/R.html.

  Fritz Leisch <Friedrich.Leisch@ci.tuwien.ac.at> has ported the
  following S packages to R:

     bootstrap.funs
	Software (bootstrap, cross-validation, jackknife), data and
	errata for the book ``An Introduction to the Bootstrap'' by B.
	Efron and R. Tibshirani.

     formatC
	Numeric to string conversion with C printf flexibility.

     fracdiff
	Maximum likelihood estimation of the parameters of a
	fractionally differenced ARIMA(p,d,q) model.

  He is currently working on an interface to the Stuttgart Neural
  Network Simulator (SNNS).  His packages can be obtained via
  ftp://ftp.ci.tuwien.ac.at/pub/export/R.

  Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> has written ctest, a library
  of standard ``classical tests'' (such as the Wilcoxon, Kolmogorov-
  Smirnov, and Kruskal-Wallis tests).  This package can also be obtained
  via ftp://ftp.ci.tuwien.ac.at/pub/export/R.

  More code has been posted to the r-testers mailing list, and can be
  obtained from the mailing list archive.

  Currently, there is no common archive for all contributed R code.
  This is very likely to change very soon.

  4.2.	How Can Add-on Packages Be Installed?

  (Unix only.)  Untar the add-on packages in $RHOME/src/library/ and
  type

       $ make libs
       $ cd ../..
       $ ./etc/install-libhelp

  at the shell prompt.

  4.3.	How Can Add-on Packages Be Used?

  To find out which add-ons have already been installed, type

       R> library()

  at the R prompt.  This produces something like

       NAME        DESCRIPTION
       acepack     ace() and avas() for selecting regression transformations
       bootstrap   Functions for the book "An Introduction to the Bootstrap"
       ctest       Classical Tests
       date        functions for handling dates
       eda         Exploratory Data Analysis
       formatC     Numeric to string conversion with flexibility of C's printf
       fracdiff    Fractionally differenced ARIMA (p,d,q) models
       gee         Generalised Estimating Equation models
       mva         Classical Multivariate Analysis
       splines     regression spline functions
       survival4   Survival analysis. [needs library(splines)]

  You can ``load'' the add-on library called name by typing

       R> library(name)

  You can then find out which functions it provides by typing

       R> help(library = name)

  4.4.	How Can I Contribute to R?

  R is currently still in alpha (or pre-alpha) state, so simply using it
  and communicating problems is certainly of great value.

  One place where functionality is still missing is the modeling
  software as described in ``Statistical Models in S'' (see question
  ``What Is S?'').  The functions

	add1 kappa alias labels drop1 proj

  are missing; many of these are interpreted functions, so if anyone
  is bored and wants to have a go at implementing them, it would be
  appreciated.  In addition, only linear and generalized linear models
  are currently available; aov, gam, loess, tree, and the nonlinear
  modeling code are not there yet.

  Many of the packages available at the Statlib S Repository might be
  worth porting to R.

  5.  R and Emacs

  5.1.	Is there Emacs Support for R?

  There is an Emacs-Lisp interface to S/S-PLUS called S-mode.  Its
  current version is 4.8, which can be obtained at

       http://www.maths.lancs.ac.uk:2080/~maa036/elisp/S-mode/

  Do not use the earlier versions found at the Statlib S repository
  (gnuemacs3 and gnuemacs4), which are outdated.

  It contains code for interacting with an inferior S process from
  within Emacs including an interface to the help system, editing S
  source code, and transcript manipulation, and comes with detailed
  instructions for installation.

  Because R is syntactically essentially identical to S, the edit mode
  can be used directly.  If you want to use the extension `.R' for your
  files with R code and want Emacs to automagically turn on the S edit
  mode whenever you visit such a file, add

       (autoload 's-mode "S" "Mode for editing R source" t)
       (if (not (assoc "\\.R$" auto-mode-alist))
	   (add-to-list 'auto-mode-alist (cons "\\.R$" 's-mode)))

  to your Emacs startup file, typically `~/.emacs'.

  To run R from within Emacs, add

       (autoload 'S "S" "Run an inferior R process" t)
       (setq inferior-S-program "R")

  to your Emacs startup file.  You can then fire up R from within Emacs
  by typing `M-x S' (hmm ...).  Note, however, that many interface
  functions will not work.

  Tony Rossini <rossini@math.sc.edu> has recently started working on
  producing an upgraded version of the S-mode package which works better
  under XEmacs and, in particular, provides much improved support for R.
  (Tony, drop me a note, I lost the URL!)

  Kurt Hornik has a mode for running R from within Emacs which was
  written from scratch.  Drop me a note if you are interested.
