R-alpha: memory exhausted

Ross Ihaka (ihaka@stat.auckland.ac.nz)
Wed, 27 Mar 1996 17:11:28 +1200

Date: Wed, 27 Mar 1996 17:11:28 +1200
From: Ross Ihaka <ihaka@stat.auckland.ac.nz>
Message-Id: <199603270511.RAA06099@stat.auckland.ac.nz>
To: R-testers@stat.math.ethz.ch
Subject: R-alpha: memory exhausted
In-Reply-To: <96Mar26.105222est.29461@mailgate.bank-banque-canada.ca>

There are a couple of undocumented options for setting the size of
memory to be used by R.  There are two kinds of things which R uses
internally.  The basic object is a cons-cell which is just like a lisp
cons cell and acts as the glue which holds the system together.  Cons
cells are all (from memory) 4-word structures.  As shipped, R has
(about) 70,000 of these.  You can increase this number by invoking
R with a -n flag

	R -n200000

for example.  You can also increase the amount of memory available
for vectors -- things like vectors of reals, logicals, character
strings, etc.  This is done with the -v flag.  Thus

	R -n200000 -v20

will start R with 200000 cons cells and 20 Megabytes for vectors.
(Using a total of 200000*16 + 20*1024^2 bytes  (about 24 Megabytes)).
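The arithmetic above can be checked directly (a small sketch, assuming
4-byte words, so each 4-word cons cell is 16 bytes):

```shell
# Memory budget for "R -n200000 -v20", assuming 16-byte cons cells:
cons=$((200000 * 16))        # cons-cell arena:  3200000 bytes
vec=$((20 * 1024 * 1024))    # vector heap:     20971520 bytes
echo $((cons + vec))         # total:           24171520 bytes (~24 Mb)
```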
I have not experimented with very large numbers of cons cells, but I
suspect that everything will run fine, apart from occasional pauses
when the garbage collector runs.

R was really designed with smaller memory footprints in mind and the
scale of things which people are trying is a bit of a shock.  (One of
my colleagues in economics was saying yesterday that he was having
trouble reading in a vector of 100,000 strings -- well, every string
needs a cons-cell, and with only 70,000 to begin with ...)
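To make the shortfall concrete -- a back-of-envelope sketch, assuming
one cons cell per string and the shipped default of about 70,000
cells:

```shell
default_cells=70000    # approximate cons-cell count as shipped
strings=100000         # one cons cell is consumed per string read in
echo $((strings - default_cells))   # shortfall: 30000 cells
```

So an invocation along the lines of "R -n200000" would be needed
before the vector could even be held in memory.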

We are having a bit of a rethink about the memory management strategy.
For larger scale problems we may need to switch to a separate
allocation area for strings and start to use generational techniques
to speed up the garbage collector.  This will be quite hard and may
take a while.  EVERYTHING depends on the memory management system and
you can imagine the pile of rubble which is likely to ensue if we
start tampering with it.

One last point about GC.  The fundamental theorem of memory management
says that your program should not use more memory than the available
RAM.  Beyond that point page faulting kicks in and you lose in a major
way.  I would postulate that all of S's performance problems stem from
the fact that it allows the heap to grow VERY large, and then paging
ties up the disk while the CPU sits idle.

r-testers mailing list -- To (un)subscribe, send "subscribe" or
"unsubscribe" (in the body, not the subject!)
To: r-testers-request@stat.math.ethz.ch