Re: R-alpha: memory exhausted

Ross Ihaka (ihaka@stat.auckland.ac.nz)
Thu, 28 Mar 1996 11:54:07 +1200


Date: Thu, 28 Mar 1996 11:54:07 +1200
From: Ross Ihaka <ihaka@stat.auckland.ac.nz>
Message-Id: <199603272354.LAA08691@stat.auckland.ac.nz>
To: R-testers@stat.math.ethz.ch
Subject: Re: R-alpha: memory exhausted
In-Reply-To: <96Mar27.151131est.29442@mailgate.bank-banque-canada.ca>

Paul Gilbert writes:
 > 
 > I can't get
 > 	R -n200000 -v20
 > to do anything but give "invalid ... ignored"
 > I print some info: warning: invalid vector heap (-n) size (-536871876)ignored

I don't know what's happening here.  The command R -n200000 -v20 seemed
to work on our Suns.

 > Either I've messed up something while fooling around, or you're getting
 > the values from someplace other than the command line argument.
 > 
 > I also don't understand the logic of an error if value < R_NSize in
 > Unixsystem.c:
 > 		if(value < R_NSize || value > 1000000)
 > 			REprintf("warning: invalid vector heap size ignored\n");

This is just making sure that users don't try to set R_NSize below a
minimum threshold of 72000.

 > In the end I hard coded 
 >    R_NSize = 200000;
 >    R_VSize = 20 * 1000000;
 > into UNIXsystem.c and have been able to load most of my functions, and
 > apparently save them when I q(), but so far I haven't been able to
 > reload the image when I restart.

Could you put a copy of what you are loading into the incoming
directory of our ftp machine?  You are pushing things far beyond what
we have experience of, and it is quite possible that you are finding
one or more bugs we haven't seen before.

 > Also, there seems to be a problem with UseMethod as illustrated by the
 > following:
 > > zot <- function(x) UseMethod("zot")
 > > zot.zzz <- function(x) x*2
 > > z <- 2
 > > class(z) <- "zzz"
 > > zot(z)
 > Error in UseMethod("zot") : too few arguments to UseMethod
 > > 
 > 
 > I use classes and methods extensively.

We aren't completely compatible with S at that level.  You have to be
a little more precise in your generic function definition -- arguments
have to be passed explicitly.  E.g.
	
	zot <- function(...) UseMethod("zot", ...)

S can do without this because it reaches back into the calling frame
and grabs the argument list of the parent call.
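
Following the pattern above, your example should then go through.  I
haven't run this exact session, but it is just your definitions with the
explicit argument passing added:

	zot <- function(...) UseMethod("zot", ...)
	zot.zzz <- function(x) x * 2
	z <- 2
	class(z) <- "zzz"
	zot(z)		# now dispatches to zot.zzz and returns 4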

If you want to manipulate the arguments explicitly you can do
something like

	zot <- function(x, ...) {
		# play with x here
		UseMethod("zot", x, ...)
	}
		
I believe that if you are explicit about argument passing, things are
compatible with S.  Our really major incompatibility is that we don't have
NextMethod, and it would require quite a bit of rearrangement to
implement.

This isn't just being different for the sake of it ...  Our underlying
evaluation model is quite different from S, and we would have to give
up some performance to achieve compatibility.

 > >One last point about GC.  The fundamental theorem of memory management
 > >says that your program should not use more memory than the available
 > >RAM.  At that point page faulting kicks in and you lose in a major
 > >way.  I would postulate that all of S's performance problems stem from
 > >the fact that it allows the heap to grow VERY large and then paging
 > >ties up the disk while the cpu sits idle.
 > 
 > On some systems it would be very restrictive to not use more memory
 > than available RAM (what's swap for anyway?). I know little about this,
 > but, it seems to me the trick is that you don't want the code jumping
 > all over the place in its memory space, so swapping can be fairly
 > efficient. S does grow very large in loops (I believe because it
 > doesn't do any garbage collection until the loop finishes) and gets so
 > large that swapping does kill you. However, if I apply the same rule to
 > S as you're suggesting for R (ie. don't run any big programs) then it is
 > pretty fast.

[ I might say that mentioning memory management in S has an effect on
me which is rather like waving a red rag in front of a bull ... ]

[ Climbs back onto favourite hobby-horse :-)].

Our approach in R is to grab a set amount of memory at startup and to
manage its use tightly.  We achieve locality in R by compacting
everything into contiguous memory at gc time (standard elementary cs
textbook stuff), but the traversal of memory required to achieve this
jumps about all over the place as the active areas are located.

There are small programs (e.g. sorting a few thousand numbers by
shellsort) where a freshly started S starts paging heavily after
just a few seconds.  When you look at the cpu with "vmstat" it's
clear that it is virtually idle and that the disk is thrashing.  On the
same programs R has the cpu fully occupied and the disk idle.  R
completes the task in 1/10 of the (elapsed) time.
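
Just to be concrete, by a shellsort test I mean a straightforward
interpreted implementation along the following lines.  This is only a
sketch of the kind of code I mean, not the exact script we timed, and
the function and test data here are made up for illustration:

	shellsort <- function(x) {
		n <- length(x)
		gap <- n %/% 2
		while(gap > 0) {
			# insertion sort over elements gap apart
			for(i in (gap + 1):n) {
				v <- x[i]
				j <- i
				while(j > gap && x[j - gap] > v) {
					x[j] <- x[j - gap]
					j <- j - gap
				}
				x[j] <- v
			}
			gap <- gap %/% 2
		}
		x
	}

	y <- shellsort(runif(5000))	# a few thousand numbers is enough

It is nothing but interpreted looping and elementwise assignment, which
is exactly where the memory behaviour shows up.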

I'm not saying ``never use virtual memory'', just that it's not a
panacea and that you need to consider the cost.  Memory access is
measured in nanoseconds and disk access in milliseconds.  That's a huge
difference.  Memory is virtually free these days, so it pays to
make sure that you have enough for your problem.  On the other hand
this shouldn't be an excuse for software to squander memory
unnecessarily.

When we put down a basic design we didn't anticipate that R would be
used for much more than teaching on machines without virtual memory
and 4-6Mb of RAM.  The past few weeks have had us seriously looking at
early design choices.  Some of the shortfalls can be fixed easily, but
others will require some bottom-up redesign :-(.  Having seen what
occurred when we meddled with "eval", you can imagine what would happen
if we touched the memory subsystem without some careful thought.
	Ross
