Re: [Rd] --max-vsize

From: Christophe Rhodes <csr21_at_cantab.net>
Date: Tue, 26 Jul 2011 10:08:53 +0100

Prof Brian Ripley <ripley_at_stats.ox.ac.uk> writes:

> Point 1 is as documented: you have exceeded the maximum integer and it
> does say that it gives NA. So the only 'odd' is reporting that you
> did not read the documentation.

I'm sorry; I thought that my message made it clear that I was aware that the NA came from exceeding the maximum representable integer. To belatedly supply the other information I failed to provide: I use R on Linux, both 32-bit and 64-bit (with 64-bit R).

> Point 2 is R not using the correct units for --max-vsize (it used the
> number of Vcells, as was once documented), and I have fixed.

Thank you; I've read the changes and I think they meet my needs. (I will try to explain below how and why I want to use larger-than-integer values with mem.limits(). If there's a better or more supported way to achieve what I want, that'd be fine too.)

> But I do wonder why you are using --max-vsize: the documentation says
> it is very rarely needed, and I suspect that there are better ways to
> do this.

Here's the basic idea: I would like to be able to restrict R to a large amount of memory (say 4GB, for the sake of argument), but in a way such that I can increase that limit temporarily if it turns out to be necessary for some reason.
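To make the numbers concrete, here is a small sketch of the arithmetic behind the NA: a 4GB limit written out in bytes does not fit in a 32-bit signed integer, while the same limit counted in Vcells does. The 8-byte Vcell size and the variable names are my assumptions for illustration, not something taken from the R sources.

```shell
#!/bin/bash
# Illustrative arithmetic only; 4GB is the example limit used in the text.
limit_bytes=$((4 * 1024 * 1024 * 1024))   # 4 GiB expressed in bytes
int_max=2147483647                        # largest 32-bit signed integer (2^31 - 1)
vcell_size=8                              # assumption: one Vcell is 8 bytes
limit_vcells=$((limit_bytes / vcell_size))

echo "limit in bytes:  $limit_bytes"
echo "limit in Vcells: $limit_vcells"
if [ "$limit_bytes" -gt "$int_max" ]; then
    echo "the byte count exceeds the 32-bit integer range"
fi
```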

The restriction is desirable because I have found it fairly difficult to predict in advance how much memory a given calculation or analysis is going to take. Part of that is my inexperience with R, leading to hilarious thinkos, but I think that some of the difficulty in predicting will remain even as I gain experience. I use R both on multi-user systems and on single-user-multiple-use systems, and in both cases it is usually bad if my R session causes the machine to swap. Usually that swapping is not the result of a desired computation (most often, it's from a straightforward mistake), but it can take substantial amounts of time for the machine to respond to aborts or kill requests, and usually if the process grows enough to touch swap it will continue growing beyond the swap limit too.

So, why not simply slap on an address-space ulimit instead (that being the kind of ulimit in Linux that actually works...)? Well, one reason is that it then becomes necessary to estimate at the start of an R session how much memory will be needed over the lifetime of that session; guess too low, and at some point later (maybe days or even weeks later) I might get a failure to allocate. My options at that stage would be to save the workspace and restart the session with a higher limit, or to attempt to delete enough things from the existing workspace to allow the allocation to succeed. (Have I missed anything?) Saving and restarting will take substantial time (from writing ~4GB to disk), while deleting things from the existing session involves cognitive overhead that is irrelevant to my current investigation and may in any case not free enough memory.
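For reference, the ulimit approach I mean looks roughly like the sketch below. It assumes bash, whose `ulimit -v` takes a size in kilobytes; `run_limited` and `limit_kb` are hypothetical names of mine. The limit is applied in a subshell before the command starts, which is exactly why it has to be guessed up front.

```shell
#!/bin/bash
# Sketch: start a command under an address-space limit. The subshell keeps
# the limit from sticking to the calling shell. Assumes bash, where
# `ulimit -v` is given in kilobytes.
limit_kb=$((4 * 1024 * 1024))   # 4 GiB in KiB; example figure from the text

# Hypothetical helper: run "$@" with the address-space limit in force.
run_limited() {
    ( ulimit -v "$limit_kb" && "$@" )
}

# Example: show the limit as seen by a child process.
run_limited bash -c 'ulimit -v'
```

An R session would be started the same way, for example `run_limited R`, and would then keep that limit until it exits.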

So, being able to raise the limit to something generally large for a short time, perform a computation, get the results, and then lower the limit again protects me in general from overwhelming the machine with mistaken computations, while still letting me dedicate more resources to a particular computation when a specific case calls for it.

> I don't find reporting values of several GB as bytes very useful, but
> then mem.limits() is not useful to me either ....

Ah, I'm not particularly interested in the reporting side of mem.limits() :-); the setting side, on the other hand, very much so.

Thank you again for the fixes.

Best,

Christophe



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 26 Jul 2011 - 09:11:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 26 Jul 2011 - 18:20:12 GMT.