Re: [R] Ubuntu vs. Windows

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Wed, 23 Apr 2008 07:09:43 -0500

On 4/22/08, Prof Brian Ripley <ripley_at_stats.ox.ac.uk> wrote:
> On Tue, 22 Apr 2008, Peter Dalgaard wrote:
>
> > Doran, Harold wrote:
> >> Dear List:
> >>
> >> I am very much a unix neophyte, but recently had a Ubuntu box installed
> >> in my office. I commonly use Windows XP with 3 GB RAM on my machine and
> >> the Ubuntu machine is exactly the same as my windows box (e.g.,
> >> processor and RAM) as far as I can tell.
> >>
> >> Now, I recently had to run a very large lmer analysis using my windows
> >> machine, but was unable to due to memory limitations, even after
> >> increasing all the memory limits in R (which I think is a 2gig max
> >> according to the FAQ for windows). So, to make this computationally
> >> feasible, I had to sample from my very big data set and then run the
> >> analysis. Even still, it would take something on the order of 45 mins to
> >> 1 hr to get parameter estimates. (BTW, SAS Proc nlmixed was even worse
> >> and kept giving execution errors until the data set was very small and
> >> then it ran for a long time)
> >>
> >> However, I just ran the same analysis on the Ubuntu machine with the
> >> full, complete data set, which is very big and lmer gave me back
> >> parameter estimates in less than 5 minutes.
> >>
> >> Because I have so little experience with Ubuntu, I am quite pleased and
> >> would like to understand this a bit better. Does this occur because R is
> >> a bit friendlier with unix somehow? Or, is this occurring because unix
> >> somehow has more efficient methods for memory allocation?
> >>
> > Probably partly the latter and not the former (we try to make the most
> > of what the OS offers in either case), but a more important difference
> > is that we can run in 64 bit address space on non-Windows platforms
> > (assuming that you run a 64 bit Ubuntu).
> >
> > Even with 64 bit Windows we do not have the 64 bit toolchain in place to
> > build R except as a 32 bit program. Creating such a toolchain is beyond
> > our reach, and although progress is being made, it is painfully slow
> > (http://sourceforge.net/projects/mingw-w64/). Every now and then, the
> > prospect of using commercial tools comes up, but they are not
> > "plug-compatible" and using them would leave end users without the
> > possibility of building packages with C code, unless they go out and buy
> > the same toolchain.
>
> There is another possibility. lmer is heavy on matrix algebra, and so
> usually benefits considerably from an optimized BLAS. Under Windows you
> need to download one of those on CRAN (or build your own). I believe that
> under Ubuntu R will make use of one if it is already installed.

An optimized BLAS is a possible explanation, but only if the Ubuntu package for the correct version of ATLAS has been installed, and I don't think those packages are installed by default. Even when one is installed, an optimized BLAS is not always beneficial for lmer. Depending on the structure of the model, an optimized BLAS, especially a multithreaded one, can actually slow lmer down.
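
One way to check which BLAS an installed R actually resolves to is to list the shared libraries behind libR.so. This is a minimal sketch, not something taken from the thread: it assumes R was built as a shared library, so that lib/libR.so exists under the directory printed by "R RHOME" (believed to be the case for the Debian and Ubuntu packages), and blas_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines of ldd-style output on stdin
# that mention a BLAS, ATLAS or LAPACK library.
blas_lines() {
    grep -i -E 'blas|atlas|lapack'
}

# Only run the check when both R and ldd are available.
if command -v R >/dev/null 2>&1 && command -v ldd >/dev/null 2>&1; then
    libr="$(R RHOME)/lib/libR.so"   # assumed location of the shared R library
    if [ -r "$libr" ]; then
        ldd "$libr" | blas_lines || echo "no BLAS-like library listed"
    fi
fi
```

A path under an ATLAS directory in the output would suggest that the optimized library is in use; a plain reference BLAS would suggest it is not.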

I think the difference is more likely due to swapping. A typical lmer call allocates a considerable amount of memory at the beginning of the computation, then keeps a stable memory footprint while it optimizes the deviance with respect to the model parameters. During that optimization it accesses essentially all of the big chunks of memory in the footprint. If the required memory is even a bit larger than the available physical memory, you get a lot of swapping, as I found out yesterday. I started an lmer run on a 64-bit Ubuntu machine, forgetting that I had recently removed a defective memory module from it. The machine had only 2 GB of memory and about 8 GB of swap space, and it spent a lot of time swapping. I definitely should have done that run on one of our servers that has much more real memory.
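
Swapping of this kind can be observed from the kernel's swap counters. The following is a rough sketch under the assumption that the machine exposes the pswpin and pswpout counters in /proc/vmstat, as Linux kernels generally do; swap_pages and swap_delta are hypothetical helper names.

```shell
#!/bin/sh
# Sum the pages swapped in and out since boot, given vmstat-formatted
# text ("name value" per line) on stdin.
swap_pages() {
    awk '$1 == "pswpin" || $1 == "pswpout" { total += $2 }
         END { printf "%.0f\n", total }'
}

# Number of pages swapped in or out during an interval.
# Usage: swap_delta SECONDS   (defaults to 5)
swap_delta() {
    before=$(swap_pages < /proc/vmstat)
    sleep "${1:-5}"
    after=$(swap_pages < /proc/vmstat)
    echo "$((after - before))"
}
```

Running "swap_delta 10" in a second terminal while the fit is in progress prints the number of pages moved to or from swap in ten seconds. A value that stays well above zero for the whole run suggests the footprint does not fit in physical memory.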

Harold: Typing either

cat /proc/meminfo

or

free

in a shell window on your Ubuntu machine will tell you the amount of memory and swap space on the machine. If you start the lmer fit and switch to a terminal window where you run the program "top" you can watch the evolution of the memory usage by the R program. It will probably increase at the beginning of the run then stabilize.
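
The same numbers can be pulled out non-interactively. This is a small sketch, assuming a Linux /proc filesystem in which both /proc/meminfo and /proc/PID/status use "Key: value kB" lines; field_kb and rss_kb are hypothetical helper names, and the PID has to be looked up separately (for example in the PID column of top).

```shell
#!/bin/sh
# Print the numeric value (in kB) of one "Key:   value kB" line on stdin.
# Usage: field_kb MemTotal < /proc/meminfo
field_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Total RAM and swap as reported by the kernel.
if [ -r /proc/meminfo ]; then
    echo "MemTotal  (kB): $(field_kb MemTotal  < /proc/meminfo)"
    echo "SwapTotal (kB): $(field_kb SwapTotal < /proc/meminfo)"
fi

# Resident memory (kB) of one process, given its PID.
rss_kb() {
    field_kb VmRSS < "/proc/$1/status"
}
```

With the PID of the R process in hand, a loop such as "while sleep 5; do rss_kb PID; done" prints its resident memory every five seconds, which should show the rise and then the plateau described above.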




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 23 Apr 2008 - 12:14:45 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 23 Apr 2008 - 12:30:30 GMT.
