Re: [R] Antwort: Re: Antwort: Buying more computer for GLM

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 01 Sep 2006 - 09:12:48 GMT

On Fri, 1 Sep 2006, g.russell@eos-finance.com wrote:

> Peter Dalgaard wrote
> > Is this floating point bound? (When you say 30 factors does that mean
> > 30 parameters or factors representing a much larger number of groups?)
> > If it is integer bound, I don't think you can do much better than
> > increase CPU speed and - note - memory bandwidth (look for large-cache
> > systems and fast front-side bus). To increase floating point
> > performance, you might consider the option of using optimized BLAS
> > (see the Windows FAQ 8.2 and/or the "R Installation and
> > Administration" manual) like ATLAS; this in turn may be multithreaded
> > and make use of multiple CPUs or multi-core CPUs.
>
> By "factors" I mean "parameters". I apologise for the confusion.
>
> This is floating point bound, so ATLAS might be a good idea.
>
> Before I put a lot of work into investigating multiple processors, I
> need to know, is the bottleneck with GLM going to be BLAS?

Probably not, but you have the ability to profile in R and find out.
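
For illustration, a minimal sketch of such a profiling run using Rprof() and summaryRprof(). The data are simulated stand-ins (20,000 rows, an intercept plus 29 covariates) and the names dat, X and prof_file are assumptions made for the example, not the poster's actual model. If BLAS were the bottleneck, BLAS-backed calls would be expected to dominate the table.

```r
# Simulated stand-in for the real problem: 20,000 observations,
# 30 parameters (intercept + 29 covariates). Illustrative only.
set.seed(1)
n <- 20000
p <- 29
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
y <- rbinom(n, 1, plogis(drop(X %*% rep(0.1, p))))
dat <- data.frame(y = y, X)

# Turn the sampling profiler on, run the fit a few times so that
# enough samples accumulate, then turn it off again.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
for (i in 1:5) fit <- glm(y ~ ., family = binomial, data = dat)
Rprof(NULL)

# Time by function, including time spent in callees.
prof <- summaryRprof(prof_file)
head(prof$by.total, 10)
```

The by.total table shows where the time goes inside glm(); whether the fitting routine or the surrounding R code (model frame, model matrix) dominates will depend on the model and the data.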

Some more comments:

  1. The Fortran code that underlies glm is that of lm.fit, which makes use only of level-1 BLAS and so is not going to be helped greatly by an optimized BLAS.
  2. As far as I know, no one has succeeded in making a multithreaded Rblas.dll for Windows. And on systems using pthreads, success with multithreaded BLAS is very mixed: it results in a dramatic slowdown in some problems.
  3. As I recall, you were doing model selection via AIC on 20,000 observations. You might want to think hard about that, since AIC is designed for good prediction. I would do model exploration on a much smaller representative subset, and if I had 20,000 observations and 30 parameters and was interested in prediction, not do subset selection at all.
  4. glm() allows you to specify starting parameters, which you could find from a subsample. Very likely only 1 or 2 iterations would be needed.
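
A minimal sketch of point 4, again on simulated data (the data, the 10% subsample size and the object names are assumptions made for the example): fit on a random subsample, then pass its coefficients to the full fit through the start argument of glm().

```r
# Simulated stand-in: 20,000 observations, intercept + 29 covariates.
set.seed(2)
n <- 20000
p <- 29
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
y <- rbinom(n, 1, plogis(drop(X %*% rep(0.1, p))))
dat <- data.frame(y = y, X)

# Cheap fit on a random 10% subsample to obtain starting values.
sub <- dat[sample(n, 2000), ]
fit_sub <- glm(y ~ ., family = binomial, data = sub)

# Full-data fit from the default start and from the subsample estimates.
fit_cold <- glm(y ~ ., family = binomial, data = dat)
fit_warm <- glm(y ~ ., family = binomial, data = dat,
                start = coef(fit_sub))

# Number of IWLS iterations used by each fit.
c(cold = fit_cold$iter, warm = fit_warm$iter)

# Both fits should reach the same estimates up to convergence tolerance.
all.equal(coef(fit_cold), coef(fit_warm), tolerance = 1e-4)
```

How many iterations are saved depends on how representative the subsample is; the iter component of each fit shows what happened in a given case.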
-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat Sep 02 04:22:54 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 07 Sep 2006 - 07:51:17 GMT.
