Re: [R] Compiling R with multi-threaded BLAS math libraries - why not actually ?

From: Tal Galili <tal.galili_at_gmail.com>
Date: Sat, 12 Jun 2010 17:16:03 +0300

Hello Douglas,

Thank you for the BLAST != BLAS correction (I imagine my slip was due to some work I did recently with RNA analysis software called BLAST).

Also, thank you for the very interesting posting here and in your reply to David's post.

My current conclusions from this thread are:
1) This should be of interest ONLY if I will be working with large matrices and doing "very specific kinds of operations". (I imagine the examples in David's post demonstrate those.)
2) If I do want to do it, I will need to follow the steps detailed here (thank you for the pointer):
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-use-a-fast-BLAS_003f
and more or less pray that my computer's specifications are suitable. (Although I do wonder how the REvolution distribution succeeds in doing this without making the user take any more steps than just installing R.)
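
A rough way to see whether point 1 applies on a given machine is to time a BLAS-heavy operation such as crossprod() before and after switching libraries. The following is only a sketch: the matrix sizes are arbitrary illustrative choices (much smaller than the examples discussed in this thread), and the BLAS report relies on extSoftVersion(), which only exists in later versions of R, so it is guarded.

```r
# Sketch: time a dense matrix cross-product, which is delegated to the BLAS.
# n and p are illustrative assumptions, not values from this thread.
set.seed(1)
n <- 2000
p <- 500
A <- matrix(rnorm(n * p), nrow = n, ncol = p)

# crossprod(A) computes t(A) %*% A, giving a p x p matrix
elapsed <- system.time(XtX <- crossprod(A))["elapsed"]
cat("crossprod of a", n, "x", p, "matrix took", elapsed, "seconds\n")

# Later versions of R can report which BLAS library is linked in;
# skip the report silently where that information is unavailable.
if (exists("extSoftVersion")) {
  libs <- extSoftVersion()
  if ("BLAS" %in% names(libs)) {
    cat("BLAS in use:", libs[["BLAS"]], "\n")
  }
}
```

Running the same script under the reference BLAS and under an accelerated one gives a direct comparison for this one operation; it says nothing about functions that make little use of the BLAS.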

Thanks everyone for the replies so far.

With much respect,
Tal

----------------Contact Details:----------------
Contact me: Tal.Galili_at_gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sat, Jun 12, 2010 at 4:39 PM, Douglas Bates <bates_at_stat.wisc.edu> wrote:

> On Sat, Jun 12, 2010 at 6:18 AM, Tal Galili <tal.galili@gmail.com> wrote:
> > Hello Gabor, Matt, Dirk.
> >
> > Thank you all for clarifying the situation.
> >
> > So if I understand correctly then:
> > 1) Changing the BLAST would require specific BLAST per computer
> > configuration (OS/chipset).
>
> It's BLAS (Basic Linear Algebra Subprograms) not BLAST. Normally I
> wouldn't be picky like this but if you plan to use a search engine you
> won't find anything helpful under BLAST.
>
> > 2) The advantage would be available only when doing _lots_ of linear
> > algebra
>
> You need to be working with large matrices and doing very specific
> kinds of operations before the time savings of multiple threads
> overcome the communications overhead. In fact, sometimes the
> accelerated BLAS can slow down numerical linear algebra calculations,
> such as sparse matrix operations.
>
> > So I am left wondering for each item:
> > 1) How do you find a "better" (e.g: more suited) BLAST for your system? (I
> > am sure there are tutorials for that, but if someone here has
> > a recommendation on one - it would be nice)
>
> As Dirk has pointed out, it is a simple process.
>
> Step 1: Install Ubuntu or some other Debian-based Linux system.
> Step 2: type
> sudo apt-get install r-base-core libatlas3gf-base
>
> > 2) In what situations do we use _lots_ of linear algebra? For example, I
> > have cases where I performed many linear regressions on a problem; would
> > that be a case the BLAST engine would be affecting?
>
> Re-read David's posting. The lm and glm functions do not benefit
> substantially from accelerated BLAS because the underlying
> computational methods only use level-1 BLAS. (David said they don't
> use BLAS but that is not quite correct. I posted a follow-up comment
> describing why lm and glm don't benefit from accelerated BLAS.)
>
> > I am trying to understand if REvolution's emphasis on this is a
> > marketing gimmick, or whether they are insisting on something that some R
> > users might wish to take into account. In which case I would, naturally
> > (for many reasons), prefer to be able to tweak the native R system instead
> > of needing to work with the REvolution distribution.
>
> As those who, in Duncan Murdoch's phrase, found the situation
> sufficiently extreme to cause them to read the documentation, would
> know, descriptions of using accelerated BLAS with R have been in the R
> administration manual for years. Admittedly it is not a
> straightforward process but that is because, like so many other
> things, it needs to be handled differently on each operating system.
> In fact it is even worse because the procedure can be specific to the
> operating system and the processor architecture and, sometimes, even
> the task. Again, re-read David's posting where he says that you
> probably don't want to combine multiple MKL threads with explicit
> parallel programming in R using doSMP.
>
> David's posting (appropriately) shows very specific examples that
> benefit greatly from accelerated BLAS. Notice that these examples
> incorporate very large matrices. The first two examples involve
> forming chol(crossprod(A)) where A is 10000 by 5000. If you have very
> specific structure in A this calculation might be meaningful. In
> general, it is meaningless because crossprod(A) is almost certainly
> singular. (I am vague on the details but perhaps someone who is
> familiar with the distribution of singular values of matrices can
> explain the theoretical results. There is a whole field of statistics
> research dealing with sparsity in the estimation of covariance
> matrices that attacks exactly this "large n, large p" rank deficiency
> problem.)
>
> > Lastly, following on Matt's suggestion, if anyone has a tutorial on the
> > subject, I'd be more than glad to publish it on r-statistics/r-bloggers.
> >
> > Thanks again to everyone for the detailed replies.
> >
> > Best,
> > Tal
> >
> > On Sat, Jun 12, 2010 at 6:01 AM, Matt Shotwell <shotwelm_at_musc.edu> wrote:
> >
> >> In the case of REvolution R, David mentioned using the Intel MKL, a
> >> proprietary library which may not be distributed in the way R is
> >> distributed. Maybe REvolution has a license to redistribute the library.
> >> For the others, I suspect Gabor has the right idea, that the R-core team
> >> would rather not keep architecture dependent code in the sources,
> >> although there is a very small amount already (`grep -R __asm__`).
> >>
> >> However, I know using Linux (Debian in particular) it is fairly
> >> straightforward to build R with `enhanced' BLAS libraries. The R
> >> Administration and Installation manual has a pretty good section on
> >> linking with enhanced BLAS and LAPACK libs, including the Intel MKL, if
> >> you are willing to cough up $399, or swear not to use the library
> >> commercially or academically.
> >>
> >> Maybe a short tutorial using free software, such as ATLAS would be
> >> suitable content for an r-bloggers post :) ?
> >>
> >> Matt Shotwell
> >> Graduate Student
> >> Div. Biostatistics and Epidemiology
> >> Medical University of South Carolina
> >>
> >> On Fri, 2010-06-11 at 19:21 -0400, Tal Galili wrote:
> >> > Hello all,
> >> > I came across
> >> > <http://www.r-bloggers.com/performance-benefits-of-linking-r-to-multithreaded-math-libraries/>
> >> > David Smith's new post, "Performance benefits of linking R to
> >> > multithreaded math libraries"
> >> > <http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html>,
> >> > which explains how (and why) the REvolution distribution of R uses
> >> > different BLAS math libraries for R, so as to
> >> > allow multi-threaded mathematical computation.
> >> > What the post doesn't explain is why the native R distribution
> >> > doesn't use the multi-threaded version of the libraries. Is it because
> >> > the R-devel team didn't get to it yet, or is it for some technical reason?
> >> > Could someone please help to explain the situation?
> >> >
> >> > Thanks in advance,
> >> > Tal
> >> >
> >> > p.s: I wasn't sure whether to send the question here or to R-devel; I
> >> > decided to send it here. If I am in the wrong - please let me know.
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-help_at_r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
>


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat 12 Jun 2010 - 14:19:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 12 Jun 2010 - 17:40:30 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
