Re: [R] x86 SSE* Pointer Favors

From: Ivan Adzhubey
Date: Fri, 13 Jun 2008 02:42:41 -0400

Hi Ivo,

On Friday 13 June 2008 12:23:06 am ivo welch wrote:
> Dear Statisticians--- This is not even an R question, so please
> forgive me. I have so much ignorance in this matter that I do not
> know where to begin. I hope someone can point me to documentation
> and/or a sample.

You will surely find some answers to your questions in the "Building from source" section of the R-admin.html file. Search it for BLAS and you will be presented with several options. Searching the R web site for the same keyword will give you even more food for thought.

> I want to compute a covariance as quickly as non-humanly possible on
> an Intel core processor (up to SSE4) under linux. Alas, I have no
> idea how to engage CPU vectorization. Do I need to use special data
> types, or is "double" correct? Does SSE* understand NaN? Should I
> rely on gcc autodetection of the vectorized meaning of my code, or are
> there specific libraries that I should call?

I use the Goto BLAS library and it works great. It usually runs 3 to 30 times faster than the stock R BLAS library, depending on your code. Enabling SSE instructions when building R itself is also possible (yes, you have to enable them explicitly, see man gcc), but it does not help much, since most of the math is done in BLAS.
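
If you would rather stay with your own C routines, here is a minimal sketch of a covariance written so that gcc has a chance to vectorize it on its own. The function name covariance and the two-pass layout are my assumptions for illustration, not something taken from R or from any BLAS. Plain "double" is the right type for this; no special data type is needed.

```c
#include <stddef.h>

/* Hypothetical helper: sample covariance of x and y, for n >= 2.
   Two passes (means first, then centered products) for numerical
   stability. Simple counted loops over contiguous doubles are the
   kind of code the gcc auto-vectorizer can usually handle. */
double covariance(const double *x, const double *y, size_t n)
{
    double mean_x = 0.0, mean_y = 0.0, acc = 0.0;
    size_t i;

    /* Pass 1: means of both vectors. */
    for (i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    /* Pass 2: sum of centered cross products. */
    for (i = 0; i < n; i++)
        acc += (x[i] - mean_x) * (y[i] - mean_y);

    return acc / (double)(n - 1);
}
```

Something like "gcc -O3 -msse2" should be a reasonable starting point. As far as I know, gcc will only vectorize floating-point sums like these if it is allowed to reorder the additions (for example with -ffast-math), and that option also lets the compiler assume there are no NaNs, so check your results if your data contains any.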

That said, optimized BLAS libraries give the biggest speed increase on older processors. The newer crop of multi-core CPUs with large shared caches is much more difficult to hand-tune code for. You may want to subscribe to the Goto BLAS mailing list for an in-depth discussion. The ATLAS community is also very helpful (I use their code with our AMD CPUs).

> What I want to learn about is as simple as it gets:
> typedef double Double; // or whatever SSE* needs as close equivalent
> Double vector1[N], vector2[N];
> // then fill them with stuff.

R has no single-precision numeric type: any number that is not stored as an integer is treated as a double, and floating-point arithmetic is always done in double precision.

> vector3= vector_mult(vector1,vector2, N);
> vector4= sum(vector1, N);
> I just need a pointer and/or primer. PS: If someone knows of a
> superfast vectorized implementation of Gentleman's WLS algorithm,
> please point me to it, too. I am still using my old non-vectorized C
> routines.

Ivan

Received on Fri 13 Jun 2008 - 06:44:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 13 Jun 2008 - 08:30:55 GMT.
