From: Uwe Ligges <ligges_at_statistik.tu-dortmund.de>

Date: Fri, 14 Dec 2012 18:07:09 +0100

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 14 Dec 2012 - 17:22:22 GMT

Date: Fri, 14 Dec 2012 18:07:09 +0100

On 14.12.2012 07:55, Paul Johnson wrote:

> On Thu, Dec 13, 2012 at 3:33 AM, Uwe Ligges

*> <ligges_at_statistik.tu-dortmund.de> wrote:
**>> Long message, but as far as I can see, this is not about base R but the
**>> contributed package Amelia: Please discuss possible improvements with its
**>> maintainer.
**>>
**> Thanks for answering, but I'm really surprised by your answer. The
**> package is a constant, but R got much slower between R-2.15.1 and
**> R-2.15.2. The profile of functions called radically changed, svd gets
**> called much more because solve() fails much more often.
*

I have screened your mail and I have seen statements and code re. Amelia. If you are talking about R, please provide reproducible examples without overhead of packages. The CRAN check times of > 4000 packages are typically a good indicator, and they are a bit slower for R-2.15.2 but not that it is generally worrying (since we also run more checks).

> No change in R could account for it? I'm not saying R is wrong, it

*> may be more accurate and better. After chasing the slowdown for a
**> week, I need some comfort. Does the LINPACK -> LAPACK change play a
**> role. The change I'm looking for is something that would substantially
**> tune up mathematical precision so that matrices that did not seem
**> singular before are now, in the eyes of functions like chol() and
**> solve(). Whereas in R-2.15.1, a matrix can be inverted by solve(),
**> for example, now R-2.15.2 rejects the matrix as singular.
*

Then a LAPACK upgrade for bugfix reasons, I believe.

> I will take the problem up with applications, of course. But you see

*> how package writers might think its ridiculous. They argue, "I had a
**> perfectly fine calculation against R-2.15.0 and R-2.15.1, and now with
**> R-2.15.2 it takes three times as long? And you want me to revise my
**> package?"
**>
**> Would you be persuaded there's an R base question if I showed you a
**> particular matrix that can be decomposed or solved in R-2.15.1 but
**> cannot be in R-2.15.2? I should have thought of that before, I suppose
**> :)
*

Yes, a minimal reproducible example would be good, but then it is probably a LAPACK issue and we have to convince the LAPACK people to improve code.

Best,

uwe

*>
**> pj
*

>> Best,

*>> Uwe Ligges
**>>
**>>
**>>
**>> On 12.12.2012 19:14, Paul Johnson wrote:
**>>>
**>>> Speaking of optimization and speeding up R calculations...
**>>>
**>>> I mentioned last week I want to speed up calculation of generalized
**>>> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
**>>> souped up generalized inverse algorithm published by
**>>>
**>>> V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse
**>>> matrix, Electronic Journal of Linear Algebra,
**>>> 17(2008), 637-650.
**>>>
**>>> I was so delighted to see the computation time drop on my Debian
**>>> system that I boasted to the WIndows users and gave them a test case.
**>>> They answered back "there's no benefits, plus Windows is faster than
**>>> Linux".
**>>>
**>>> That sent me off on a bit of a goose chase, but I think I'm beginning
**>>> to understand the situation. I believe R-2.15.2 introduced a tighter
**>>> requirement for precision, thus triggering longer-lasting calculations
**>>> in many example scripts. Better algorithms can avoid some of that
**>>> slowdown, as you see in this test case.
**>>>
**>>> Here is the test code you can run to see:
**>>>
**>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
**>>>
**>>> It downloads a data file from that same directory and then runs some
**>>> multiple imputations with the Amelia package.
**>>>
**>>> Here's the output from my computer
**>>>
**>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
**>>>
**>>> That includes the profile of the calculations that depend on the
**>>> ordinary generalized inverse algorithm based on svd and the new one.
**>>>
**>>> See? The KP algorithm is faster. And just as accurate as
**>>> Amelia:::mpinv or MASS::ginv (for details on that, please review my
**>>> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
**>>>
**>>> So I asked WIndows users for more detailed feedback, including
**>>> sessionInfo(), and I noticed that my proposed algorithm is not faster
**>>> on Windows--WITH OLD R!
**>>>
**>>> Here's the script output with R-2.15.0, shows no speedup from the
**>>> KPginv algorithm
**>>>
**>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
**>>>
**>>> On the same machine, I updated to R-2.15.2, and we see the same
**>>> speedup from the KPginv algorithm
**>>>
**>>>
**>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
**>>>
**>>> After that, I realized it is an R version change, not an OS
**>>> difference, I was a bit relieved.
**>>>
**>>> What causes the difference in this case? In the Amelia code, they try
**>>> to avoid doing the generalized inverse by using the ordinary solve(),
**>>> and if that fails, then they do the generalized inverse. In R 2.15.0,
**>>> the near singularity of the matrix is ignored, but not in R 2.15.2.
**>>> The ordinary solve is failing almost all the time, thus triggering the
**>>> use of the svd based generalized inverse. Which is slower.
**>>>
**>>> The Katsikis and Pappas 2008 algorithm is the fastest one I've found
**>>> after translating from Matlab to R. It is not so universally
**>>> applicable as svd based methods, it will fail if there are linearly
**>>> dependent columns. However, it does tolerate columns of all zeros,
**>>> which seems to be the problem case in the particular application I am
**>>> testing.
**>>>
**>>> I tried very hard to get the newer algorithm described here to go as
**>>> fast, but it is way way slower, at least in the implementations I
**>>> tried:
**>>> ## KPP
**>>> ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias. "An
**>>> improved method for
**>>> ## the computation of the Moore Penrose inverse matrix," Applied
**>>> ## Mathematics and Computation, 2011
**>>>
**>>> The notes on that are in the qrginv.R file linked above.
**>>>
**>>> The fact that I can't make that newer KPP algorithm go faster,
**>>> although the authors show it can go faster in Matlab, leads me to a
**>>> bunch of other questions and possibly the need to implement all of
**>>> this in C with LAPACK or EIGEN or something like that, but at this
**>>> point, I've got to return to my normal job. If somebody is good at
**>>> R's .Call interface and can make a pure C implementation of KPP.
**>>>
**>>> I think the key thing is that with R-2.15.2, there is an svd-related
**>>> bottleneck in the multiple imputation algorithms in Amelia. The
**>>> replacement version of the function Amelia:::mpinv does reclaim a 30%
**>>> time saving, while generating imputations that are identical, so far
**>>> as i can tell.
**>>>
**>>> pj
**>>>
**>>>
**>>>
**>>
**>
**>
**>
*

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 14 Dec 2012 - 17:22:22 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

*
Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.
Archive generated by hypermail 2.2.0, at Fri 14 Dec 2012 - 17:32:39 GMT.
*

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*