Re: [Rd] Rcpp too good to be true?

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Tue, 13 Dec 2011 16:52:48 -0800

On 12/13/2011 03:48 PM, Jeffrey Pollock wrote:
> Hello all,
>
> I've been working on a package to do various things related to the
> Conway-Maxwell-Poisson distribution and wanted to be able to make fast
> random draws from the distribution. My R code was running quite slow so I
> decided to give Rcpp a bash. I had used it before but only for extremely
> basic stuff and always using inline. This time I decided to give making a
> proper package a go.
>
> First of all I should say that this was incredibly easy due to
> Rcpp.package.skeleton() and the countless answers to quesions online and
> documentation!
>
> Secondly, I'm worried that my speedup has been so massive (over 500x !!!)
> that I think I've made a mistake, hence my post here.
>
> Here is all my code, if someone has a minute to point out anything wrong
> (or even if its correct and there is room for improvement, im pretty new to
> this) it would be much appreciated. I've had a simple look at the results
> and they look fine, but seriously, 500x faster?!
>
> function in R;
> library(compiler)
>
> Rrcomp<- cmpfun(
> function(n, lam, nu, max = 100L) {
> ans<- integer(n)
> dist<- dcomp(0:max, lam, nu, max)
> u<- runif(n)
> for (i in 1:n) {
> count<- 0L
> pr<- dist[1L]
> while (pr< u[i]) {
> count<- count + 1L
> pr<- pr + dist[count + 1L]
> }
> ans[i]<- count
> }
> return(ans)
> }
> )

Hi Jeff

Not really what you're asking about, but looks like you're sampling with replacement from the sequence 0:(dist-1) n times with probability dist, so

Rrcomp.1 <-

     function(n, lam, nu, max = 100L)
{

     dist <- dcomp(0:max, lam, nu, max)
     sample(seq_along(dist) - 1L, n, TRUE, prob=dist)
}

and

 > system.time(res <- table(Rrcomp(n, lam, nu))); res

    user system elapsed
   1.493 0.000 1.495

    0 1 2 3 4 5 6 7 8 9 10 11 12 13   355 1656 4070 6976 8745 8861 7275 5214 3357 1892 926 399 165 69    14 15 16 17
   24 11 2 3
 > system.time(res <- table(Rrcomp.1(n, lam, nu))); res

    user system elapsed
   0.029 0.000 0.028

    0 1 2 3 4 5 6 7 8 9 10 11 12 13   333 1754 4096 6876 8964 8799 7399 5030 3215 1877 951 432 184 61    14 15 16
   23 5 1

Martin

> dcomp<- function(y, lam, nu, max = 100L) {
> Z<- function(lam, nu, max) {
> sum<- 0L
> for(i in 0L:max) {
> sum<- sum + lam^i / factorial(i)^nu
> }
> return(sum)
> }
> return(lam^y / (factorial(y)^nu * Z(lam, nu, max)))
> }
>
> function in Rcpp;
> header file;
>
> #include<Rcpp.h>
>
> RcppExport SEXP rcomp(SEXP n_, SEXP dist_);
>
> source file;
>
> #include "rcomp.h"
>
> SEXP rcomp(SEXP n_, SEXP dist_) {
> using namespace Rcpp ;
>
> int n = as<int>(n_);
> NumericVector dist(dist_);
>
> NumericVector ans(n);
> int count;
> double pr;
> RNGScope scope;
> NumericVector u = runif(n);
>
> for (int i = 0; i< n; ++i) {
> count = 0;
> pr = dist[0];
> while (pr< u[i]) {
> count++;
> pr += dist[count];
> }
> ans[i] = count;
> }
> return ans;
> }
>
> R call;
>
> rcomp<- function(n, lam, nu, max = 100){
> dist<- dcomp(0:max, lam, nu, max)
> .Call("rcomp", n = n, dist = dist, PACKAGE = "glmcomp")
> }
>
> Here are some results;
>> n<- 50000
>> lam<- 5
>> nu<- 1
>> rbind(table(rcomp(n, lam, nu))[1:10] / n, table(Rrcomp(n, lam, nu))[1:10]
> / n, dpois(0:9, lam))
> 0 1 2 3 4
> 5 6
> [1,] 0.006440000 0.03124000 0.08452000 0.1392200 0.1747800 0.1755200
> 0.1490000
> [2,] 0.006660000 0.03232000 0.08412000 0.1425400 0.1779600 0.1748400
> 0.1445600
> [3,] 0.006737947 0.03368973 0.08422434 0.1403739 0.1754674 0.1754674
> 0.1462228
> 7 8 9
> [1,] 0.1063000 0.06538000 0.03534000
> [2,] 0.1039800 0.06492000 0.03624000
> [3,] 0.1044449 0.06527804 0.03626558
>
> (for nu = 1 the com-poisson distribution is the same as normal poisson)
>
>> benchmark(rcomp(n, lam, nu), Rrcomp(n, lam, nu), replications = 10, order
> = "relative")
> test replications elapsed relative user.self sys.self
> 2 Rrcomp(n, lam, nu) 10 2.03 1.0000 1.96 0.00
> 1 rcomp(n, lam, nu) 10 1172.51 577.5911 1164.50 0.08
>
> Thanks in advance if anyone has any time to have a look at this :)
>
> Jeff
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 14 Dec 2011 - 00:55:06 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 14 Dec 2011 - 06:40:17 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive