Re: [Rd] Rcpp too good to be true?

From: Jeffrey Pollock <jeffpollock9_at_gmail.com>
Date: Wed, 14 Dec 2011 01:07:55 +0000

Hey Martin,

Thanks for the reply, that is a much better way!!!

On another note, I actually meant to post this on the rcpp-devel list (and have forwarded it there, noting that there was a mistake in the original post), however I'm quite glad I posted it here by mistake to see this answer :)

Cheers, Jeff

On Wed, Dec 14, 2011 at 12:52 AM, Martin Morgan <mtmorgan_at_fhcrc.org> wrote:

> On 12/13/2011 03:48 PM, Jeffrey Pollock wrote:
>
>> Hello all,
>>
>> I've been working on a package to do various things related to the
>> Conway-Maxwell-Poisson distribution and wanted to be able to make fast
>> random draws from the distribution. My R code was running quite slow so I
>> decided to give Rcpp a bash. I had used it before but only for extremely
>> basic stuff and always using inline. This time I decided to give making a
>> proper package a go.
>>
>> First of all I should say that this was incredibly easy due to
>> Rcpp.package.skeleton() and the countless answers to quesions online and
>> documentation!
>>
>> Secondly, I'm worried that my speedup has been so massive (over 500x !!!)
>> that I think I've made a mistake, hence my post here.
>>
>> Here is all my code, if someone has a minute to point out anything wrong
>> (or even if its correct and there is room for improvement, im pretty new
>> to
>> this) it would be much appreciated. I've had a simple look at the results
>> and they look fine, but seriously, 500x faster?!
>>
>> function in R;
>> library(compiler)
>>
>> Rrcomp<- cmpfun(
>> function(n, lam, nu, max = 100L) {
>> ans<- integer(n)
>> dist<- dcomp(0:max, lam, nu, max)
>> u<- runif(n)
>> for (i in 1:n) {
>> count<- 0L
>> pr<- dist[1L]
>> while (pr< u[i]) {
>> count<- count + 1L
>> pr<- pr + dist[count + 1L]
>> }
>> ans[i]<- count
>> }
>> return(ans)
>> }
>> )
>>
>
> Hi Jeff
>
> Not really what you're asking about, but looks like you're sampling with
> replacement from the sequence 0:(dist-1) n times with probability dist, so
>
> Rrcomp.1 <-
> function(n, lam, nu, max = 100L)
>
> {
> dist <- dcomp(0:max, lam, nu, max)
> sample(seq_along(dist) - 1L, n, TRUE, prob=dist)
> }
>
> and
>
> > system.time(res <- table(Rrcomp(n, lam, nu))); res
> user system elapsed
> 1.493 0.000 1.495
>
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13
> 355 1656 4070 6976 8745 8861 7275 5214 3357 1892 926 399 165 69
> 14 15 16 17
> 24 11 2 3
> > system.time(res <- table(Rrcomp.1(n, lam, nu))); res
> user system elapsed
> 0.029 0.000 0.028
>
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13
> 333 1754 4096 6876 8964 8799 7399 5030 3215 1877 951 432 184 61
> 14 15 16
> 23 5 1
>
> Martin
>
> dcomp<- function(y, lam, nu, max = 100L) {
>> Z<- function(lam, nu, max) {
>> sum<- 0L
>> for(i in 0L:max) {
>> sum<- sum + lam^i / factorial(i)^nu
>> }
>> return(sum)
>> }
>> return(lam^y / (factorial(y)^nu * Z(lam, nu, max)))
>> }
>>
>> function in Rcpp;
>> header file;
>>
>> #include<Rcpp.h>
>>
>> RcppExport SEXP rcomp(SEXP n_, SEXP dist_);
>>
>> source file;
>>
>> #include "rcomp.h"
>>
>> SEXP rcomp(SEXP n_, SEXP dist_) {
>> using namespace Rcpp ;
>>
>> int n = as<int>(n_);
>> NumericVector dist(dist_);
>>
>> NumericVector ans(n);
>> int count;
>> double pr;
>> RNGScope scope;
>> NumericVector u = runif(n);
>>
>> for (int i = 0; i< n; ++i) {
>> count = 0;
>> pr = dist[0];
>> while (pr< u[i]) {
>> count++;
>> pr += dist[count];
>> }
>> ans[i] = count;
>> }
>> return ans;
>> }
>>
>> R call;
>>
>> rcomp<- function(n, lam, nu, max = 100){
>> dist<- dcomp(0:max, lam, nu, max)
>> .Call("rcomp", n = n, dist = dist, PACKAGE = "glmcomp")
>> }
>>
>> Here are some results;
>>
>>> n<- 50000
>>> lam<- 5
>>> nu<- 1
>>> rbind(table(rcomp(n, lam, nu))[1:10] / n, table(Rrcomp(n, lam, nu))[1:10]
>>>
>> / n, dpois(0:9, lam))
>> 0 1 2 3 4
>> 5 6
>> [1,] 0.006440000 0.03124000 0.08452000 0.1392200 0.1747800 0.1755200
>> 0.1490000
>> [2,] 0.006660000 0.03232000 0.08412000 0.1425400 0.1779600 0.1748400
>> 0.1445600
>> [3,] 0.006737947 0.03368973 0.08422434 0.1403739 0.1754674 0.1754674
>> 0.1462228
>> 7 8 9
>> [1,] 0.1063000 0.06538000 0.03534000
>> [2,] 0.1039800 0.06492000 0.03624000
>> [3,] 0.1044449 0.06527804 0.03626558
>>
>> (for nu = 1 the com-poisson distribution is the same as normal poisson)
>>
>> benchmark(rcomp(n, lam, nu), Rrcomp(n, lam, nu), replications = 10, order
>>>
>> = "relative")
>> test replications elapsed relative user.self sys.self
>> 2 Rrcomp(n, lam, nu) 10 2.03 1.0000 1.96 0.00
>> 1 rcomp(n, lam, nu) 10 1172.51 577.5911 1164.50 0.08
>>
>> Thanks in advance if anyone has any time to have a look at this :)
>>
>> Jeff
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________**________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/**listinfo/r-devel<https://stat.ethz.ch/mailman/listinfo/r-devel>
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>

        [[alternative HTML version deleted]]



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed 14 Dec 2011 - 04:53:01 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 14 Dec 2011 - 06:50:17 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive