[R] C source code question (Robustbase edition)

From: kv <kaveh.vakili_at_ulb.ac.be>
Date: Thu, 21 Apr 2011 06:23:37 -0700 (PDT)


Hi all,

I have been trying to add the line:

 h = n - p0 + 1;

just after

 h = n / 2 + 1;

(line 131) in the source code (the original paper mention this is possible for p0&lt;n).

with p0 declared as an int (actually i used the same declaration protocol as n everywhere in the code).

The &quot;new&quot; source compiles, but when i give it reasonable values of p0, it runs unto an infinite loop.

(where i leave the p0 variable in the source, but comment out my modified line everything works perfectly, again).

I rewrote the qn code in R, by &quot;translating&quot; the source code. qn.R always gives the expected result (i.e. does not run unto an infinite loop).

I'm a tiny bit puzzled. Below i attach the modified qn.c and qn.R sources:

Thanks in advance for any hint

/*

/* This is a merge of the C version of original files qn.f and sn.f,

/* Original comments by the authors of the Fortran original code,

   This file contains fortran functions for two new robust estimators    of scale denoted as Qn and Sn, decribed in Rousseeuw and Croux (1993).    These estimators have a high breakdown point and a bounded influence    function. The implementation given here is very fast (running in    O(n logn) time) and needs little storage space.

	Rousseeuw, P.J. and Croux, C. (1993)
	Alternatives to the Median Absolute Deviation&quot;,
	Journal of the American Statistical Association, Vol. 88, 1273-1283.

   For both estimators, implementations in the pascal language can be    obtained from the original authors.

   This software may be used and copied freely for scientific    and/or non-commercial purposes, provided reference is made    to the abovementioned paper.

Note by MM: We have explicit permission from P.Rousseeuw to licence it under the GNU Public Licence.

See also ../inst/Copyrights
*/

#include &lt;inttypes.h&gt;

/* ^^^^^^^^^^ is supposedly more common and standard than
* #include <stdint.h>

#include <R.h>
#include <Rmath.h> /* -> <math.h> and much more */

/* Interface routines to be called via .C() : */
#include "robustbase.h"

/* ----------------- Further Declarations ------------------------------ */

/* sn0() and qn0() --- but also mc_C() in ./mc.c

   pull(a,n,k): finds the k-th order statistic of an

                array a[] of length n (preserving a[]) */

/*

   whimed_i(a,iw,n): finds the weighted high median of an array

		a[] of length n, with positive int weights iw[]
		(using auxiliary arrays acand[], a_srt[] & iw_cand[]
		 all of length n).

*/

/* qn0() uses (and for C API:) */

/* Main routines for C API */

double qn(double *x, int n, int p0, int finite_corrn);
/* these have no extra factors (no consistency factor & finite_corr): */
double qn0(double *x, int n, int p0);

/* ----------- Implementations -----------------------------------*/

void Qn0(double *x, Sint *n, Sint *p0, double *res) { *res = qn0(x, (int)*n, (int)*p0); }

double qn0(double *x, int n, int p0)
{
/*--------------------------------------------------------------------

   Efficient algorithm for the scale estimator:

       Q*_n = { |x_i - x_j|; i&lt;j }_(k) [= Qn without scaling ]

                i.e. the k-th order statistic of the |x_i - x_j|

   Parameters of the function Qn :

       x  : double array containing the observations
       n  : number of observations (n &gt;=2)
 */
    double *y	  = (double *)R_alloc(n, sizeof(double));
    double *work  = (double *)R_alloc(n, sizeof(double));
    double *a_srt = (double *)R_alloc(n, sizeof(double));
    double *a_cand = (double *)R_alloc(n, sizeof(double));

    int *left	  = (int *)R_alloc(n, sizeof(int));
    int *right	  = (int *)R_alloc(n, sizeof(int));
    int *p	  = (int *)R_alloc(n, sizeof(int));
    int *q	  = (int *)R_alloc(n, sizeof(int));
    int *weight	  = (int *)R_alloc(n, sizeof(int));

    double trial = R_NaReal;/* -Wall */
    Rboolean found;

    int h, i, j,jj,jh;
    /* Following should be `long long int' : they can be of order n^2 */     int64_t k, knew, nl,nr, sump,sumq;

    h = n / 2 + 1;
    //h = n - p0 + 1;
    k = (int64_t)h * (h - 1) / 2;
    for (i = 0; i < n; ++i) {

	y[i] = x[i];
	left [i] = n - i + 1;
	right[i] = (i <= h) ? n : n - (i - h);
	/* the n - (i-h) is from the paper; original code had `n' */
    }
    R_qsort(y, 1, n); /* y := sort(x) */     nl = (int64_t)n * (n + 1) / 2;
    nr = (int64_t)n * n;
    knew = k + nl;/* = k + (n+1 \over 2) */     found = FALSE;
#ifdef DEBUG_qn

    REprintf("qn0(): h,k= %2d,%2d; nl,nr= %d,%d\n", h,k, nl,nr); #endif
/* L200: */

    while(!found && nr - nl > n) {

	j = 0;
	/* Truncation to float :
	   try to make sure that the same values are got later (guard bits !) */
	for (i = 1; i < n; ++i) {
	    if (left[i] <= right[i]) {
		weight[j] = right[i] - left[i] + 1;
		jh = left[i] + weight[j] / 2;
		work[j] = (float)(y[i] - y[n - jh]);
		++j;
	    }
	}
	trial = whimed_i(work, weight, j, a_cand, a_srt, /*iw_cand*/ p);

#ifdef DEBUG_qn
	REprintf(" ..!found: whimed(");
#  ifdef DEBUG_long
	REprintf("wrk=c(");
	for(i=0; i < j; i++) REprintf("%s%g", (i>0)? ", " : "", work[i]);
	REprintf("),\n	   wgt=c(");
	for(i=0; i < j; i++) REprintf("%s%d", (i>0)? ", " : "", weight[i]);
	REprintf("), j= %3d) -> trial= %7g\n", j, trial);
#  else
	REprintf("j=%3d) -> trial= %g:", j, trial);
# endif
#endif
	j = 0;
	for (i = n - 1; i >= 0; --i) {
	    while (j < n && ((float)(y[i] - y[n - j - 1])) < trial)
		++j;
	    p[i] = j;
	}
#ifdef DEBUG_qn
	REprintf(" f_1: j=%2d", j);
#endif
	j = n + 1;
	for (i = 0; i < n; ++i) {
	    while ((float)(y[i] - y[n - j + 1]) > trial)
		--j;
	    q[i] = j;
	}
	sump = 0;
	sumq = 0;
	for (i = 0; i < n; ++i) {
	    sump += p[i];
	    sumq += q[i] - 1;
	}
#ifdef DEBUG_qn
	REprintf(" f_2 -> j=%2d, sump|q= %lld,%lld", j, sump,sumq);
#endif
	if (knew <= sump) {
	    for (i = 0; i < n; ++i)
		right[i] = p[i];
	    nr = sump;
#ifdef DEBUG_qn
	    REprintf("knew <= sump =: nr , new right[]\n");
#endif
	} else if (knew > sumq) {
	    for (i = 0; i < n; ++i)
		left[i] = q[i];
	    nl = sumq;
#ifdef DEBUG_qn
	    REprintf("knew > sumq =: nl , new left[]\n");
#endif
	} else { /* sump < knew <= sumq */
	    found = TRUE;
#ifdef DEBUG_qn
	    REprintf("sump < knew <= sumq ---> FOUND\n");
#endif
	}

    } /* while */

    if (found)

        return trial;
    else {
#ifdef DEBUG_qn

        REprintf(".. not fnd -> new work[]");
#endif

	j = 0;
	for (i = 1; i < n; ++i) {
	    for (jj = left[i]; jj <= right[i]; ++jj) {
		work[j] = y[i] - y[n - jj];
		j++;
	    }/* j will be = sum_{i=2}^n (right[i] - left[i] + 1)_{+}  */
	}
#ifdef DEBUG_qn
	REprintf(" of length %d; knew-nl=%d\n", j, knew-nl);
#endif
	/* return pull(work, j - 1, knew - nl)	: */
	knew -= (nl + 1); /* -1: 0-indexing */
	rPsort(work, j, knew);
	return(work[knew]);

    }
} /* qn0 */

double qn(double *x, int n, int p0, int finite_corr) {
/* Efficient algorithm for the scale estimator:

        Qn = dn * 2.2219 * {|x_i-x_j|; i&lt;j}_(k) */

    double r, dn = 1./*-Wall*/;

    r = 2.2219 * qn0(x, n, p0); /* asymptotic consistency for sigma^2 */     /* === */

    if (finite_corr) {

	if (n &lt;= 9) {
	    if	    (n == 2)	dn = .399;
	    else if (n == 3)	dn = .994;
	    else if (n == 4)	dn = .512;
	    else if (n == 5)	dn = .844;
	    else if (n == 6)	dn = .611;
	    else if (n == 7)	dn = .857;
	    else if (n == 8)	dn = .669;
	    else if (n == 9)	dn = .872;
	} else {
	    if (n % 2 == 1)
		dn = n / (n + 1.4);
	    else /* (n % 2 == 0) */
		dn = n / (n + 3.8);
	}
	return dn * r;

    }
    else return r;
} /* qn */

/* pull(): auxiliary routine for Qn and Sn

    k--; /* 0-indexing */
    rPsort(a, n, k);
    ax = a[k];

    vmaxset(vmax);
    return ax;
} /* pull */

/* Local variables section

double whimed_i(double *a, int *w, int n,

                double* a_cand, double *a_srt, int* w_cand) {

/*

  Algorithm to compute the weighted high median in O(n) time.

  The whimed is defined as the smallest a[j] such that the sum   of the weights of all a[i] &lt;= a[j] is strictly greater than   half of the total weight.

  Arguments:

  1. double array containing the observations
  2. number of observations
  3. array of (int/double) weights of the observations. */

    int n2, i, kcand;
    /* sum of weights: `int' do overflow when n ~&gt;= 1e5 */     double wleft, wmid, wright, w_tot, wrest;

    double trial;

    w_tot = 0;
    for (i = 0; i < n; ++i)

        w_tot += w[i];
    wrest = 0;

/* REPEAT : */

    do {

	for (i = 0; i < n; ++i)
	    a_srt[i] = a[i];
	n2 = n/2;/* =^= n/2 +1 with 0-indexing */
	rPsort(a_srt, n, n2);
	trial = a_srt[n2];

	wleft = 0;    wmid  = 0;    wright= 0;
	for (i = 0; i < n; ++i) {
	    if (a[i] < trial)
		wleft += w[i];
	    else if (a[i] > trial)
		wright += w[i];
	    else
		wmid += w[i];
	}
	/* wleft = sum_{i; a[i]	 < trial}  w[i]
	 * wmid	 = sum_{i; a[i] == trial}  w[i] at least one 'i' since trial is one
a[]!
	 * wright= sum_{i; a[i]	 > trial}  w[i]
	 */
	kcand = 0;
	if (2 * (wrest + wleft) > w_tot) {
	    for (i = 0; i < n; ++i) {
		if (a[i] < trial) {
		    a_cand[kcand] = a[i];
		    w_cand[kcand] = w[i];	++kcand;
		}
	    }
	}
	else if (2 * (wrest + wleft + wmid) <= w_tot) {
	    for (i = 0; i < n; ++i) {
		if (a[i] > trial) {
		    a_cand[kcand] = a[i];
		    w_cand[kcand] = w[i];	++kcand;
		}
	    }
	    wrest += wleft + wmid;
	}
	else {
	    return trial;
	    /*==========*/
	}
	n = kcand;
	for (i = 0; i < n; ++i) {
	    a[i] = a_cand[i];
	    w[i] = w_cand[i];
	}

    } while(1);

} /* _WHIMED_ */

#############################################################
mqn<-function(x,p0){
	n<-length(x)
	h<-as.integer((n/2)+1)
	h<-as.integer(n-p0)
	k<-h*(h-1)/2
	y<-left<-right<-weight<-work<-p<-q<-rep(NA,n)
	pulla<-function(a_in,n,k){
		a<-rep(NA,n)
		for(j in 0:(n-1))	a[j+1]<-a_in[j+1]
		ax<-sort(a)[k]
		return(ax)
	}
	y<-sort(x)
	for(i in 1:n){
		left[i]<-n-i+2	
		right[i]<-ifelse(i<=h,n,n-(i-h))
	}
	nl<-n*(n+1)/2
	nr<-n*n
	knew<-as.integer(k+nl)
	found<-F
	while(found==F & (nr-nl>n)){
		j<-1
		for(i in 2:n){
			if(left[i]<=right[i]){
				weight[j]=as.integer(right[i]-left[i]+1)
				jh<-as.integer(left[i]+weight[j]/2);
				work[j]=y[i]-y[as.integer(n-jh+1)]
				j<-j+1
			}
		}
		trial<-weightedMedian(x=work[1:(j-1)],w=weight[1:(j-1)])
		j<-0
		for(i in n:1){
			while((j<(n-1)) & ((y[i]-y[n-j])&lt;trial))	j&lt;-j+1
			p[i]&lt;-j
		}  
		j&lt;-n+1
		for(i in 1:n){
			while((y[i]-y[n-j+2])&gt;trial) j<-j-1
			q[i]<-j
		}
		sump<-sumq<-0
		for(i in 1:n){
			sump<-sump+p[i]
			sumq<-sumq+q[i]-1
		}	
		if(knew<=sump){
			for(i in 1:n) right[i]<-p[i]
			nr<-sump
		} 
		if(knew>sumq){
			for(i in 1:n) left[i]<-q[i]
			nl<-sumq
		}	
		if(knew>sump & knew<=sumq) found<-T
	}
	if(found==T)	ret<-trial
	if(found==F){
		j<-1
		for(i in 2:n){
			if(left[i]<=right[i]){
				for(jj in left[i]:right[i]){
					work[j]=y[i]-y[n-jj+1]
					j<-j+1
				}
			}
		}
		ret<-pulla(a_in=work,n=j-1,k=knew-nl)
	}
	ifelse(n%%2==0,ret*2.2219*n/(n+3.8),ret*2.2219*n/(n+1.4))
}
--
View this message in context: http://r.789695.n4.nabble.com/C-source-code-question-Robustbase-edition-tp3465905p3465905.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 21 Apr 2011 - 13:35:27 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 21 Apr 2011 - 14:30:31 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive