[Rd] Update: Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect?

From: Dr. D. P. Kreil (Boku) <David.Kreil_at_boku.ac.at>
Date: Thu, 16 Jun 2011 17:45:48 +0200


Dear R developers!

Considering I got no response or comments in the general r-help forum so far, perhaps my question is actually better suited for this list? I have added some more hopefully relevant technical details to my original post (edited below).

Any comments gratefully received!

Best regards,
David Kreil.


Dear R experts,

I have a problem that is a related to the question raised in this earlier post     https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html

My situation is different in that I have only 2 predictors
(coordinates x,y) for local regression but a number of global
("parametric") offsets that I need to consider.

Essentially, we have a spatial distortion s(x,y) overlaid over a number of measurements z:

    z_i = s(x_i,y_i) + v_{g_i}
These measurements z can be grouped by the same underlying undistorted measurement value v for each group g. The group membership g_i is known for each measurement, but the underlying undistorted measurement values v_g for the groups are not known and should be determined by
(global, not local) regression.

We need to estimate the two-dimensional spatial trend s(x,y), which we then want to remove. In our application, say there are 20 groups of at least 35 measurements each, in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets.

The below code for toy data (spatial trend in one dimension x) works for two or three offset groups.

Unfortunately, the loess call fails for four or more offset groups with the error message
"Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize,  :
  only 1-4 predictors are allowed"
See comment #1

I tried overriding the restriction (see lower section of code example) and got "k>d2MAX in ehg136.  Need to recompile with increased dimensions."

How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded -- the error is apparently triggered by line #1359 in loessf.f       if(k .gt. 15)   call ehg182(105)

Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here?

Or is there a better way of dealing with this? I tried lme with correlation structures but that seems to be much, much slower.

Any comments would be greatly appreciated!

Best regards,
David Kreil.

###
#
# loess with parametric offsets - toy data demo
#

x<-seq(0,9,.1);
x.N<-length(x);

# Comment #1:
# These are the offsets ("unknown", to be revovered by the regression): o<-c(0.4,-0.8,1.2); # three parametric variables --> total of four, works o<-c(0.4,-0.8,1.2,-0.2); # four par.variables --> total of 5 > 4 --> error: # "... only 1-4 predictors are allowed"

o.N<-length(o);
f<-sapply(seq(o.N),

          function(n){
            ifelse((seq(x.N)<= n   *x.N/(o.N+1) &
                    seq(x.N)> (n-1)*x.N/(o.N+1)),
                    1,0);
          });

f<-f[sample(NROW(f)),];

y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')}); s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);

l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,          span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red');  # input data points(x,yp,pch='o',col='blue');           # fit of that

d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o);     # spatial distortion lines(x,yp0,pch='+');  # estimate of that

op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});

cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");

#
# to test if the underlying implementation has a limit of 4 (it does):
#

# copied from loess.R, and modified as indicated by ###***### simpleLoess <-
  function(y, x, weights, span = 0.75, degree = 2, parametric = FALSE,

	   drop.square = FALSE, normalize = TRUE,
	   statistics = "approximate", surface = "interpolate",
	   cell = 0.2, iterations = 1, trace.hat = "exact")
{

    ## loess_ translated to R.
    D <- NCOL(x)
###***### if(D > 4) stop("only 1-4 predictors are allowed")

    N <- NROW(x)

    if(!N || !D)	stop("invalid 'x'")
    if(!length(y))	stop("invalid 'y'")

    x <- as.matrix(x)
    max.kd <-  max(N, 200)
    robust <- rep(1, N)
    divisor<- rep(1, D)
    if(normalize && D > 1L) {
	trim <- ceiling(0.1 * N)
	divisor <-
	    sqrt(apply(apply(x, 2L, sort)[seq(trim+1, N-trim), , drop = FALSE],
		       2L, var))
	x <- x/rep(divisor, rep(N, D))

    }
    sum.drop.sqr <- sum(drop.square)
    sum.parametric <- sum(parametric)
    nonparametric <- sum(!parametric)
    order.parametric <- order(parametric)     x <- x[, order.parametric]
    order.drop.sqr <- (2 - drop.square)[order.parametric]     if(degree==1 && sum.drop.sqr)

        stop("specified the square of a factor predictor to be dropped when degree = 1")

    if(D == 1 && sum.drop.sqr)

        stop("specified the square of a predictor to be dropped with only one numeric predictor")

    if(sum.parametric == D) stop("specified parametric for all predictors")     if(iterations)
    for(j in 1L:iterations) {

	robust <- weights * robust
	if(j > 1) statistics <- "none"
	else if(surface == "interpolate" && statistics == "approximate")
	    statistics <- if(trace.hat == "exact") "1.approx"
            else "2.approx" # trace.hat == "approximate"
	surf.stat <- paste(surface, statistics, sep="/")
	z <- .C(R_loess_raw, # ../src/loessc.c
		as.double(y),
		as.double(x),
		as.double(weights),
		as.double(robust),
		as.integer(D),
		as.integer(N),
		as.double(span),
		as.integer(degree),
		as.integer(nonparametric),
		as.integer(order.drop.sqr),
		as.integer(sum.drop.sqr),
		as.double(span*cell),
		as.character(surf.stat),
		fitted.values = double(N),
		parameter = integer(7),
		a = integer(max.kd),
		xi = double(max.kd),
		vert = double(2*D),
		vval = double((D+1)*max.kd),
		diagonal = double(N),
		trL = double(1),
		delta1 = double(1),
		delta2 = double(1),
		as.integer(surf.stat == "interpolate/exact"))
	if(j==1) {
	    trace.hat.out <- z$trL
	    one.delta <- z$delta1
	    two.delta <- z$delta2
	}
	fitted.residuals <- y - z$fitted.values
	if(j < iterations)
	    robust <- .Fortran(R_lowesw,
			       as.double(fitted.residuals),
			       as.integer(N),
			       robust = double(N),
			       integer(N))$robust

    }
    if(surface == "interpolate")
    {
	pars <- z$parameter
	names(pars) <- c("d", "n", "vc", "nc", "nv", "liv", "lv")
	enough <- (D + 1) * pars["nv"]
	fit.kd <- list(parameter=pars, a=z$a[1L:pars[4L]], xi=z$xi[1L:pars[4L]],
		       vert=z$vert, vval=z$vval[1L:enough])
    }
    if(iterations > 1) {
	pseudovalues <- .Fortran(R_lowesp,
				 as.integer(N),
				 as.double(y),
				 as.double(z$fitted.values),
				 as.double(weights),
				 as.double(robust),
				 integer(N),
				 pseudovalues = double(N))$pseudovalues
	zz <- .C(R_loess_raw,
		as.double(pseudovalues),
		as.double(x),
		as.double(weights),
		as.double(weights),
		as.integer(D),
		as.integer(N),
		as.double(span),
		as.integer(degree),
		as.integer(nonparametric),
		as.integer(order.drop.sqr),
		as.integer(sum.drop.sqr),
		as.double(span*cell),
		as.character(surf.stat),
		temp = double(N),
		parameter = integer(7),
		a = integer(max.kd),
		xi = double(max.kd),
		vert = double(2*D),
		vval = double((D+1)*max.kd),
		diagonal = double(N),
		trL = double(1),
		delta1 = double(1),
		delta2 = double(1),
		as.integer(0L))
	pseudo.resid <- pseudovalues - zz$temp

    }
    sum.squares <- if(iterations <= 1) sum(weights * fitted.residuals^2)     else sum(weights * pseudo.resid^2)
    enp <- one.delta + 2*trace.hat.out - N     s <- sqrt(sum.squares/one.delta)
    pars <- list(robust=robust, span=span, degree=degree, normalize=normalize,
		 parametric=parametric, drop.square=drop.square,
		 surface=surface, cell=cell, family=
		 if(iterations <= 1) "gaussian" else "symmetric",
		 iterations=iterations)
    fit <- list(n=N, fitted=z$fitted.values, residuals=fitted.residuals,
		enp=enp, s=s, one.delta=one.delta, two.delta=two.delta,
		trace.hat=trace.hat.out, divisor=divisor)
    fit$pars <- pars
    if(surface == "interpolate") fit$kd <- fit.kd     class(fit) <- "loess"
    fit
}
environment(simpleLoess)<-environment(loess);  # to make it find private funcs
environment(loess)<-.GlobalEnv;         # to make it use our simpleLoess!
environment(loess.smooth)<-.GlobalEnv;  # to make it use our simpleLoess!

# now run the demo with four parametric variables --> total 5.> 4 triggers error:
# k>d2MAX in ehg136. Need to recompile with increased dimensions. # but there is no obvious d2MAX to configure that I can find...



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Thu 16 Jun 2011 - 16:16:48 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 16 Jun 2011 - 16:20:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive