[R] choosing an appropriate linear model

From: Levi Waldron <leviwaldron_at_gmail.com>
Date: Thu, 05 Jun 2008 18:25:37 -0400


I am trying to model the observed leaching of wood preservative chemicals from treated wood during an outdoor experiment where leaching is caused by rainfall events. For each rainfall event, the amount of rainfall was recorded as well as the amount of preservative chemical leached. A number of climatic variables were measured, but the most important is the amount of rainfall.

I have tried a simple linear model, with zero intercept because zero rainfall cannot cause any leaching (leachdata dataframe is attached to this email). The diagnostics show clearly non-normally distributed residuals with a simple linear regression, and I am trying to figure out what to do about it (see attached diagnostics.png). This dataset contains measurements from 57 rainfall events on three replicate samples, for a total of 171 measurements.

Part of the problem is that physically, the leaching values can only be positive, so for the smaller rainfall amounts the residuals are all positive. If I allow an intercept then it is significantly positive, possibly since the researcher wouldn't have collected measurements for very small rain events, but in terms of the model it doesn't make sense physically to have a positive intercept, particularly since lab experiments have shown that a certain amount of rain exposure is required to wet the wood before leaching begins.

I can get more normally distributed residuals by log-transforming the response, or using the optimal box-cox transformation of lambda = 0.21, which produces nicer-looking residuals but unsatisfactory prediction which is the main goal of the model (also attached).

Any advice on how to create a better predictive model? I presume it has something to do with glm, especially since I have repeated rainfalls on replicate samples, but any advice on the approach to take would be much appreciated. The code I used to produce the attached plots is included below.

leach.lm <- lm(leachate~rainmm-1,data=leachdata)

png("dianostics.png",height=1200,width=700) par(mfrow=c(3,2))
plot(leachate~rainmm,data=leachdata,main="Data and fitted line") abline(leach.lm)
plot(predict(leach.lm)~leachdata$leachate,main="predicted vs. observed leaching amount",xlim=c(0,12),ylim=c(0,12),xlab="observed leaching",ylab="predicted leaching")
abline(a=0,b=1)
plot(leach.lm)
dev.off()

library(MASS)
boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01))

boxtran <- function(y,lambda,inverse=F){   if(inverse)
    return((lambda*y+1)^(1/lambda))
  else
    return((y^lambda-1)/lambda)
}

png("boxcox-dianostics.png",height=1200,width=700) par(mfrow=c(3,2))
logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata) plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
x <- leachdata$rainmm
y <- boxtran(predict(logleach.lm),0.21,T) xy <- cbind(x,y)[order(x),]
lines(xy)
plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),main="predicted vs. observed leaching amount",xlab="observed leaching",ylab="predicted leaching")
abline(a=0,b=1)
plot(logleach.lm)
dev.off()



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
dianostics.png boxcox-dianostics.png
Received on Fri 06 Jun 2008 - 00:26:19 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 Jun 2008 - 19:30:39 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive