From: Levi Waldron <leviwaldron_at_gmail.com>

Date: Fri, 06 Jun 2008 12:53:38 -0400

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 06 Jun 2008 - 18:41:00 GMT

Date: Fri, 06 Jun 2008 12:53:38 -0400

Perhaps this was too big a question, so I'll ask something shorter:

I have fit a linear model, and want to use its prediction intervals to calculate the sum of many individual predictions.

- Some of the lower prediction intervals are negative, which is non-sensical. Should I just set all negative predictions to zero, or is there another way to require non-negative predictions only?
- I am interested in the sum of many predictions based on the lm. How can I calculate the 95% prediction interval for the sum? Should I calculate a root mean square of the individual errors, or use a bootstrap method, or something else?

ps. the data is attached to the end of this email.

On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron <leviwaldron_at_gmail.com> wrote:

> I am trying to model the observed leaching of wood preservative chemicals

*> from treated wood during an outdoor experiment where leaching is caused by
**> rainfall events. For each rainfall event, the amount of rainfall was
**> recorded as well as the amount of preservative chemical leached. A number
**> of climatic variables were measured, but the most important is the amount of
**> rainfall.
**>
**> I have tried a simple linear model, with zero intercept because zero
**> rainfall cannot cause any leaching (leachdata dataframe is attached to this
**> email). The diagnostics show clearly non-normally distributed residuals
**> with a simple linear regression, and I am trying to figure out what to do
**> about it (see attached diagnostics.png). This dataset contains measurements
**> from 57 rainfall events on three replicate samples, for a total of 171
**> measurements.
**>
**> Part of the problem is that physically, the leaching values can only be
**> positive, so for the smaller rainfall amounts the residuals are all
**> positive. If I allow an intercept then it is significantly positive,
**> possibly since the researcher wouldn't have collected measurements for very
**> small rain events, but in terms of the model it doesn't make sense
**> physically to have a positive intercept, particularly since lab experiments
**> have shown that a certain amount of rain exposure is required to wet the
**> wood before leaching begins.
**>
**> I can get more normally distributed residuals by log-transforming the
**> response, or using the optimal box-cox transformation of lambda = 0.21,
**> which produces nicer-looking residuals but unsatisfactory prediction which
**> is the main goal of the model (also attached).
**>
**> Any advice on how to create a better predictive model? I presume it has
**> something to do with glm, especially since I have repeated rainfalls on
**> replicate samples, but any advice on the approach to take would be much
**> appreciated. The code I used to produce the attached plots is included
**> below.
**>
**>
**> leach.lm <- lm(leachate~rainmm-1,data=leachdata)
**>
**> png("dianostics.png",height=1200,width=700)
**> par(mfrow=c(3,2))
**> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
**> abline(leach.lm)
**> plot(predict(leach.lm)~leachdata$leachate,main="predicted vs. observed
**> leaching amount",xlim=c(0,12),ylim=c(0,12),xlab="observed
**> leaching",ylab="predicted leaching")
**> abline(a=0,b=1)
**> plot(leach.lm)
**> dev.off()
**>
**> library(MASS)
**> boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01))
**>
**> boxtran <- function(y,lambda,inverse=F){
**> if(inverse)
**> return((lambda*y+1)^(1/lambda))
**> else
**> return((y^lambda-1)/lambda)
**> }
**>
**> png("boxcox-dianostics.png",height=1200,width=700)
**> par(mfrow=c(3,2))
**> logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata)
**> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
**> x <- leachdata$rainmm
**> y <- boxtran(predict(logleach.lm),0.21,T)
**> xy <- cbind(x,y)[order(x),]
**> lines(xy)
**> plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),main="predicted vs.
**> observed leaching amount",xlab="observed leaching",ylab="predicted
**> leaching")
**> abline(a=0,b=1)
**> plot(logleach.lm)
**> dev.off()
**>
*

`leachdata` <-

structure(list(rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822,

9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042,2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548,

1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54), leachate = c(0.94, 4.74, 2.84, 3.28, 0.07, 1.56, 0.48, 9.63, 1.2, 2.55, 0.15, 0.67, 0.57, 0.38, 1.81, 0.08, 0.94, 0.79, 0.16, 0.09, 1.2, 0.61, 0.77, 0.02, 1, 0.26, 1.34, 0.81, 0.18, 0.17, 0.005, 0.25, 0.42, 1.45, 0.54, 0.24, 0.41, 0.55, 1.59, 1.09, 3.84, 11.52, 6.21, 3.86, 2.34, 11.02, 2.33, 1.83, 2.4, 0.74, 0.71, 0.55, 0.31, 0.83, 0.29, 0.48, 0.92, 1.33, 4.8, 1.73, 1.87, 0.21, 1.04, 1.08, 6.74, 1.23, 2.5, 0.13, 1.29, 0.75, 0.66, 2.14, 0.17, 0.43, 0.69, 0.47, 0.14, 1.6, 0.56, 1.02, 0.04, 0.75, 0.32, 1.68, 0.58, 0.42, 0.18, 0.1, 0.34, 0.36, 1.54, 0.38, 0.18, 0.26, 0.005, 0.17, 0.18, 0.4, 2.13, 0.87, 0.75, 0.52, 3.21, 0.49, 0.85, 1.24, 0.32, 0.5, 0.37, 0.19, 0.53, 0.3, 0.51, 1.37, 1.25, 3.69, 2.76, 1.82, 0.005, 0.99, 0.87, 6.93, 1.04, 2.26, 0.14, 1.27, 0.62, 0.6, 2.91, 0.19, 0.41, 0.47, 0.38, 0.17, 1.56, 0.41, 0.92, 0.02, 0.51, 0.26, 0.86, 0.47, 0.39, 0.12, 0.08, 0.28, 0.3, 1.16, 0.27, 0.15, 0.22, 0.3, 0.18, 0.16,0.47, 6, 1.47, 0.67, 0.35, 2.13, 0.51, 0.85, 1.37, 0.23, 0.45, 0.34, 0.17, 0.46, 0.23, 0.43, 1.17)), .Names = c("rainmm", "leachate" ), row.names = c(NA, -171L), class = "data.frame")

[[alternative HTML version deleted]]

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 06 Jun 2008 - 18:41:00 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 06 Jun 2008 - 19:30:39 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*