I have 1 remark and 1 question on the inclusion of interactions in the gam function from the gam package.
I need to fit quantitative predictors in interactions with factors. You can see an example of what I need in fig 9.13 p265 from Hastie and Tibshirani book (1990).
It's clearly stated that in ?gam "Interactions with nonparametric smooth terms are not fully supported".
I have found a trick in a former http://www.math.yorku.ca/Who/Faculty/Monette/S-news/2284.html, using NAs and na.gam.replace argument, but some points are still unclear for me.
First the prediction of new data (using predict function) is not so easy (see script below), and need a close reading from section 7.3.2 of the Chambers and Hastie (1992).
Second I need to have the same intercept for all levels of factor and this not achievable with this trick. My question is : why not replacing NA by 0 (or any other particular value) ?
Here is a quite long (sorry for that) script with a generated dataset to better undestand my question.
in this script the model to fit is (in a GLM-like writing) : y~s(x2):x1
the generated dataset follows this model and y(x2=0)=10 whatever x1.
########################
#data construction (with deliberately very small noise)
data1$x1=factor(c(rep(1,11),rep(2,11),rep(3,5)))
data1$x2=c(rep(0:10,2),0:4)
library(lattice)
#creation of dummy variables for interactions
#model fitting
#prediction fit well data :
#trying to see prediction:
#trying to replace NA in data1 by mean, to mimic na.gam.replace:
predict(model,Ndata)-predict(model) #as you can see there is a systematic biais
#correct way to predict (=returned 0 for terms with NA value):
#alternative solution, 0 instead of NA
model1=gam(y~s(v1)+s(v2)+s(v3),data=data1)
summary(model1)
#what's happened in x2=0 ?
########################
thanks in advance
Dear R-users,
#start of script
########################
data1=data.frame(x1=rep(NA,27),x2=NA,y=NA)
data1[data1$x1==1,"y"]=data1[data1$x1==1,"x2"]^4*5+rnorm(11)+10000
data1[data1$x1==2,"y"]=data1[data1$x1==2,"x2"]^4*(-3)+rnorm(11)+10000
data1[data1$x1==3,"y"]=10000*data1[data1$x1==3,"x2"]+rnorm(5)+10000
xyplot(data1$y~data1$x2,groups=data1$x1)
data1$x2_1=ifelse(data1$x1=="1",data1$x2,NA)
data1$x2_2=ifelse(data1$x1=="2",data1$x2,NA)
data1$x2_3=ifelse(data1$x1=="3",data1$x2,NA)
library(gam)
model=gam(y~s(x2_1)+s(x2_2)+s(x2_3)+x1,data=data1,na=na.gam.replace)
summary(model)
plot(data1$x2,data1$y)
points(data1$x2,model$fitted.value,col="red",pch="+")
predict(model) #does work
predict(model,newdata=data1) #produce NA
Ndata=data1
Ndata$x2_1=ifelse(data1$x1=="1",data1$x2,mean(data1$x2_1,na.rm=TRUE))
Ndata$x2_2=ifelse(data1$x1=="2",data1$x2,mean(data1$x2_2,na.rm=TRUE))
Ndata$x2_3=ifelse(data1$x1=="3",data1$x2,mean(data1$x2_3,na.rm=TRUE))
p=predict(model,data1,type="term")
rowSums(cbind(p,attr(p,"constant")),na.rm=TRUE)-predict(model)
data1$v1=ifelse(data1$x1=="1",data1$x2,0)
data1$v2=ifelse(data1$x1=="2",data1$x2,0)
data1$v3=ifelse(data1$x1=="3",data1$x2,0)
points(data1$x2,predict(model1,data1),col="green",pch="X")
#no particular problem with predict function
predict(model)[data1$x2==0]
predict(model1)[data1$x2==0]
#end of script
########################
best regards
Laurent Beaulaton
Laurent Beaulaton
###############################
# NEW !!!! #
# http://www.laurent-beaulaton.fr/ #
# Tel + 33 (0)5 57 89 27 17 #
###############################
Tel + 33 (0)5 57 89 27 17
Fax + 33 (0)5 57 89 08 01
mailto:laurent.beaulaton@bordeaux.cemagref.fr
http://www.laurent-beaulaton.fr/
http://www.bordeaux.cemagref.fr/rabx/
Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.
Archive generated by hypermail 2.1.8, at Thu 01 Mar 2007 - 13:30:29 GMT.
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.