[R] question regarding arima function and predicted values

From: eugen pircalabelu <eugen_pircalabelu_at_yahoo.com>
Date: Tue, 11 Dec 2007 11:40:31 -0800 (PST)

Good evening!

I have a question regarding the forecast package and time series analysis. My code:

x <- c(253, 252, 275, 275, 272, 254, 272, 252, 249, 300, 244, 258, 255, 285, 301, 278, 279, 304, 275, 276, 313, 292, 302, 322, 281, 298, 305, 295, 286, 327, 286, 270, 289, 293, 287, 267, 267, 288, 304, 273, 264, 254, 263, 265, 278)
library(forecast)
# my model: ARIMA(1,1,2) with one seasonal difference, period 12
l <- arima(x, order=c(1,1,2), seasonal=list(order=c(0,1,0), period=12))
# model selected automatically by auto.arima
k <- auto.arima(x)
sd(l$resid)
sd(k$resid)
predict(l, n.ahead=1)
predict(k, n.ahead=1)
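
The quantities compared in the code above (AIC, residual spread) can be collected into one table. The following is only a minimal sketch: to keep it runnable without the forecast package, `fit_other` is a hypothetical ARIMA(0,1,1) stand-in for whatever auto.arima(x) actually selects, and the names `fit_mine`, `fit_other` and `cmp` are illustrative.

```r
# Same data as above
x <- c(253, 252, 275, 275, 272, 254, 272, 252, 249, 300, 244, 258, 255, 285,
       301, 278, 279, 304, 275, 276, 313, 292, 302, 322, 281, 298, 305, 295,
       286, 327, 286, 270, 289, 293, 287, 267, 267, 288, 304, 273, 264, 254,
       263, 265, 278)

# The hand-specified model, fitted with stats::arima as above
fit_mine <- arima(x, order = c(1, 1, 2),
                  seasonal = list(order = c(0, 1, 0), period = 12))

# ASSUMPTION: ARIMA(0,1,1) is only a hypothetical stand-in for the
# auto.arima(x) result; substitute the real fit when forecast is available
fit_other <- arima(x, order = c(0, 1, 1))

# One row per model: AIC, estimated innovation variance, sd of residuals
cmp <- data.frame(
  model    = c("mine", "stand-in"),
  aic      = c(AIC(fit_mine), AIC(fit_other)),
  sigma2   = c(fit_mine$sigma2, fit_other$sigma2),
  sd_resid = c(sd(residuals(fit_mine)), sd(residuals(fit_other)))
)
print(cmp)
```

Replacing `fit_other` with the object returned by auto.arima(x) should give the actual comparison, since both are fitted ARIMA objects with the same components.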

  1. I understand that auto.arima finds the best time series model by choosing the one with the smallest AIC, BIC or AICc among competing models, but my model has a smaller AIC than the one auto.arima selects. Why? Am I missing something? At the same time, the sd of the residuals for my model is bigger, as is the se of the predicted value. Which of these two models would you choose, and why?
  2. This question is more theoretical.
 m <- sample(10:20, 10, replace=TRUE)
 f <- sample(10:20, 10, replace=TRUE)
 t <- m + f
 s <- rbind(m, f, t)
 s

Let's say I have a panel sample at my disposal. Let m be the monthly average juice consumption for the male part of the sample, f the monthly average for the female part, and t the average for the whole sample. For the mean of the whole sample I have a confidence interval of, say, +/-2 each month (with a sample of, say, 2000 individuals). If I compute a confidence interval for the male population only (say 1000 individuals in my sample), it would certainly be bigger, because N declines from 2000 to 1000.

Now, if I tried to predict next month's consumption for both time series (male and whole sample), the prediction would not "care" that the mean consumption was established first with 2000 people and then with 1000. Am I right?

Imagine that each of the 10 months I sampled above has such a confidence interval of +/-3. How would a future prediction incorporate the fact that my mean consumption is not measured via a census but estimated from a sample, and so is only an estimate of the real consumption, within a confidence interval? Is there a good reference text on incorporating the confidence intervals of past values when determining future values?
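
To put a number on the sample-size effect described above, here is a minimal sketch. It assumes simple random sampling, a normal approximation, and the same standard deviation in the male subsample as in the whole sample; the variable names are illustrative only.

```r
# Half-width of a 95% confidence interval for a mean: z * sigma / sqrt(n)
# ASSUMPTIONS: simple random sampling, normal approximation, and the same
# standard deviation in the male subsample as in the whole sample
z <- qnorm(0.975)                       # about 1.96

n_all  <- 2000                          # whole sample
n_male <- 1000                          # male subsample
half_all <- 2                           # +/-2 for the whole sample, as in the text

# Standard deviation implied by a +/-2 interval at n = 2000
sigma <- half_all * sqrt(n_all) / z

# Half-width of the interval for the male subsample
half_male <- z * sigma / sqrt(n_male)
half_male                               # 2 * sqrt(2), about 2.83
```

Under these assumptions, halving the sample size widens the interval by a factor of sqrt(2), so +/-2 becomes roughly +/-2.83.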

Thank you and have a great day!        





R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 11 Dec 2007 - 19:43:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 12 Dec 2007 - 10:30:18 GMT.
