[R] [Fwd: Re: [Fwd: failure delivery]]

From: Prof J C Nash <nashjc_at_uottawa.ca>
Date: Thu 26 May 2005 - 09:30:35 EST

I appear to have hit one of the "drop" issues raised in some discussions a couple of years ago by Frank Harrell. They don't seem to have been fixed, and I'm under some pressure to get a quick solution for a forecasting task I'm doing.

I have been modelling some retail sales data, and the days just after Thanksgiving (US version!) are important. So I created some dummy variables via a factor called "events" and (really ugly!!) have levels TG, TG+1, TG+2, etc. Now I also have DEC1, and the calendar and data are such that in the period I'm forecasting I have TG+3 but this is NOT in the estimation data. There are also weekday factors (wdf) and some cross factors (Saturday + some special days is highly significant).

The model is Sales ~ daynumber + wdf*events + wdf*specialevents

where daynumber is the day sequence in the year and specialevents is a set of factors to tell when the business has promotional activities. The entire model has about 330 coefficients (it seriously needs some economizing), but only about 140 of these are estimated.

I'm using lm() to do the estimation. I plan to change the model and possibly the method once I've seen if forecasting works. The current model "works" moderately well for in-sample fits, though I suspect there is too much variability generally.

I want to advance 1 week at a time, reestimate, and iterate. This is a test case where we know the "future". I can get this to work for a few weeks starting at 20041101, but then get an error msg

                "new factor levels in 'events' ...".

I have tried putting drop.factor.levels = TRUE in predict(), but this didn't seem to register. Also tried suggestion from web to use

          ifac <- sapply(estndta,is.factor)
          fcstdta[ifac] <- lapply(fcstdta[ifac],factor)

Still get same error.
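
A minimal sketch of why re-applying factor() to the forecast data may not help, using made-up data. The data frames est and fcst, the level names, and the simplified formula are assumptions for illustration, not the actual retail model. predict() checks the levels in the new data against those stored in the fitted object (fit$xlevels), so a level such as TG+3 that occurs only in the forecast period is still unknown to the model after releveling. One possible workaround, shown at the end, is to rebuild the factor on the estimation levels so that unseen levels become NA; with the default na.action = na.pass, predict() is then expected to return NA for those rows instead of stopping.

```r
# Toy estimation data: 'events' has no "TG+3" level (illustrative only).
set.seed(1)
est <- data.frame(
  daynumber = 1:20,
  events = factor(rep(c("none", "TG", "TG+1", "TG+2"), each = 5))
)
est$Sales <- 100 + 2 * est$daynumber + rnorm(20)
fit <- lm(Sales ~ daynumber + events, data = est)

# Forecast data containing a level the model never saw.
fcst <- data.frame(
  daynumber = c(21, 22),
  events = factor(c("none", "TG+3"))
)

# Re-applying factor() keeps "TG+3", so predict() still stops with an error.
fcst$events <- factor(fcst$events)
err <- tryCatch(predict(fit, newdata = fcst),
                error = function(e) conditionMessage(e))

# Levels present in the forecast data but absent from the fitted model.
unseen <- setdiff(levels(fcst$events), fit$xlevels$events)

# Rebuild the factor on the model's own levels; unseen levels become NA.
fcst_ok <- fcst
fcst_ok$events <- factor(as.character(fcst$events),
                         levels = fit$xlevels$events)
pred <- predict(fit, newdata = fcst_ok)
```

Here unseen lists the offending levels, and pred has one entry per forecast row. Whether an NA forecast for a TG+3 day is acceptable, or whether such days should instead be mapped onto an existing level, is a modelling decision this sketch does not settle.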

I've tried a couple of dozen variants on this with no joy.

Finally have tried using the full data set in lm() but set weights for the estimation period to 1, and those for the forecast period to 0. This "computes", but the results include NAs at a point where there seems no reason for them.
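
A small sketch of one way NAs can arise with the zero-weight approach, again on made-up data (the data frame full, the weight column w, and the simplified formula are assumptions for illustration). lm() leaves zero-weight rows out of the fit, so a factor level that occurs only in those rows has an all-zero dummy column among the rows actually used, and its coefficient is reported as NA. This may or may not be the source of the NAs described above.

```r
# Toy full data set: "TG+3" occurs only in the last four (forecast) rows.
set.seed(2)
full <- data.frame(
  daynumber = 1:24,
  events = factor(
    c(rep(c("none", "TG", "TG+1", "TG+2"), each = 5), rep("TG+3", 4)),
    levels = c("none", "TG", "TG+1", "TG+2", "TG+3")
  )
)
full$Sales <- 100 + 2 * full$daynumber + rnorm(24)

# Weight 1 for the estimation period, 0 for the forecast period.
full$w <- ifelse(full$events == "TG+3", 0, 1)

fit_w <- lm(Sales ~ daynumber + events, data = full, weights = w)

# Coefficients that could not be estimated from the positive-weight rows.
cf <- coef(fit_w)
na_terms <- names(cf)[is.na(cf)]
```

In this toy case na_terms picks out the dummy for the level seen only in the forecast period.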

I'm starting to suspect that there's some sort of bug somewhere in the R internals.

Any advice welcome.

John C. Nash, School of Management, University of Ottawa,
Vanier Hall 451, 136 Jean-Jacques Lussier Private,
P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada
email: nashjc on mail server uottawa.ca, voice mail: 613 562 5800 X 4796
fax 613 562 5164,  Web URL = http://macnash.admin.uottawa.ca

"Practical Forecasting for Managers" web site is at
http://www.arnoldpublishers.com/support/nash/
Received on Thu May 26 09:37:40 2005
