From: Steve Jones <steve_at_squaregoldfish.co.uk>

Date: Tue, 13 Jul 2010 13:05:22 +0100

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 13 Jul 2010 - 12:11:29 GMT


I'm using R-2.9.1, so forgive me if this has already been resolved.

One element of the object returned from the acf function is n.used, described in the man page as "The number of observations in the time series".

However, I've noticed that this value is set to nrow(x) via the sampleT variable, i.e. the number of rows in the passed-in series. This forces an assumption that all rows of the series contain values.

Since it's possible to calculate an acf from an incomplete series by passing 'na.action=na.pass', I would suggest that the value of n.used in this instance should be set to 'sum(!is.na(series))'.
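A minimal sketch of the discrepancy, using made-up data (the vector `x`, the seed, and the number of missing values are assumptions for illustration only). It compares the `n.used` element returned by `acf` with the count of non-missing observations:

```r
# Hypothetical data: 100 observations, 20 of which are set to NA.
set.seed(1)
x <- rnorm(100)
x[sample(100, 20)] <- NA

# na.pass lets acf() work on the incomplete series; plot = FALSE skips drawing.
fit <- acf(x, na.action = na.pass, plot = FALSE)

n_reported <- fit$n.used      # value stored in the returned object
n_valid    <- sum(!is.na(x))  # observations that actually hold a value

cat("n.used:", n_reported, " non-missing:", n_valid, "\n")
```

Here `n_valid` is 80 by construction. On a version of R where `n.used` is taken from `nrow(x)`, the first number printed would be 100 instead.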

This has knock-on effects too: the plot produced by the acf function includes a horizontal line showing the threshold of statistical significance, which is dependent on the number of measurements in the series:

qnorm((1 + 0.95)/2)/sqrt(corr$n.used)

For a set of time series of the same fixed length, the threshold is therefore identical for every series, regardless of how many valid measurements each one contains, which I believe to be incorrect.
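To give a sense of scale, the same formula can be evaluated with the series length and with the number of valid measurements. The figures below (100 rows, 80 valid values) are hypothetical and chosen only for illustration:

```r
ci      <- 0.95
n_total <- 100  # hypothetical number of rows in the series
n_valid <- 80   # hypothetical number of non-missing values

# Same expression as above, evaluated with each candidate sample size.
thr_total <- qnorm((1 + ci) / 2) / sqrt(n_total)  # about 0.196
thr_valid <- qnorm((1 + ci) / 2) / sqrt(n_valid)  # about 0.219

cat("threshold using series length:", thr_total, "\n")
cat("threshold using valid count:  ", thr_valid, "\n")
```

With fewer valid measurements the threshold is wider, so a threshold based on the full series length would be too narrow for an incomplete series.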

As a side note, I also think that this significance threshold should be returned as part of the output of the acf function. As it stands, the value is drawn on the plot but cannot be retrieved from the returned object.
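In the meantime, the threshold can be recomputed from the returned object. The helper below is a sketch only: `acf_threshold` is a hypothetical name, not a function in the stats package, and its `n` argument is an assumed way of supplying a different sample size, such as the non-missing count.

```r
# Hypothetical helper, not part of R: recompute the plotted threshold
# from an "acf" object, optionally overriding the sample size.
acf_threshold <- function(fit, ci = 0.95, n = fit$n.used) {
  stopifnot(inherits(fit, "acf"), n > 0)
  qnorm((1 + ci) / 2) / sqrt(n)
}

# Made-up incomplete series for demonstration.
set.seed(1)
x <- rnorm(50)
x[c(5, 10, 15)] <- NA
fit <- acf(x, na.action = na.pass, plot = FALSE)

thr_default <- acf_threshold(fit)                       # uses fit$n.used
thr_valid   <- acf_threshold(fit, n = sum(!is.na(x)))   # uses non-missing count
```

Passing `n` explicitly keeps the choice of sample size visible to the caller rather than hiding it inside the helper.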

Steve.

